EVALUATING BOX COMPRESSION STRENGTH (BCS) USING AN ARTIFICIAL NEURAL NETWORK (ANN)

By

Juan Gu

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Packaging – Doctor of Philosophy

2025

ABSTRACT

Though box compression strength (BCS) is commonly used as a performance criterion for shipping containers, state-of-the-art BCS estimation produces results within a broad range of values. In this study we implemented a new approach, artificial neural networks (ANNs), to explore how much data may be needed for an ANN to reasonably predict compression strength, and how the ANN approach performs when facing variation that adversely impacts other modeling methodologies. An ANN model can be built by comprehensively adjusting four modeling factors that interact with each other to influence model accuracy, and it can be optimized by minimizing the model mean squared error (MSE). Using both data available from the literature and a "synthetic" data set of idealized data based on the McKee equation, we find that model estimation accuracy remains limited by the uncertainty or error in the input parameters combined with uncertainty from the ANN process itself, and we produce an estimate of this impact. The population size needed to build an ANN model that can reasonably estimate BCS was identified based on the different data sets in this study.

Packaging design plays a crucial role in ensuring the protective performance of packages. Various factors must be considered to ensure package strength during the packaging design process. Understanding the relative importance of each influencing factor or design feature provides valuable insights for optimizing packaging material utilization. However, current methods such as physical testing and finite element analysis have limitations in evaluating the relative significance of these parameters. In response to these challenges, in this research we applied different methods to comprehensively evaluate the relative importance of different packaging design features for a given packaging property. Using BCS as a representative packaging property, the relative importance of up to six BCS features (Edge Crush Test (ECT), perimeter, thickness, depth, and flexural stiffness in both the machine and cross-machine directions) was evaluated. Four distinct ANN methods were employed: the connection weights method, the gradient-based method, the permutation method, and SHAP values. These techniques were applied to two datasets: one comprising "synthetic" data based on the McKee formula and the other representing real-world scenarios. The reliability of these methods was assessed. The input feature importance (FI) scores obtained from the four methods were calculated and compared with the theoretical BCS FI derived from the McKee formula. The BCS feature ranking given by the synthetic data is verified by the theoretical feature importance ranking implied by the McKee formula. Although box depth is assigned zero importance in the McKee formula, the BCS feature importance ranking from the real dataset highlights its significance, aligning with buckling theory. The study gives insight into BCS feature importance evaluation using the ANN approach and guides material and cost savings in packaging design.

The ultimate objective of this research is to develop a comprehensive ANN model for predicting Box Compression Strength (BCS).
To achieve this, we utilized a dataset encompassing a wide range of box dimensions commonly encountered in industrial applications. After applying multiple optimization methods to determine the optimal number of hidden neurons and further identifying the key factor values influencing the model, a generalized ANN model was trained. The trained ANN model can predict BCS at an industrially applicable level with an error of 9.51%. The primary factors contributing to the high BCS error are the presence of boundary data points and the small sample size of the current data set. One possible strategy to improve ANN prediction accuracy is to continually expand the current dataset sample size using available resources. In essence, this study serves as a roadmap for forthcoming research endeavors seeking to leverage ANN techniques to tackle challenges and provide solutions within the corrugated industry.

COPYRIGHT BY JUAN GU 2025

Dedicated to my parents and my husband. Thank you for always believing in me.

ACKNOWLEDGEMENTS

I wish to extend my sincere gratitude to several individuals and organizations whose contributions have been instrumental in the completion of this dissertation. Firstly, I want to express profound gratitude to Dr. Euihark Lee. His mentorship, encouragement, and willingness to explore unconventional avenues of inquiry have been invaluable to my growth and development as a researcher. Secondly, I am immensely grateful to the Packaging Corporation of America for their generous provision of resources and data that underpinned this research. Without their support, this work would not have been possible. I extend my heartfelt thanks to Packaging Corporation of America for their invaluable insights into corrugated board compression in the industry. Their guidance has greatly enriched the depth of this study. I also acknowledge with gratitude the support and advice of Dr. Amin Joodaky, Dr. Qiang Yang, and Dr. Yan, whose contributions have significantly enhanced the quality of this research. Special thanks are due to the School of Packaging for allowing me to begin my PhD journey and for providing the necessary infrastructure and equipment for this study. I am indebted to my colleagues for their invaluable suggestions on machine learning and for the cherished memories we have shared. I express my deepest appreciation to the professors at the School of Packaging for their wisdom, guidance, and unwavering support throughout this endeavor. I thank them for their remarkable contributions and support throughout this journey.

TABLE OF CONTENTS

CHAPTER 1: BACKGROUND
1.1 BOX COMPRESSION STRENGTH (BCS)
1.2 APPROACHES FOR BCS ESTIMATION
1.3 ARTIFICIAL NEURAL NETWORK (ANN)
1.4 APPLICATIONS OF ANNS
1.5 RESEARCH OVERVIEW
CHAPTER 2: A COMPARATIVE ANALYSIS OF ARTIFICIAL NEURAL NETWORK (ANN) ARCHITECTURES FOR BOX COMPRESSION STRENGTH ESTIMATION
2.1 INTRODUCTION
2.2 DATA SETS APPLIED
2.3 ANN KEY FACTORS INITIALIZATION
2.4 ANN AND MCKEE DATA SET
2.5 ANN AND AN IDEALIZED DATA SET
2.6 ANN AND A DATA SET WITH VARIATION
2.7 CONCLUSION
CHAPTER 3: EVALUATION OF PACKAGING DESIGN RELATIVE FEATURE IMPORTANCE USING ANN
3.1 INTRODUCTION
3.2 CURRENT METHODS FOR EVALUATING BCS FEATURE IMPORTANCE
3.3 ANN APPROACH FOR EVALUATING FEATURE IMPORTANCE
3.4 FLOW OF FEATURE IMPORTANCE EVALUATION USING ANN
3.5 CASE STUDY FOR FEATURE IMPORTANCE ANALYSIS
3.6 CONCLUSION
CHAPTER 4: BUILDING A GENERALIZED ANN MODEL TO EVALUATE BCS
4.1 INTRODUCTION
4.2 EXTRACT REAL DATA SET TO COVER THE MAJORITY OF BCS IN THE INDUSTRY
4.3 DETERMINATION OF HIDDEN LAYER NEURON SETTING
4.4 TRAINING ANN MODEL TO EVALUATE BCS IN THE INDUSTRY
4.5 CONCLUSION
CHAPTER 5: RESEARCH SUMMARY AND FUTURE RESEARCH
5.1 RESEARCH SUMMARY
5.2 FUTURE RESEARCH
BIBLIOGRAPHY

CHAPTER 1: BACKGROUND

This chapter focuses on the Box Compression Strength (BCS) of corrugated packaging using the Artificial Neural Network (ANN) approach. It covers four main sections: BCS, existing approaches for BCS evaluation, ANN, and the applications of ANN. The BCS section discusses the application of corrugated packaging, the reasons for its failure, and the factors influencing BCS. The section on existing methods for evaluating BCS introduces the shortcomings of each method. The ANN section covers the architecture and components of ANNs, their working principles, and their various types.
The applications of ANN section introduces different fields involving ANN applications, with a particular emphasis on packaging.

1.1 BOX COMPRESSION STRENGTH (BCS)

In the packaging industry, evaluating packaging properties is essential to ensure the reliability of a package's utilization. Among various types of packaging, corrugated packages have gained significant popularity in the modern market. Due to the unique properties of corrugated paperboard, evaluating the properties of these packages has become a critical research topic. Given the diverse demands of the market, estimating the strength of corrugated boxes has become increasingly important. Box Compression Strength (BCS) is one of the most crucial parameters to consider for corrugated packages. Over the past 130 years, the compression strength of corrugated boxes has been extensively studied due to failures occurring during the shipping, distribution, and storage of various products [1].

1.1.1 Ubiquitous corrugated box

Corrugated boxes are one type of shipping container that is widely used in the market today. Corrugated boxes are made from paper and are machine-shaped from corrugated boxboard with a hollow structure. Since corrugated boxes were first accepted by legal freight classification organizations as containers for freight transportation, corrugated boxes have been applied and studied for more than 100 years. Corrugated boxes are widely applied in various fields [2] because of their light weight, low cost, ease of assembly and disassembly, good sealing performance, a certain degree of cushioning and anti-vibration ability, and easy recovery and waste treatment. The most commonly used corrugated box structure is the Regular Slotted Container (RSC) due to its simplicity in production and formation and its ease of use. With the development of the economy, e-commerce has become increasingly popular. As e-commerce advances, the types of corrugated boxes have diversified, and various box structures are now used in the market, as shown in Figure 1 [3].

Figure 1 Different structures of corrugated boxes used in the market

The utilization of corrugated boxes has become widespread across various countries. According to the 2015 Global Corrugated Packaging Market Overview report, based on data from the United Nations, each person in the world uses packaging worth over USD 110 annually, significantly contributing to the expansion of the packaging industry [4]. The corrugated packaging industry is witnessing incredible growth due to the increasing demand for packaging of food and beverages, personal and household care products, medicines, and other products. The booming e-commerce industry is playing a vital role in the adoption of corrugated packaging for consumer goods. Around 85% of corrugated packaging is used for shipping boxes where high protection is required. Moreover, the increasing popularity of corrugated retail display stands, which are used for effectively highlighting products in retail stores, is likely to contribute to the expansion of the corrugated packaging business. These factors are increasing the global production of corrugated board. In 2017, as per the International Corrugated Case Association (ICCA), over 240 billion square meters of corrugated board were produced, with North America holding a 30% revenue share of total production [4].
Furthermore, innovative solutions provided to key vendors for the adoption of corrugated packaging are also contributing to the growth of the global corrugated packaging market. For instance, International Paper Co. uses cellulose fibers in corrugated packaging, which is mainly used for packaging textiles, construction materials, paints and coatings, and other non-durable goods [4]. The global corrugated packaging market is segmented by box type into Slotted Boxes, Telescope Boxes, Folder Boxes, Self-Erecting Boxes / Auto-Bottom Boxes, Bliss / Rigid Boxes, and Others (Mailing Boxes, Bin Boxes, Slide Boxes). Based on end user, the global corrugated packaging market is segmented into Food & Beverages, Electronic Goods, Personal and Home Care Goods, Glassware & Ceramics, Healthcare & Pharmaceuticals, and Others (Textile, Chemical, Paper Products). Based on geography, the global corrugated packaging market is segmented into North America (U.S. & Canada); Latin America (Brazil, Mexico & the rest of Latin America); Europe (the U.K., Germany, France, Italy, Spain, Poland, Sweden & the rest of Europe); Asia-Pacific (China, India, Japan, Singapore, South Korea, Australia, New Zealand, the rest of Asia); Middle East & Africa (GCC, South Africa, North Africa, the rest of the Middle East and Africa); and the Rest of the World. All of these markets have a large market size measured in USD billions and a production quantity measured in tonnes.

Almost 80% of the volume of paper packaging used in the United States is corrugated boxes. A similar proportion of goods is transported using corrugated boxes; this includes not only goods moving through the distribution process to the end user but also parts brought to their assembly locations in corrugated boxes. Corrugated boxes protect products during almost all phases of the distribution process [1]. Corrugated boxes are one of the main types of delivery packaging in China as well: 9.9 billion corrugated boxes were used in China as reported in 2015 [5]. Due to a huge spike in the e-commerce segment, the corrugated packaging market is growing rapidly; as a consequence, the global corrugated market is growing at a rate of 5.62% annually and is predicted to reach $386 billion in 2026, as reported by the Indian Pulp and Paper Technical Association [6]. E-commerce retail sales continue to surge, with estimates of around 20% annual growth in e-commerce trade in Europe. This will have a profound impact on packaging demand, especially in the corrugated industry, as corrugated packaging represents 80% of demand in e-commerce. The corrugated packaging market size was valued at USD 70 billion in 2022 and is poised to depict a 4% CAGR through 2023-2032, on account of burgeoning e-commerce sales worldwide [7]. As sustainable development becomes a global priority, corrugated packages are gaining popularity in packaging, reflecting the growing emphasis on sustainability throughout the value chain. Corrugated packages are easy to recycle, and the pulp and paper industry has already adapted to converting these into new generations of containerboard. Consumers prefer corrugated protective formats over polymer-based alternatives, such as expanded polystyrene (EPS) foams [6].

1.1.2 Reasons for failure of corrugated packages

The failure of corrugated boxes can be influenced by both distribution and material factors.
The actual BCS of corrugated boxes will decrease over time due to various environmental and handling factors, such as stacking height, the mass of the filled box, the number of layers stacked, the types of pallets used, overhang, unitizing practice, the number of pallets stacked high in storage, storage and distribution time, and transportation circumstances [8]. Overhang has a significant influence on the BCS of corrugated boxes during the storage and shipping process. The majority of a container's strength is derived from its corners [9], as demonstrated in Figure 2 below.

Figure 2 Load distribution along the perimeter of a corrugated box (source: https://www.ijltemas.in/DigitalLibrary/Vol.6Issue7/26-28.pdf)

Practices such as overhang should be avoided, as it has been found that the deficit in BCS of packed boxes caused by overhang can range between 23-49%, varying with the extent and direction (length, width, or adjacent panel) of the overhang [10]. Another practice that should be avoided is misalignment of boxes stacked on each other on a pallet, as it plays a significant role in decreasing the strength and lifetime of the box; the percentage decrease in (lateral) BCS can be as much as 11% and 31% for 90% and 80% contact area, respectively [11]. To ease the consequences of environmental factors, the end user must minimize practices that negatively affect the strength characteristics of the corners, including overhang on the pallet, packing on pallets with only a few slats, excessive shrink wrap tension, and "interlocking" stacking patterns [11]. Because corrugated cardboard is a highly deformable material, the limit of its use may be set by the deformation of the box [12].

Corrugated boxes are made from a specialized material known as corrugated paperboard. It is necessary to accurately estimate the strength of corrugated boxes before applying them in real-world scenarios. This is due to their unique material composition, which allows their structure to be easily customized and strengthened to achieve high packaging performance, but which can also degrade over time due to prolonged use or environmental factors such as humidity. Paper is an orthotropic material exhibiting non-linear mechanical properties, which means that it possesses varying strength in different directions. For instance, the tensile strength of paper fibers in the machine direction can double compared to that in the cross-machine direction as strain increases [13]. Consequently, the orientation of corrugated board utilization becomes critical. However, even when the corrugated board is used in the correct direction, the weaker direction of this anisotropic material can lead to the failure of the corrugated box under specific conditions. For example, unexpected damage to the box's side panels can be induced by shock or piercing, significantly reducing the strength of the corrugated box.

1.1.3 Influencing factors of BCS

BCS is influenced by various factors, such as material properties, flute types, dimensions, and more. Each factor, or BCS feature, affects the BCS differently. The BCS features for corrugated packaging can be organized into three groups: the mechanical strengths of the raw paper material, the corrugated board, and the corrugated box itself. At the level of the raw paper, the key factors involve liner type, liner weights, and a constant related to the fluting.
At the board level, the critical influencing factors for BCS involve the ring crush test (RCT), Concora liner test (CLT), take-up factor, thickness, flexural stiffnesses in the machine direction and cross-machine direction (MD and CD), Edge Crush Test (ECT), moisture content, and so on, as shown in Figure 3. At the level of the corrugated box, the dimensions and perimeter of the box, design structure, applied load ratio, stacking time, and buckling ratio all have a significant impact on the BCS value of a corrugated package. In addition, other factors also make a difference in BCS, such as the presence of openings, ventilation holes and perforations, moisture content of the box, storage time, and stacking conditions [14].

Figure 3 BCS influencing factors (thickness t, ECT, box dimensions, flexural stiffnesses in MD and CD (EIx, EIy), ring crush test (RCT), flute type, Concora liner test (CLT), paper strength, and ventilation holes)

ECT (Edge Crush Test) can be a vital indicator of the BCS of corrugated packages. The BCS of a packaging container of the regular slotted container (RSC) design has been predicted from the ECT value of the board [15]. The ECT-BCS, stiffness-BCS, and thickness-BCS relationships were shown to be strong, positive correlations. The effect of box depth (which is not included in the McKee formula) is that the box becomes weaker as the height increases due to wall buckling; compression strength dropped by as much as 62% from 127 mm to 1219 mm box heights, which points out a weakness of the McKee formula [9]. The stacking pattern has a significant impact on the BCS. Existing research has revealed that column stacking results in higher strength compared with interlocking stacking patterns, and while stacking boxes, one needs to ensure that the four carton corners are placed in alignment [9]; see Figure 4 below.

Figure 4 Column and interlocking stacking patterns of corrugated boxes

The mechanical properties are important design features because the function and performance of a product depend on its capacity to resist deformation under the stresses encountered in use; hence, in design, the usual objective is for the product and its components to withstand these stresses without significant change in geometry [16]. The Edge Crush Test (ECT) and Flat Crush Test (FCT) are the two main tests that determine whether the mechanical properties of the corrugated board will meet the set or targeted performance of the box in the market [17].

1.2 APPROACHES FOR BCS ESTIMATION

The evaluation methods for BCS primarily include three approaches: compression tests (the most traditional method), mathematical models, and finite element analysis (FEA). Each method has its drawbacks in BCS evaluation, and while researchers have made efforts to address these challenges, improving evaluation efficiency and accuracy still presents difficulties.

1.2.1 Compression Test

The compression test is one of the commonly used methods to test corrugated box compression strength or stack load, to make sure that boxes do not fail when stacked on each other during the storage and distribution process. The Box Compression Test (BCT) is a standardized procedure designed to measure the maximum pressure or force that a material can withstand before rupturing. It is particularly relevant for assessing the strength of corrugated and paperboard materials commonly used in packaging applications. Different test standards are applied based on the requirements of corrugated packaging in different uses.
The test standards include ISO 12048, Packaging - Complete, Filled Transport Packages - Compression and Stacking Tests Using a Compression Tester; TAPPI T 804, Compression Test of Fiberboard Shipping Containers; ASTM D642, Standard Test Method for Determining Compressive Resistance of Shipping Containers, Components, and Unit Loads; JIS Z0212, Japanese Industrial Standard Method of Compression Test for Packaged Freights and Containers; and ASTM D4169, Standard Practice for Performance Testing of Shipping Containers and Systems. ASTM D642 was developed by the American Society for Testing and Materials (ASTM) to determine the compressive resistance of shipping containers, components, and unit loads. Its key points include the test method, which applies a compressive force to the package until failure, and the interpretation of results, that is, how to determine the maximum compressive load. TAPPI T804, provided by the Technical Association of the Pulp and Paper Industry (TAPPI), focuses on determining the compressive strength of fiberboard shipping containers by applying a compressive load until failure of the package. ISO 12048 is an international standard that defines the compressive and stacking tests for transport packages, applying a compressive force to a package until failure or to a set load to simulate the stacking conditions in warehouses. JIS Z0212 focuses on compression test methods for corrugated fiberboard boxes, applying a compressive load until collapse and recording the force [18]. ASTM D4169 is a Food and Drug Administration (FDA)-recognized consensus standard for conducting a transit simulation study for sterile barrier medical device packaging systems, and it is the most common choice in the medical packaging industry [19].

Overall, the box compression test is a fundamental assessment that evaluates the strength and resilience of packaging materials, and it gives insight into the optimization of packaging design and material selection. However, the compression test has some limitations. The package samples are limited by the laboratory facilities, and the conduct of the test is limited by the laboratory setting and environment. Besides, many other factors can reduce the accuracy of the compression test, including systematic errors, instrumental errors, environmental errors, procedural errors, and human errors [20]. Some instruments have limitations that can cause consistent deviation from the real value. Laboratory temperature and humidity can change because of the unexpected failure of electronic devices. Human errors can cause measurement deviation, which cannot be eliminated in laboratory testing. In addition, the compression test process is very time-consuming and costly: it starts with preconditioning for at least 24 hours in the required temperature and humidity environment, followed by setting up the sample, mounting it onto the testing apparatus, ensuring that it is evenly aligned and free from any wrinkles or folds that could affect the test results, and then applying the pressure steadily using the testing machine at a given speed [21]. A testing machine for the compression test is shown in Figure 5. To minimize errors, multiple samples need to be tested. When it comes to various box structures, box dimensions, box materials, and so on, the workload of physical testing increases dramatically. The current compression test can only test boxes one by one, which is very inefficient.
On top of repeated testing of a single sample, samples with different material properties and different batches of product source can further increase the error of the compression test. These are the drawbacks of compression testing.

Figure 5 Testing machine for the Box Compression Test

1.2.2 Mathematical models

Many mathematical models have been developed for BCS evaluation. The McKee formula is one of the mathematical models most commonly used in industry for BCS estimation of corrugated packages, as shown in equation (1) [9]. The McKee formula was developed by McKee et al. [22] in the 1960s. McKee's formula estimates the BCS of corrugated boxes by employing three basic physical parameters of a box: ECT, thickness, and box perimeter. However, the McKee formula is limited in its predictive accuracy by the uncertainty in the measurements of package properties [23]. The McKee formula is simple and reasonably accurate for predicting the BCS of the regular slotted container (RSC). The research by McKee et al. [22] presented certain limitations due to the simplification of more general physical relationships. Fundamentally, these were linear regression analyses based on specific data sets, typically limited by processing constraints.

BCS = 5.87 × ECT × √(Thickness × P)    (1)

Where: ECT – edge crush strength (lb/in); Thickness – thickness of the corrugated board (in); P – perimeter of the box (in).
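Because equation (1) is a closed-form expression, it can be computed directly. The following is a minimal sketch in Python; the function name, example values, and units are illustrative assumptions rather than part of the original McKee study.

```python
import math

def mckee_bcs(ect_lb_in: float, thickness_in: float, perimeter_in: float) -> float:
    """Estimate box compression strength (lb) with the simplified McKee
    formula of equation (1): BCS = 5.87 * ECT * sqrt(thickness * perimeter)."""
    return 5.87 * ect_lb_in * math.sqrt(thickness_in * perimeter_in)

# Example (assumed values): ECT = 32 lb/in, caliper = 0.16 in, and a
# 12 x 12 x 12 in RSC, so perimeter = 2 * (length + width) = 48 in.
print(round(mckee_bcs(32.0, 0.16, 48.0), 1))  # ~520.5 lb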
The McKee formula has limitations in that it should only be utilized when the length-to-width ratio or the height-to-length ratio of the box is not too large. Specifically, it assumes that the length is less than three times the width and that the perimeter is less than seven times the depth [14]. However, corrugated packaging patterns have become more and more diverse with the advance of e-commerce, and corrugated packages with various dimensions have become more and more common. Furthermore, the McKee formula is not able to estimate the BCS of packages with the various patterns available in the market, as it does not account for variations in material properties and box structures.

Another limitation of the McKee formula is that it considers only three physical parameters of a box. Many other physical parameters influence the BCS, such as structural mechanics factors (flexural stiffness, torsional stiffness, diagonal stiffness), production factors (crush, scoring, slotting quality), and use factors (the squareness of the box when erected, how the box is sealed). Many of these are difficult or impossible to capture in a closed-form mechanistic model of BCS. For example, torsional stiffness, also called shear stiffness, measures the torsional resistance of a corrugated board in the machine direction (MD). When a corrugated box undergoes compressive loading in the MD, the side walls tend to deform outward in a buckling response to the compression. This deformation is affected by the longitudinal shear stiffness. The shear stiffness can directly influence how well the box can protect its product [24]. However, MD torsional stiffness is a more sensitive predictor of corrugated board performance, and there is no test standard for this parameter [25].

As mentioned above, buckling is another critical factor that influences the compression strength of corrugated boxes. Urbanik and Frank (2006) studied the impact of buckling on box compression strength and formulated a mathematical equation to demonstrate the relationship between buckling and BCS [26], as shown in equation (2). However, this equation involves some parameters (such as the flexural stiffness in the transverse, axial, and twisting directions) that are difficult to obtain after the corrugated board production process, restricting the use of this equation for estimating BCS. Although researchers have attempted to develop other equations to estimate BCS more accurately, these equations still contain parameters that are difficult to access. For example, an improved version of the McKee formula that accounts for buckling is shown in equation (3); the flexural stiffnesses in the transverse and axial directions are included in this equation, and they are usually not measured after the corrugated board production process.

For inelastic buckling:

P1 = Pf · l = α · Pm · l    (2)

For elastic buckling:

P1 = Pf · l = α(4π²)^η · Pm^(1−η) · (√(EIx · EIy))^η · l^(1−2η) · ((2ĉ + M) / (4(1 − v²)))^η · τ((EIx/EIy)^(1/4) · d/l)

Where: P – compression; Pm – ECT; l – panel length; d – depth; EIx, EIy, EIxy – flexural stiffness per unit width in the transverse, axial, and twisting directions; v – geometric mean Poisson's ratio; ĉ – normalized in-plane shear modulus of elasticity derived in Urbanik (1992), ĉ = v + 2(1 − v²)(EIxy/EIx)√(EIy/EIx); τ – empirical improvement factor; η = 1 − b (where b is a McKee formula constant).

BCS = 2.028 × ECT^0.746 × (√(EIx × EIy))^0.254 × P^0.492    (3)

Where: ECT – edge crush strength (lb/in); EIx, EIy – flexural stiffness in the machine and cross-machine directions of the corrugated board (lb·in); P – perimeter of the box (in).

1.2.3 Finite Element Analysis (FEA)

Finite element analysis (FEA), a powerful technique often used for the simulation of engineering processes, is finding a home in the corrugated industry and has been applied to evaluate the BCS of corrugated packages. FEA models generate predictions by leveraging fundamental physical mechanics across different length scales, stitching together functional relationships to estimate the effect of changes in very basic material properties (e.g., paper elasticity) on the larger final system (e.g., box strength). When the functional form is known, the propagation of parameters and their impacts produces a prediction of the result. Various studies have explored using an FEA approach to predict ECT [26-29] or BCS [30-39], allowing for a detailed examination of the impact of moisture, perforations, holes, openings, crushing, and more complex structures. The literature has grown so extensively that even review articles addressing the usefulness of FEA on broader topics have sections discussing corrugated paperboard packaging [27]. Each of these studies requires detailed information on the material parameters to input into the models, typically producing reasonable agreement between the model and the limited number of physical samples evaluated. As such, they potentially contribute to our understanding of the impact of the specific changes examined (e.g., hole size and placement) [28]. However, very few of these studies address or investigate how well their models work with boxes made of different, varied, or unknown materials. They also do not often discuss how the varying physical and mechanical properties of paper or combined board can affect the accuracy of their predictions. Typically, the input parameters required for an FEA are not properties regularly measured in the papermaking or box-making process. Thus, existing (published) analyses cannot reasonably be used for a generalized assessment of a random box in the same way that we can use the McKee equation.
1.3 ARTIFICIAL NEURAL NETWORK (ANN)

The ANN is a subset of artificial intelligence (AI) that serves as an intelligent tool with great advantages in data processing and estimation. The artificial neural network was introduced in 1956 [29]. Artificial neural networks are inspired by the human biological neural network. An ANN is an algorithm that can recognize the relationships within a set of data and use the computer to make decisions or predictions. The ANN model involves computations and mathematics that simulate human-brain processes. ANNs are a very different computing approach that can be used to explore the underlying relationships in a set of data and generate predictive models. ANNs have many advantages because they strive to take whatever information we happen to know in terms of material inputs and gather relationships to the outputs of interest. This inference process can take in a broader range of inputs, teasing out their connections (implicit or explicit) to "understand" their relationship to a given output. The goal of ANNs is to minimize the error of the predicted property. By mapping features in data, ANNs can substantially add to the power of exploratory data analysis [30]. Using ANNs can bring many benefits to scientific research [31], making decisions more consistent and shortening the decision-making process [32]. Given the fundamentally non-linear relationship between fiber characteristics and the mechanical properties of paper, combined board, and boxes, this alternative approach is beginning to garner interest among researchers [33, 34]. The prediction capability of ANNs potentially allows us to incorporate a large number of input parameters into a single prediction model, limited only by the size of our data set. ANN research to date has focused on specific areas or factors influencing box strength [35].

1.3.1 Components and architectures of ANNs

An ANN consists of three types of layers: the input layer, hidden layers, and output layers. Each layer contains several neurons, which hold real numbers, and these neurons are connected by weights, which represent the strength of the influence between the two connected neurons. If a neuron in the previous layer has a strong influence on a neuron in the next layer, the weight is a large number; if the influence is weak, the weight is a small number. A typical ANN schematic is shown in Figure 6. An activation function is involved to capture nonlinear patterns between the input and output. An ANN can have two or more hidden layers, and each layer can have several neurons. Therefore, all the connections (weights) between neurons allow an ANN to have a high number of degrees of freedom. Sometimes a bias is also added to the weighted sum of all neurons to allow an ANN to become activated above a certain value. The biases also increase the number of degrees of freedom of an ANN. As a result, an ANN can have high flexibility and a high capability to recognize the nonlinear pattern in a set of data and provide the best possible prediction through several iterations of weight updating.

Figure 6 Schematic of a typical ANN

Input: Input data are usually labeled, and ANNs use them to learn and recognize the underlying patterns (or relationships) in the data. Input data can be collected through physical tests, mathematical fabrication, and other approaches.
The values of the input data become the neuron values in the input layer.

Weights and biases: Weights and biases are the parameters throughout the whole neural network. The weights of the connections between neurons are the adjustable model parameters that govern how the model calculates the output from the given inputs. To some degree, weights can also be regarded as the coefficients of the input data. By adjusting the weights, an ANN can reduce the influence of unimportant inputs and increase the influence of critical inputs. In this way, an ANN can nudge its output as close as possible to the real values. The adjustment of weights and biases is the key part of an ANN's learning.

Epoch: An epoch signifies one pass of feeding a dataset into the model, during which the model's weights are adjusted to reduce the overall error. This iterative repetition of the process, known as multiple epochs, continues until the error reduction rate falls below a given criterion.

Activation function: Activation functions are the functions that allow an ANN to perform non-linear operations. Most of the complex problems in real life are non-linear problems, such as the variation of temperature over a year or several years. To fit non-linear problems using ANNs, activation functions are necessary. Without a non-linear activation function, an ANN is just a linear combination of the input values. There are different types of activation functions, both linear and non-linear. Although the linear function has very limited application, it can still be counted as a type of activation function. Non-linear activation functions are more commonly used. Figure 7 presents graphical representations of several commonly used activation functions, which play a crucial role in ANNs. These activation functions introduce non-linearity into the network, allowing it to model complex relationships within data that would otherwise be impossible for a purely linear system to capture. It is this non-linear behavior that enables ANNs to process intricate patterns, make meaningful decisions, and effectively solve complicated real-world problems across various domains, such as image recognition, natural language processing, and predictive analytics.

Figure 7 Common activation functions (Source: https://en.wikipedia.org/wiki/Activation_function)

The working principle of ANNs is related to the weight updates during the training process. Weights are assigned randomly at the beginning of the ANN learning process. An ANN calculates the outputs by receiving the input neurons' values, computing the weighted sum of all neurons in the previous layer, adding biases, and passing the weighted sum to an activation function. This process is called propagation. Since the weights are randomly assigned initially, there is usually a difference between the outputs and the real values. An ANN minimizes the difference by updating the weights and biases, which is called the backpropagation process. An ANN usually updates the weights and biases several times to give the best-predicted result. This is the working principle of ANNs, or how ANNs learn from data and make predictions [36, 37].
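To make the propagation and backpropagation steps concrete, the following is a minimal sketch of a single-hidden-layer regression network trained with gradient descent. NumPy, the sigmoid activation, the toy data, and all array sizes are illustrative assumptions, not the configuration used later in this work.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 input features (e.g., ECT, thickness, perimeter), 1 output (BCS).
X = rng.uniform(0.0, 1.0, size=(64, 3))
y = (X @ np.array([[0.5], [0.3], [0.2]])) ** 2   # arbitrary nonlinear target

# Randomly assigned initial weights and biases, as described above.
W1, b1 = rng.normal(size=(3, 8)), np.zeros((1, 8))   # input -> hidden (8 neurons)
W2, b2 = rng.normal(size=(8, 1)), np.zeros((1, 1))   # hidden -> output

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.1

for epoch in range(2000):                 # each full pass over the data is an epoch
    # Propagation: weighted sums plus biases, passed through the activation.
    h = sigmoid(X @ W1 + b1)
    out = h @ W2 + b2                     # linear output for regression
    err = out - y
    # Backpropagation: gradients of the MSE with respect to weights and biases.
    dW2 = h.T @ err / len(X)
    db2 = err.mean(axis=0, keepdims=True)
    dh = (err @ W2.T) * h * (1.0 - h)     # derivative of the sigmoid
    dW1 = X.T @ dh / len(X)
    db1 = dh.mean(axis=0, keepdims=True)
    # Update step: nudge the outputs toward the real values.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print("final MSE:", float((err ** 2).mean()))
```

Each loop iteration performs exactly the propagation and backpropagation cycle described above; practical frameworks automate the gradient computation, but the principle is the same.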
1.3.2 Cost Function

The cost function is the criterion an ANN uses to adjust its weights. Generally, the cost function calculates the error between the output and the actual values, and the network seeks the weights that minimize this error. There are different types of cost functions used in an ANN depending on what problem the ANN is solving [50-52]. Generally, there are two common types of problems: regression problems and classification problems. Based on the problem that needs to be solved, the commonly used cost functions include three types: regression cost functions, binary classification cost functions, and multi-class classification cost functions. If the problem to be solved is a regression problem, a regression cost function should be used; if it is a classification problem, a binary classification or multi-class classification cost function should be chosen.

Regression cost function: The regression cost function deals with predicting a continuous value, for example, the temperature during a day or the mileage a person drives. The regression cost function measures the average error, that is, the average difference between the output and the real value, over the entire training data set. There are three different errors that can be calculated using the regression cost function: Mean Error (ME), Mean Squared Error (MSE), and Mean Absolute Error (MAE).

Mean Error (ME) is the mean of the error between the output and the real value, as shown in equation (4). For each training data point, the error between the output and the real value can be either positive or negative, and these errors can cancel each other out when added up, giving zero error for the regression model. Due to this cancellation problem, the Mean Error (ME) is not used frequently.

ME = (1/n) Σᵢ₌₁ⁿ (Output − real value)    (4)

Where n is the number of samples in the training dataset.

Mean Squared Error (MSE) is the average squared difference between the output and the real value, as shown in equation (5). MSE does not have the cancellation drawback of the Mean Error (ME) and is more commonly used for regression models. However, the disadvantage of MSE is that it is not very robust to outliers in a dataset, because squaring enlarges the error from outlier data points.

MSE = (1/n) Σᵢ₌₁ⁿ (Output − real value)²    (5)

Where n is the number of samples in the training dataset.

Mean Absolute Error (MAE) is the average absolute difference between the output and the real value, as shown in equation (6). MAE overcomes the shortcoming of the Mean Error (ME) by using the absolute value of the error for each data point. MAE is very robust to outliers in a dataset. So, if the dataset an ANN needs to train on has much noise or many outliers, MAE is the better choice for regression models.

MAE = (1/n) Σᵢ₌₁ⁿ |Output − real value|    (6)

Where n is the number of samples in the training dataset.
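The three regression cost functions of equations (4)-(6) can be computed in a few lines; this small sketch (NumPy and the sample values are assumptions for illustration) makes the cancellation behavior of ME visible:

```python
import numpy as np

output = np.array([510.0, 498.0, 532.0, 471.0])  # ANN-predicted BCS values (lb)
real = np.array([505.0, 512.0, 520.0, 480.0])    # measured BCS values (lb)

err = output - real
me = err.mean()            # equation (4): signed errors can cancel each other out
mse = (err ** 2).mean()    # equation (5): squaring magnifies outlier errors
mae = np.abs(err).mean()   # equation (6): robust to outliers

print(f"ME = {me:.2f}, MSE = {mse:.2f}, MAE = {mae:.2f}")
```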
1.3.3 Classification of ANNs

ANNs are classified into two types: feed-forward neural networks and feed-back neural networks, the latter also known as recurrent neural networks [38]. The first type of ANN is the feed-forward neural network [39]. In feed-forward neural networks, the connections between nodes do not form a cycle, which means the signals move in only one direction, from input to output. Figure 8 shows the schematic of a typical feed-forward propagation. The feed-forward ANN calculation cycle includes a forward-step computation that feeds input data into the ANN and a backward-step computation that calculates errors and updates the weights in the model. A single iteration of this computational process is termed an epoch within an ANN.

Figure 8 Schematic of a typical feed-forward propagation

Training process of a feed-forward ANN: When training a model on data, the feed-forward ANN approach segments a given set of known data into two uneven groups: training data and testing data. The former is used to build and refine the model, and the latter is used to evaluate model accuracy. Generally, to assess an ANN, 67% of a data set is split into training data and the remaining 33% into testing data. Each node in the hidden layers is defined by a weighted sum of the parameters in the prior layer, as shown in equation (7).

hⱼⁱ = f(Σₖ wₖⱼ × xₖ)    (7)

Where hⱼⁱ is the value of the jth neuron in the ith hidden layer (j = 1, 2, 3, …, n1 when i = 1; j = 1, 2, 3, …, n2 when i = 2); xₖ is the value of the kth neuron in the previous layer, k = 1, 2, 3, …, n; wₖⱼ is the weight from the kth neuron in the previous layer to the jth neuron; and f is the activation function.

One of the most popular feed-forward neural networks is the convolutional neural network (CNN). CNNs are especially good at image recognition and classification because they can identify the patterns in an image; for example, a CNN can be used to recognize the content or numbers in an image. Figure 9 shows an example of a CNN recognizing the handwritten digit '2'.

Figure 9 An example CNN architecture for a handwritten digit recognition task

The second type of ANN is the feed-back neural network, also called the recurrent neural network (RNN) [40, 41]. Feed-back neural networks allow signals to move in both directions, from input to output or from output to input, which forms a loop for signals to travel through. Feed-back neural networks are dynamic networks that keep changing until they reach an equilibrium point [42].

1.3.4 Learning strategies of ANNs

ANNs use different learning strategies, including supervised learning, unsupervised learning, and reinforcement learning.

In supervised learning, ANNs learn the underlying relationships between input data and output data. ANNs recognize the governing function relating all input data to the output. It is like a fitting process that fits a function between the input data and output data. In supervised learning, ANNs need labeled input data and output data, which teach the computer to learn the patterns between the input data and output data. Supervised learning ANNs are mainly used for classification and regression problems.

In unsupervised learning, ANNs do not need a labeled input data set to guide the computer to learn the underlying patterns between input and output data. Instead, unsupervised learning ANNs classify a set of elements according to similar patterns among the data. Unsupervised learning ANNs are mainly used for clustering and anomaly detection problems.

Reinforcement learning neural networks are different from supervised learning neural networks [43]. Reinforcement learning does not need labeled input data and output data. Figure 10 shows the typical framing of a reinforcement learning scenario.

Figure 10 The typical framing of a reinforcement learning scenario

An intelligent agent takes actions in an environment. The environment interprets the agent's action result as a reward and a representation of the state and gives feedback to the agent, so that the agent can adjust its actions to maximize the cumulative reward.
The environment in reinforcement learning typically adopts the Markov decision process (MDP) [44], a mathematical framework well suited to modeling decision-making when outcomes are partly random and partly under the control of a decision-maker. Many reinforcement learning neural networks use dynamic programming techniques. Reinforcement learning can be used for environmental learning.

1.4 APPLICATIONS OF ANNS

The ANN approach has been utilized in many different applications and various fields over the past few decades [45]. In recent years, ANNs have drawn attention in the areas of facial recognition, image analysis, and natural language processing (NLP). In the field of packaging, ANNs have also been utilized to solve certain problems, such as transport packaging cushioning property evaluation, polymer product characteristics prediction, and municipal solid waste (MSW) management. However, the application of ANNs in packaging strength estimation is very limited.

1.4.1 Applications of ANNs in Facial Recognition, Image Analysis, and NLP

ANNs have been applied in the facial recognition area. An optimized ANN system using a harmony search algorithm was developed to improve the accuracy of face recognition, giving a lower mean squared error than another hybrid ANN system based on hybrid particle swarm optimization [46]. The application of ANNs in predicting turbulent stock markets has also been discussed and studied [47]. A hybrid ANN model based on a genetic algorithm and simulated annealing was developed to predict the stock market with improved accuracy, and a new set of input variables for ANN models was proposed [48]. ANNs with different algorithms (including Levenberg-Marquardt, Scaled Conjugate Gradient, and Bayesian Regularization) were studied to predict the Indian stock market, achieving an accuracy of 99.9% using tick data [49].

In the field of image classification and regression, deep learning (DL), a subset of ANN, has also been applied to characterize the symmetries of simulated measurements of samples. Ziletti et al. (2018) obtained a large database of perfect crystal structures, introduced defects into the perfect lattices, and simulated diffraction patterns for each structure [50]. DL models were trained to identify the space group of each diffraction pattern. The model achieved high classification performance, even on crystals with significant numbers of defects, surpassing the performance of conventional algorithms for detecting symmetries from diffraction patterns. DL has also been applied to classify symmetries in simulated STM measurements of 2D material systems by Choudhary et al. (2021) [51].

In natural language processing, one of the major uses of NLP methods is to extract datasets from the text of published studies. Cooper et al. (2019) demonstrated a "design-to-device approach" for designing dye-sensitized solar cells that are co-sensitized with two dyes [52]. Natural language processing can also directly make material predictions without intermediary models. Tshitoyan et al. (2019) reported that word embeddings (i.e., numerical vectors representing distinct words) trained on materials science literature could directly predict materials applications through a simple dot product between the trained embedding for a composition word (such as PbTe) and an application word (such as thermoelectrics) [53].

1.4.2 Applications of ANNs in Packaging

ANNs have been applied in packaging since the 1990s, involving different fields of packaging.
According to recently published reports, ANN applications have been explored in various parts of packaging, from transport packaging and cushioning packaging to packaging design and manufacturing systems, PE product characteristics prediction, and municipal solid waste (MSW) management for classifying different packaging materials.

Applications of ANN in transport packaging: Bahrami et al. (1995) developed an intelligent packaging system using ANNs to retrieve, from a standard set of chair designs, a design that satisfies the required needs [54]. Siripong Malasri (2015) applied an artificial neural network in transportation packaging to measure the temperature of a wooden softwood pallet stringer under different temperatures at the time of the drop test, by building several temperature profiles from data collected with different starting temperatures. This application solved the problem of thermocouple cords interfering with the free-fall drop of a pallet sample [55].

Applications of ANN in cushioning packaging: Yanchun Liang, Xiaowei Yang, et al. (1996) developed neural networks to identify the nonlinear characteristics of cushioning packaging to help reduce shock and vibration during the transportation process [56].

Applications of ANN in design and manufacturing: Siripong Malasri et al. (2016) developed a neural network to estimate the temperature profile in a wooden softwood pallet stringer during the drop test [34].

Applications of ANN in material product characteristics prediction: Polyethylene (PE) is one of the most widely used polymers in packaging materials. The ethylene index (EIX) is an important variable for PE product characteristics. However, EIX is hard to measure because it is affected by various factors, such as pressure, ethylene flow, hydrogen flow, and catalyst flow. To estimate EIX, different neural network models were developed by Akbar Maleki, Mostafa Safdari Shadloo, et al. (2020). Their results showed that a multi-layer perceptron model could predict the production level of HDPE with a high regression coefficient [57].

Applications of ANN in municipal solid waste (MSW) management: Municipal solid waste (MSW) includes waste from rejected packages made of different packaging materials. The sustainable management of MSW is a challenging task for packaging sustainability, because MSW involves all kinds of packaging materials, including plastics, paper, metal, glass, and wood, and the characterization of different packaging materials is very expensive. This is where modeling approaches come in. The classical models are less effective, and artificial intelligence models have drawn the attention of researchers. Adeleke, Akinlabi, et al. (2021) explored the application of neural networks in predicting the physical composition of MSW. They optimized the network architecture, training algorithms, and activation function of a neural network to predict the fractions of MSW streams from meteorological parameters with high accuracy. Multiple training algorithms and activation functions were combined and compared to optimize the neural network to predict the percentage composition of four major packaging material streams based on data on minimum temperatures, wind speed, and humidity in their case study.
Their study concluded that the complex physical composition of MSW can be predicted with a single-hidden-layer neural network, which provided theoretical support for handling MSW and contributed to the academic community concerned with packaging sustainability [58]. Oliveira, Sousa, et al. (2019) also studied a feedforward neural network to identify the variables (from the level of education of the population and the size and level of urbanization of the municipality to factors intrinsic to the waste collection service) influencing the amount of separately collected packaging waste. With a dataset of 42 municipalities in Portugal, their study showed that the high-performance neural network gave a 34% higher coefficient of determination (R value) than the traditional regression models [59].

Although the ANN approach has been explored in various aspects of the packaging field, there are limited studies evaluating the BCS of corrugated packages at an industry-applicable level using ANN models. The primary challenge lies in collecting a sufficiently large data set that encompasses the majority of BCS values used in the industry. Existing studies often rely on small datasets that do not adequately represent the broad range of commonly used box dimensions or BCS values, making it difficult to build a generalized ANN model suitable for industry applications. This study aims to bridge this gap by developing a generalized ANN model for BCS evaluation, using a dataset that encompasses the majority of commonly used box dimensions in the industry.

1.5 RESEARCH OVERVIEW

This dissertation evaluates the Box Compression Strength (BCS) of corrugated boxes using an Artificial Neural Network (ANN). Chapter one describes the background of the research, including fundamental knowledge of BCS and ANN, as well as the motivation behind the study. Chapter two details the training of the ANN model using available datasets for BCS evaluation, examining key modeling factors including the number of neurons in the hidden layers, the epoch number, the number of modeling cycles, and the number of data points. By applying datasets from both the literature and synthetic data created using the McKee formula, the optimal values for these factors and the minimum data population needed were identified. This chapter provides insights into the ANN's performance in evaluating BCS values and demonstrates the feasibility of using an ANN to estimate BCS.

Chapter three investigates the relative importance of packaging design features using the ANN approach. Using BCS as a representative packaging property, four different ANN algorithms (the connection weights method, gradient-based method, permutation method, and SHAP values) were employed to determine the relative importance of different BCS features (Edge Crush Test (ECT), box dimensions, thickness, and flexural stiffness). A synthetic dataset generated using the McKee formula was used to compute the theoretical relative importance of these BCS features. The ANN-predicted BCS feature importance ranking aligns with the theoretical relative importance of the studied BCS features. A real dataset from the industry was also used to estimate the relative importance of five BCS features. The ANN-predicted feature importance ranking was also consistent with the theoretical relative importance calculated from the McKee formula.
In addition, the ANN-predicted BCS feature importance from the real data set shows that the importance of depth is not zero, which aligns with buckling theory and reveals an inaccuracy of the McKee formula. The result indicates the feasibility of applying the ANN approach to evaluate the relative importance of packaging design features, allowing designers to minimize design effort by prioritizing changes to the more impactful packaging design features. These findings provide guidance for material and cost savings in packaging design. Chapter four covers the development of an ANN using a real dataset that includes box dimensions representative of the majority of BCS values at an industry application level. An extracted dataset from the real data that covers the majority of BCS values used in the industry was applied to train a generalized ANN model. The values of key ANN modeling factors were determined based on the study of previous datasets and five optimization methods for optimizing the hidden-layer neuron configuration, including Information Criteria using the AIC method, Hebb's rule, Information Criteria using the BIC method, the Optimal Brain Damage rule, and the Bayesian method. The optimal hidden neuron configuration was identified by striking a balance between minimizing model prediction error and maximizing computational efficiency. The final ANN model prediction error for the test data was calculated for BCS prediction. The error was 9.52%. A possible solution for improving the ANN prediction accuracy is given at the end. Chapter five summarizes the work of this research and highlights research directions for future studies. This study provides a methodological guide for future research exploring the applicability of ANN approaches to address problems and answer questions in the packaging industry. Objectives of this research:
• Objective 1: Study how the ANN performs in evaluating BCS and determine the amount of data required for reliable ANN predictions.
• Objective 2: Validate the ANN capability for evaluating the relative importance of packaging design features using BCS as a representative packaging design property.
• Objective 3: Develop a generalized ANN model applicable at an industrial level using a real-world data set.

CHAPTER 2: A COMPARATIVE ANALYSIS OF ARTIFICIAL NEURAL NETWORK (ANN) ARCHITECTURES FOR BOX COMPRESSION STRENGTH ESTIMATION

2.1 INTRODUCTION

In this chapter, we investigate the data requirements for an artificial neural network (ANN) to estimate compressive strength and evaluate the ANN's ability to address input variation limitations in the papermaking and box manufacturing process. Supervised learning methods are applied. Given the limited existing research, it remains to be seen whether an ANN can estimate BCS any more accurately than our historical, closed-form approaches. A properly structured ANN might be able to identify additional parameters that contribute to BCS with a similar level of impact as known existing factors (e.g., the edge crush test (ECT) value) and thus improve current models over the known levels of inherent variation in the input data. In order to leverage those opportunities, we need to clearly identify the size of the data set required. Compared with many ANN applications which automatically create the underlying data to build a model, collecting data points for a BCS estimation model is comparatively expensive, necessitating a series of off-line tests.
For ANN modeling of corrugated packaging, the effort required to generate sufficient data sets may well be the limiting factor on the capability of the model. This research aims to apply ANN to the box compression strength (BCS) of corrugated boxes. Various datasets of BCS have been collected and used to train the ANN model for BCS evaluation. The training process is complex and influenced by multiple factors, including internal factors related to ANN architecture and external factors pertaining to the applied datasets. Internal factors of ANN involve the number of input neurons, hidden neurons, hidden layers, output neurons, and epochs. A new concept called the modeling cycle was introduced to mitigate the noise in ANN predictions. This concept aims to obtain results that accurately reflect the average error level of ANN predictions, thereby enhancing the reliability of the model's output. In this study, the BCS features serve as the input neurons, and BCS values are the output neurons. Thus, the number of input neurons corresponds to the number of BCS features used during ANN model training, and the number of output neurons is one. The ANN training process involves determining the optimal number of hidden neurons, hidden layers, epochs, and modeling cycles. External factors include the number of data points needed to achieve reliable training results for the ANN model. A dataset that is too small cannot provide reliable results, while an excessively large dataset can unnecessarily increase ANN training time. Therefore, it is crucial to determine the minimum amount of data needed to avoid resource wastage while ensuring robust model performance.

2.2 DATA SETS APPLIED

In this study, three datasets were used to build an ANN model for BCS estimation: the McKee data set, an idealized data set, and a data set with variation. The McKee data set is from the literature presented by McKee in 1963 [22], specifically compiled for BCS estimation. It consists of 63 data points derived from box compression testing. The idealized data set is a synthetic data set based on the McKee equation [22]. This data set was generated by taking the box dimensions, ECT values, and thicknesses of 3,009 boxes commonly used in commerce and substituting them into the McKee equation. The data set with variation was created by introducing random errors to the parameters of the idealized data set's boxes. BCS values were then calculated using the McKee equation [22]. This process was carried out to achieve a variation of ±5.4% for BCS. It contains the same number of data points as the idealized data set. Detailed descriptions of these three datasets are provided in the ANN training sections, delineating the specifics for each dataset.

2.3 ANN KEY FACTORS INITIALIZATION

To begin, we apply the ANN approach to the existing data from McKee's 1963 research. Although the McKee data set proves too small for a robust ANN study, its well-established nature enables us to define our ANN methodology. Moreover, it illustrates the process to readers who are familiar with box compression modeling but less acquainted with ANNs. Next, we employ the ANN approach to analyze a significantly larger "synthetic" data set, constructed using idealized data derived from the McKee equation. This dataset allows us to evaluate the potential accuracy of an ANN model when applied to an established large data set and physical relationship.
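As a concrete illustration, the following minimal Python sketch generates an idealized data set and a variation data set of the kind described above. It assumes the simplified McKee form BCS = 5.87 × ECT × √(t × Z) for equation (1); the uniform sampling scheme and the ~2% per-input noise level (chosen to yield roughly ±5% variation in BCS) are assumptions, not the exact procedure used in this study.

```python
import numpy as np

rng = np.random.default_rng(42)

def mckee_bcs(ect, thickness, perimeter):
    # Simplified McKee form, assumed here for equation (1):
    # BCS = 5.87 * ECT * sqrt(t * Z), with consistent units throughout.
    return 5.87 * ect * np.sqrt(thickness * perimeter)

n = 3009  # size of the idealized data set reported in the text

# Hypothetical uniform sampling over ranges like those later shown in Table 1.
length = rng.uniform(19.05, 99.38, n)       # cm
width = rng.uniform(12.70, 76.96, n)        # cm
thickness = rng.uniform(0.26, 0.44, n)      # cm
ect = rng.uniform(64.77, 228.35, n)         # lb/in, as tabulated

perimeter = 2.0 * (length + width)
bcs_ideal = mckee_bcs(ect, thickness, perimeter)   # "idealized" targets

def perturb(x, cv):
    # Multiplicative, normally distributed noise with coefficient of variation cv.
    return x * (1.0 + rng.normal(0.0, cv, size=x.shape))

# Data set with variation: perturb each input (assumed ~2% per input, chosen
# so the recomputed BCS carries roughly +/-5% variation) and recompute BCS.
bcs_varied = mckee_bcs(perturb(ect, 0.02), perturb(thickness, 0.02),
                       perturb(perimeter, 0.02))
```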
Furthermore, we introduce variation to the input data of the idealized data set, enabling us to assess how this variation propagates through the ANN. This investigation addresses the fundamental question of data set size and evaluates whether the current data collection approaches in the corrugated industry are sufficiently advanced to support the application of ANN in assessing box performance. Conclusions are presented at the end of the chapter, encapsulating the main findings. A general ANN is structurally composed of three fundamental types of layers: the input layer, the hidden layer(s), and the output layer. The input layer receives raw data and passes it forward, while the hidden layer(s) perform complex computations by applying activation functions to weighted inputs. The output layer then generates the final prediction or classification result based on the processed information. Each layer consists of multiple neurons, which are interconnected with neurons from adjacent layers, forming a network of weighted connections that facilitate learning and pattern recognition. In this study, the ANN model specifically designed for evaluating BCS follows this structural framework and is visually represented in Figure 11, illustrating the organization and connectivity of the network's layers. By leveraging this multi-layered structure, the ANN can effectively capture non-linear relationships in the data, improving the accuracy and reliability of BCS evaluations.

Figure 11 A model of an Artificial Neural Network (ANN) structure for predicting box compression strength (BCS) using inputs provided by the McKee data set. Inputs: Edge Crush Strength (lb/in) – ECT; flexural stiffness in the machine direction of the combined board (lb·in) – EIx; flexural stiffness in the cross-machine direction of the combined board (lb·in) – EIy; board thickness (in) – BT; box length (in) – BL; box width (in) – BW; box depth (in) – BD. Output: Box Compression Strength (lb) – BCS

At the beginning of the ANN training process, all weights between nodes are randomly assigned. The squared difference between predicted BCS values from our training data and their actual BCS values is then calculated as in equation (8), and the weights are adjusted via a backpropagation process.

$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(BCS_{predicted,i} - BCS_{actual,i}\right)^{2}$   (8)

where n is the number of samples in the trained dataset and MSE represents the mean squared error. As mentioned above, an ANN approach segments a given set of known data into two uneven groups, training data and testing data. The former is used to build and refine the model and the latter is used to evaluate the model accuracy. To assess our ANN, 67% of each data set was allocated to training data and the remaining 33% to testing data. Each node in the hidden layers can be defined based on a weighted sum of the parameters in the prior layer, as shown in equation (9),

$h_{j}^{i} = f\left(\sum_{k} w_{j}^{k} \times x_{k}\right)$   (9)

where $h_{j}^{i}$ is the value of the jth neuron in the ith hidden layer (j = 1, 2, 3, …, n₁ when i = 1; j = 1, 2, 3, …, n₂ when i = 2), $x_{k}$ is the value of the kth neuron in the previous layer (k = 1, 2, 3, …, n), $w_{j}^{k}$ is the weight from the kth neuron in the previous layer to the jth neuron, and f is the activation function. The choice of activation function is critical for ANN model prediction.
To enhance efficiency, the Rectified Linear Unit (ReLU) is used as the activation function for the hidden layers, since it is the default and perhaps the most common choice for hidden layers in machine learning studies [60]. Moreover, since only a subset of neurons is activated at any given time, the ReLU activation function significantly mitigates the vanishing gradient problem, which often hampers deep neural network training by causing gradients to diminish as they propagate backward through layers. This property allows ReLU to enhance learning efficiency and contribute to faster convergence during model training. For the output layer, which typically uses a different activation function from the hidden layers, a sigmoid function was selected. The sigmoid function is widely used in neural network research due to its smooth, S-shaped curve, which maps input values to a range between 0 and 1. This characteristic makes it particularly suitable for binary classification tasks and probabilistic interpretation. Moreover, its first derivative is computationally convenient, facilitating gradient-based optimization methods [61]. The sigmoid function is an efficient way of producing an output p ∈ (0, 1), which can be interpreted as a probability. Plots of the ReLU function and sigmoid function are shown in Figure 12 and Figure 13.

Figure 12 The curve of the ReLU function

Figure 13 The curve of the sigmoid function

This study involved running programming tasks on an HP Laptop 15t-dy100 featuring an Intel(R) Core(TM) i5-1035G1 CPU, operating at a processing speed of 1.00 GHz. The coding process to train the ANN model was conducted using Jupyter Notebook, an integrated development environment (IDE). Figure 14 illustrates the sequential steps in constructing an ANN model. The training duration varied depending on the dataset's size and characteristics, influenced by the combination of hardware and software. For instance, training a smaller data set of around 60 data points took approximately 3 minutes, whereas training a larger data set comprising approximately 3,000 data points required about 30 minutes.

Figure 14 Flow for building an ANN model for BCS estimation

2.4 ANN AND MCKEE DATA SET

Like many of the modeling efforts in the industry, we begin our exploration of the applicability of ANNs to box compression estimation with the work of McKee et al. Their model was built using 63 data points including A-, B-, and C-flute boxes. This data set captured information on ECT, flexural stiffness in the machine and cross-machine directions of the combined board (EIx and EIy), thickness of the corrugated board, and the length, width, and depth of the box. Those seven physical parameters serve as the input parameters for an ANN model with BCS as the output, as shown in Figure 11. Of note, these parameters are not independent - flexural stiffness depends in part on the thickness of the board. Including all the available parameters in the data allows the ANN to appropriately assess the relative importance of each parameter to BCS estimation. To assess our ANN given the limited data presented by McKee et al., the 63 data points were split into 42 training data points and 21 testing data points. Two hidden layers were implemented to generate the output value (BCS). We initially considered utilizing 200 epochs for conducting the calculations. Model accuracy and consistency can be influenced by many modeling factors.
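A minimal Keras sketch of this architecture follows. It assumes the BCS target has been min-max scaled to (0, 1), an assumption implied by the sigmoid output neuron; the placeholder arrays stand in for the prepared McKee inputs.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras

def build_model(n1, n2, n_inputs=7):
    # Two ReLU hidden layers and a sigmoid output neuron, as described above.
    model = keras.Sequential([
        keras.Input(shape=(n_inputs,)),
        keras.layers.Dense(n1, activation="relu"),
        keras.layers.Dense(n2, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # BCS scaled to (0, 1)
    ])
    model.compile(optimizer="adam", loss="mse")  # MSE loss as in equation (8)
    return model

# Placeholder arrays standing in for the 63 McKee data points: seven input
# features per box and a min-max-scaled BCS target (scaling is assumed).
rng = np.random.default_rng(0)
X = rng.random((63, 7))
y = rng.random(63)

# 67% training / 33% testing split, as described above.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

model = build_model(160, 36)   # optimum reported below for the McKee data
history = model.fit(X_train, y_train, epochs=200, verbose=0)
```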
The first task in developing an ANN model is optimizing the neuron number combination in each of the hidden layers. We implemented an exhaustive search method [62], examining different neuron number combinations in the various layers, as shown in Figure 15. To assess the model accuracy, the neuron numbers for the first hidden layer were examined from 80 to 184 (with an increment of 8). Similarly, for the second hidden layer, the neuron numbers were examined from 24 to 42 (with an increment of 3). The MSE was calculated for each combination to evaluate the model's accuracy. In each case a random selection of data points from the underlying data set was used, which has implications for the robustness of the minimum MSE. The minimum MSE occurred with 160 neurons in the first hidden layer and 36 in the second hidden layer. To confirm this result, the increments for both hidden layers were reduced: the increment of 8 in the first hidden layer was decreased to 2, and the increment of 3 in the second hidden layer was decreased to 1. Remarkably, even with these decreased increments, the minimum MSE still occurred with the same combination of neuron numbers. The same structural framework was maintained for analyzing the McKee data to ensure consistency and comparability in the evaluation process. Notably, this design choice allows for a significantly higher degree of freedom in the model compared to the amount of available data in the McKee dataset. This imbalance suggests that the model has the flexibility to capture a wide range of potential relationships and interactions within the data, which may not be fully constrained by the limited dataset. As a result, the excess degrees of freedom could contribute to the observed variations across different parameter combinations, as illustrated in Figure 15.

Figure 15 Mean Squared Error (MSE) calculations for the model using McKee data with varying numbers of neurons in each of two hidden layers. The error is minimized for 160 neurons in the first layer and 36 neurons in the second

To understand the computational load further, we explored how the number of epochs impacted model convergence. While this calculation is not resource intensive for a small data set like the one provided by McKee et al., it becomes critical to stop the process at convergence once the data set grows. Figure 16 illustrates that the MSE begins to converge in less than 50 epochs. As the number of epochs increases up to 200, the rate at which the MSE decreases gradually slows down, indicating that each additional epoch yields only a small incremental improvement. This phenomenon suggests that the model's performance is reaching a plateau, where further training offers minimal benefits relative to the computational cost. To address this challenge and strike an optimal balance between computational time and model accuracy, we implement a stopping criterion: the training process is halted once the MSE reduction rate falls below 3.0%. This threshold serves as a practical indicator that the model has achieved sufficient convergence and that additional training is unlikely to significantly enhance performance. In our experiments, this threshold is typically reached at approximately 100 epochs, ensuring that we maintain optimal computational efficiency while still achieving a robust level of accuracy.
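The following sketch combines the two procedures just described: the exhaustive grid search over hidden-neuron combinations and the 3.0% MSE-reduction stopping rule. It reuses build_model from the earlier sketch; the per-epoch form of the reduction-rate check is an assumption about how the criterion was implemented.

```python
from tensorflow import keras

class MSEReductionStopping(keras.callbacks.Callback):
    # Halt training once the relative MSE reduction per epoch drops below 3.0%.
    # The per-epoch form of the check is an assumption about the criterion.
    def __init__(self, threshold=0.03):
        super().__init__()
        self.threshold = threshold
        self.prev_loss = None

    def on_epoch_end(self, epoch, logs=None):
        loss = logs["loss"]
        if self.prev_loss is not None:
            reduction = (self.prev_loss - loss) / self.prev_loss
            if 0.0 <= reduction < self.threshold:
                self.model.stop_training = True
        self.prev_loss = loss

# Exhaustive search over hidden-neuron combinations (coarse grid shown;
# the text then refines the increments around the coarse minimum).
best = None
for n1 in range(80, 185, 8):        # first hidden layer: 80..184 step 8
    for n2 in range(24, 43, 3):     # second hidden layer: 24..42 step 3
        model = build_model(n1, n2)  # from the earlier sketch
        model.fit(X_train, y_train, epochs=200, verbose=0,
                  callbacks=[MSEReductionStopping()])
        mse = model.evaluate(X_test, y_test, verbose=0)
        if best is None or mse < best[0]:
            best = (mse, n1, n2)     # text reports the minimum at (160, 36)
```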
This approach not only saves valuable computational resources but also minimizes the risk of overfitting by preventing unnecessarily prolonged training.

Figure 16 MSE versus epoch plot for different data set sizes (McKee data set)

The number of data points plays a very important role in ANN accuracy. Figure 17 displays how the number of data points influences the ANN accuracy. Exploring different total population sizes from 30 to the full data set of 63 points, the chosen population was randomly divided into training data (2/3 of the points) and testing data (1/3 of the points). In the modeling process, the partition of the underlying data varies from modeling cycle to cycle. This can have a significant impact on model accuracy because particular data points may fall into the training data in one cycle and into the testing data in another. To assess the impact of variation in the input data on model results and predictions, the process of partitioning a data set into training and testing data was repeated regularly, and multiple modeling cycles were performed. For the McKee data set, 60 modeling cycles were performed for each population size from 30 data points up to the full data set. Overall, the calculation process reflects a confidence interval on the model output related to the breadth of variation in potential input data sets. The confidence intervals around the mean error reflect the ANN training accuracy, as shown in Figure 17. As expected, ANN accuracy increases with population size. For a small data set like the McKee data set, the ANN accuracy using the whole data set is notably higher than that using a partial data set, which indicates that the whole data set is needed for the McKee data set to minimize the error.

Figure 17 Average error in estimated box compression strength given different subsets of the data (McKee data set), each run through 60 modeling cycles. Note error bars indicate 95% confidence intervals on the mean values

Since the ANN randomly splits the data into training and testing data in the modeling process, each modeling cycle can have a different underlying data partition. As a result, each modeling cycle can generate a unique model that optimally fits the training data provided but can produce very different values for the error when assessing the testing data. Therefore, it is important to understand how many modeling cycles are required for results to converge to a "typical" reliability. To investigate the impact of different underlying data partitions on the accuracy of the ANN, various numbers of modeling cycles were examined. Figure 18 shows that when we partitioned the full McKee data set (all 63 points), the training data accuracy remained relatively consistent as the number of evaluation cycles increased. The average testing data accuracy converged after roughly 60 rounds of testing, very similar to the total number of data points in the database.

Figure 18 Mean of average error in estimated box compression strength given different numbers of modeling cycles. Note that the testing data values converge after 60 modeling cycles (McKee data set). Error bars indicate 95% confidence intervals on the mean values

We have explored four modeling factors common in the ANN process using the data presented by McKee: the combination of neuron numbers in hidden layers, the number of epochs, the number of modeling cycles, and the number of data points in a data set.
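A minimal sketch of the modeling-cycle procedure, reusing build_model from the earlier sketch; reporting the average relative test error with a normal-approximation 95% confidence interval is an assumption about how the intervals in Figures 17 and 18 were formed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def run_modeling_cycles(X, y, n_cycles=60, n1=160, n2=36):
    # Repeat the random 2/3-1/3 split and retrain from scratch each cycle;
    # collect the average relative test error of every cycle.
    errors = []
    for cycle in range(n_cycles):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.33, random_state=cycle)
        model = build_model(n1, n2)          # from the earlier sketch
        model.fit(X_tr, y_tr, epochs=100, verbose=0)
        pred = model.predict(X_te, verbose=0).ravel()
        errors.append(np.mean(np.abs(pred - y_te) / np.abs(y_te)))
    errors = np.asarray(errors)
    # Normal-approximation 95% confidence interval on the mean error.
    ci95 = 1.96 * errors.std(ddof=1) / np.sqrt(n_cycles)
    return errors.mean(), ci95
```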
An optimal combination of neuron numbers in the hidden layers can minimize the MSE and increase ANN accuracy for BCS estimation. As the epoch number increases, the MSE reduction rate becomes increasingly slow. To strike a balance between computational time and accuracy, a stopping point when the MSE reduction rate falls below 3.0% was selected to ensure optimal computational efficiency without significantly compromising accuracy. Consistency for the ANN model is realized when the number of modeling cycles reaches a critical number for a given population size, and a minimum number of data points can be identified at which the MSE is minimized and the ANN is most robust. We carry these observations forward into our analysis of larger, more generalizable data sets.

2.5 ANN AND AN IDEALIZED DATA SET

McKee et al.'s simplified model for box compression strength can be used to explore the applicability of ANN to compression strength estimation. However, the limited size of their data set constrains the ANN approach as noted above. Therefore, a larger data set is desirable. Using the simplified McKee equation [22] as shown in equation (1), a synthetic data set could be generated. In this way, the "idealized" data set was created with 3,009 data points. These data points represent boxes with ranges in length, width, aspect ratio, ECT, thickness, and flute type (B- & C-flute) commonly used in North America (Table 1). Note that each "data point" discussed in this section is a specific set of information defining the physical properties of the box (length, width, thickness, and ECT) and the associated BCS calculated using equation (1).

Table 1 Minimum and maximum values of the data incorporated in the idealized data set

Property             Min     Max
Length (cm)          19.05   99.38
Width (cm)           12.70   76.96
L/W (aspect ratio)   1.00    4.00
Perimeter (cm)       71.12   346.71
Thickness (cm)       0.26    0.44
ECT (lbs/inch)       64.77   228.35

Given the "perfect" nature of the fabricated data set, it is obvious that a simple least-squares fit of equation (1) to the input parameters reproduces the BCS values with 100% accuracy and 0% error. With a data set this large, one might also hope to overcome the ANN challenges experienced in fitting the much more limited data from McKee, and potentially reproduce the expected values in a test data subset perfectly, with close to no variation from the actual values. To start this process, 67% of the data set (2,016 randomly selected samples) were used for the ANN training process and the rest were used for evaluation of the resulting model. With 200 epochs, the optimal neuron number combination in the hidden layers was again explored using an exhaustive search method. Figure 19 displays the examination of neuron numbers in the first hidden layer, ranging from 128 to 160 (with an increment of 8), as well as the examination of neuron numbers in the second hidden layer, ranging from 24 to 48 (with an increment of 3).

Figure 19 Mean Squared Error (MSE) calculations for the model using idealized data with varying numbers of neurons in each of two hidden layers. The error is minimized for 142 neurons in the first layer and 45 neurons in the second

The MSE was calculated for each combination to assess the model's accuracy. The MSE was minimized with 144 neurons in the first hidden layer and 45 neurons in the second hidden layer. To validate this result, the increments for both hidden layers were reduced to 1.
Notably, the minimum MSE was then observed with 142 neurons in the first hidden layer and 45 neurons in the second hidden layer. As with our McKee analysis above, this number of neurons provides more degrees of freedom in our modeling space than data points in our model population. Note that the MSE is much lower than for the McKee data set because the data is perfect. However, the values are not zero, indicating some residual uncertainty in the estimation of BCS even for this idealized data. In exploring the idealized dataset, we employed an analytical approach similar to that used in the McKee dataset analysis. Specifically, we investigated how varying the number of training epochs impacted the model's MSE, with these findings initially illustrated in Figure 20-a. As the training progressed and the number of epochs increased to 200, we observed a general downward trend in the MSE, indicating improvements in model performance. However, this improvement was not entirely smooth; significant fluctuations were present in the MSE values, suggesting intermittent variability in the learning process. As the number of epochs continued to increase, the rate at which the MSE decreased began to slow down, highlighting diminishing returns in performance gains with additional training. To address the challenge posed by these fluctuations and to provide a clearer, more interpretable view of the overall trend, we applied the Moving Average technique [63]. This method effectively smoothed out short-term irregularities, resulting in the refined graphical representation of the MSE trend depicted in Figure 20-b. The smoothed graph not only offers a more consistent and reliable perspective on the model's performance improvements over time but also aids in identifying the optimal training duration. By reducing the visual noise in the MSE curve, we can more accurately pinpoint the stage at which additional epochs no longer yield significant benefits, informing decisions regarding computational resource allocation. These findings also underscore the importance of tailoring the training process to the specific characteristics of the dataset at hand.

Figure 20 Mean Squared Error (MSE) of the fits as a function of epochs for different sized data subsets from the idealized data set. Figure 20-a displays the raw MSE calculated for each epoch, while Figure 20-b presents smoothed data, more clearly displaying the asymptotic nature of the functional relationships

This analysis revealed that the MSE experienced a rapid decrease before reaching 50 epochs. From 50 to 200 epochs, the MSE decreased steadily, and the large fluctuations disappeared after 140 epochs. Similar to the study of the McKee data set, to strike a balance between computational time and accuracy, a stopping point was selected when the MSE reduction rate falls below 3.0% after applying the Moving Average technique. This threshold is typically reached at approximately 140 epochs, ensuring optimal computational efficiency without significantly compromising accuracy.
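A minimal sketch of the smoothing step, assuming a simple moving average with a 10-epoch window (the window size used in the study is not stated) and the loss history from the earlier training sketch.

```python
import numpy as np

def moving_average(values, window=10):
    # Simple moving average; the 10-epoch window is an assumed choice.
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Smooth the per-epoch training MSE from an earlier fit, then locate the
# first point where the relative reduction between smoothed values
# falls below the 3.0% threshold.
mse = np.asarray(history.history["loss"])
smoothed = moving_average(mse)
reduction = (smoothed[:-1] - smoothed[1:]) / smoothed[:-1]
# Index within the smoothed curve; np.argmax returns 0 if the
# criterion is never met.
stop_idx = int(np.argmax(reduction < 0.03))
```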
When examining the full data set of 3,009 data points, the ANN accuracy remained relatively consistent while the confidence interval around the mean error decreased as the number of modeling cycles increased (Figure 21).

Figure 21 Mean of average error in estimated box compression strength given different numbers of modeling cycles (idealized data set). Note error bars indicate 95% confidence intervals on the mean values

To better understand why the error in the model was not zero, as might be expected for a model fitting "perfect" data, the specific results from each cycle were examined. It was observed that four data points in particular always showed higher BCS estimation errors than other data points. Those four data points are at the limits of the data set (boundary data points). Figure 22 shows the frequency of the actual BCS values. As is typical for data at the end points of a distribution, these four points have excessive leverage in the modeling. Their impact on model accuracy in test data depends on what adjacent points happen to be in the training data. When the boundary data points are randomly selected to be part of the testing data and thus do not appear in the training data, the result tends to show higher BCS average error. The average error across multiple cycles is impacted by this contribution.

Figure 22 BCS distribution of 3,009 data points (idealized data set)

To see the influence of population size on the ANN accuracy for the perfect model (similar to Figure 17 above), we examined populations from 600 to 3,009 data points using 10 modeling cycles. The results show that the mean of BCS average errors fluctuates notably when we consider a limited number of data subsets (Figure 23). Even for this larger population, the impact of limiting population size remains significant if the iterative process is not executed for a sufficient number of cycles. In our analysis, we observed that over the course of 50 modeling cycles, the mean BCS average error exhibited a steady decline as the volume of included data increased. However, this trend reached a plateau at approximately 1,500 data points, beyond which additional data did not yield noticeable improvements in accuracy. This finding suggests a crucial relationship between the number of modeling cycles and the population size, indicating that both factors must be considered together to optimize model performance. If the population size is too small, even a high number of iterations may not be sufficient to minimize error effectively. Conversely, if the process is not iterated enough times, the model may fail to fully leverage the benefits of a larger dataset. Therefore, achieving an optimal balance between these two parameters is essential to maximizing model accuracy and reliability.

Figure 23 Average error in estimated box compression strength given different population sizes with modeling cycles of 10 and 50 (idealized data set). Note error bars indicate 95% confidence intervals on the mean values

The combination of neuron numbers, the number of epochs, the number of modeling cycles, and the number of data points impacts the accuracy of the ANN prediction. Even when using the full data population (>3,000 data points) and many modeling cycles on a perfect data set generated by a closed-form equation, the average relative error of the BCS prediction is not zero.
From Figure 23, in conjunction with Figure 21, this analysis identifies the error contribution of the ANN approach itself at around 0.4% when estimating BCS from this type of data and data sets of this size. This residual error is independent of any physical properties; rather, it arises from the ANN process itself. As such, we would expect it to be additive to any other errors that may arise in using a model for prediction, including measurement errors of the input parameters to the model as well as fundamental errors in the model's functional form.

2.6 ANN AND A DATA SET WITH VARIATION

Variation occurs naturally in all processes. Typical variations in measurement of inputs associated with the performance of a corrugated box are on the order of 4-5% for measured quantities like ECT and BCS. To further study if and how the ANN model works while handling a data set incorporating variation, we modified the idealized set to represent boxes that might appear in commerce. We added fluctuations to each input value, using randomized, normally distributed values on the order of the variation observed in the different test methods. As with the idealized data set, a "data point" represents a specific set of information defining the physical properties of the box (length, width, thickness, and ECT) and the associated BCS calculated by equation (1). The new BCS values for the 3,009 data points were obtained by adding variations to the input parameters and calculating BCS with equation (1); the average absolute difference between these values and the "actual" BCS of the idealized model was then determined. This process was carried out to achieve a variation of ±5.4% for BCS. We then followed the same process as for the idealized data set: 67% of the data set (2,016 randomly selected samples) were used for the ANN training process and the remainder were used for evaluation of the model. We used the same number of epochs and neuron numbers in the hidden layers as in the idealized modeling. To explore the impact of the number of epochs on the convergence behavior of the ANN model, we conducted experiments by running the model on different subsets of data for up to 250 epochs. In these experiments, we observed that the MSE decreased rapidly during the first several epochs across all data subsets, as illustrated in Figure 24. Notably, the largest dataset consistently achieved the lowest MSE for any given epoch, indicating that a greater volume of data can enhance the learning efficiency and accuracy of the model. In alignment with our earlier modeling efforts, Figure 24-a displays the raw MSE values computed at each epoch, providing a detailed view of the initial rapid improvement followed by a gradual tapering in error reduction.

Figure 24 Mean Squared Error (MSE) as a function of epochs for different sized data subsets from the variation data set. Figure 24-a displays the raw MSE calculated for each epoch, while Figure 24-b presents smoothed data, more clearly displaying the asymptotic nature of the functional relationships

To further elucidate the long-term convergence behavior and to reduce the impact of short-term fluctuations, Figure 24-b presents a smoothed version of the data using the same smoothing technique. This smoothed graph clearly highlights the asymptotic behavior of the MSE, allowing us to discern the point at which additional epochs yield diminishing returns.
As expected, given the deliberate addition of variation to the input data, the MSE values observed in these experiments are considerably higher than those recorded for the idealized dataset shown in Figure 20. This contrast underscores the challenges introduced by increased data variability and emphasizes the need for robust modeling strategies when working with more complex, real-world datasets. We modeled different population sizes as above to again identify the influence of population size on ANN accuracy (Figure 25).

Figure 25 Average error in estimated box compression strength given different population sizes with modeling cycles of 10 and 50 (data set with variation). Note error bars indicate 95% confidence intervals on the mean values

The accuracy of both the training data and testing data remained relatively consistent as the number of modeling cycles increased. While the accuracy of the training data was in line with expectations from the variation built into the data set (~5.4%), the influence of limiting population size can have a meaningful impact if we do not iterate the process sufficiently. Notably, the ANN approach was not working with any more information than the closed-form equation itself, and so the prediction accuracy did not improve upon what we would get from the closed-form equation. Test data accuracy did not begin to converge until around 1,500 data points when we used 20 modeling cycles, yet convergence occurred slightly sooner (~1,250 data points) when we used 70 modeling cycles. The BCS average error levels out at 2,500 data points, nearly the entire data set, at a value combining the inherent uncertainty in the input data and the uncertainty of the ANN process itself, identified above. This is notably larger than for the idealized data because of the influence of variation in the input parameters. As with the idealized data above, the influence of modeling cycles and population size need to be considered together. The minimum data population size needed to get a robust result is also larger for the variation data set.

2.7 CONCLUSION

In this section of our study, we explored BCS estimation using Artificial Neural Networks across input data sets that included both actual data from the literature and data based on literature models. Partitioning each data set into test and training subsets and running multiple modeling cycles on different partitions provides an analysis of the average model estimation accuracy that can be expected when the resulting models encounter new data. An ANN model with high accuracy and consistency can be built by adjusting four modeling factors: the combination of neuron numbers in hidden layers, the number of epochs, the number of modeling cycles, and the size of the data set. All four interact to influence model accuracy and can be optimized by minimizing model MSE. The combination of neuron numbers in the two hidden layers was determined as 160 and 36 for the McKee data set, and 142 and 45 for the idealized data set. Employing the same stopping criterion, where the MSE reduction rate is required to be below 3.0%, the epoch numbers were established as 100 for the McKee data set and 140 for the idealized data set. To ensure a robust result with high consistency in the ANN, it was found that 60 modeling cycles are needed for the McKee data set, 50 modeling cycles are required for the idealized data set, and 70 modeling cycles are necessary for the data set with variation.
The data size needed to get a robust result varies based on the input data variation and can be identified by minimizing the average BCS error. For the McKee data set, 63 data points are not enough for an ANN to predict the BCS reasonably. The other two data sets (the idealized data set and the data set with variation) need at least 1,000 data points to get a robust result for ANN prediction. The data size needed is significant, and data collection can be expensive considering the physical testing required. Our ANN models had more degrees of freedom than the number of underlying data points, which might lead us to expect that we could perfectly fit the underlying data and achieve BCS estimations very close to "measured" values. Instead, we found that model estimation accuracy remains limited by the uncertainty or error in the input parameters combined with uncertainty from the ANN process itself. The variation of input parameters correlated positively with the ANN training error (higher input variation increases the training error and vice versa). By identifying the challenges of small data sets and the interrelationship between modeling parameters and the estimation error in the data space, this study provides a methodological guide for future research exploring the applicability of ANN approaches to address problems and answer questions in the corrugated industry.

CHAPTER 3: EVALUATION OF PACKAGING DESIGN RELATIVE FEATURE IMPORTANCE USING ANN

3.1 INTRODUCTION

This chapter focuses on leveraging artificial neural network (ANN) models to evaluate the relative importance of packaging design features. In this section, Box Compression Strength (BCS) was used as a representative packaging property, and the relative importance of up to six packaging design features was assessed using four ANN-based methods (the connection weights method, the permutation method, the gradient-based method, and SHAP values), selected based on the reliability of their feature importance rankings. Two datasets were utilized: one synthetic dataset generated using a widely used mathematical model (the McKee formula) and one real dataset [26]. These datasets were used to train ANNs to assess packaging design feature importance. Theoretical feature importance was calculated and compared with the feature importance from the four ANN-based methods. The result demonstrates the feasibility of applying the ANN approach to evaluating the relative importance of packaging design features.

3.2 CURRENT METHODS FOR EVALUATING BCS FEATURE IMPORTANCE

Packaging design plays a vital role in ensuring packaging performance. Effective design can reduce costs by minimizing material usage and waste while protecting products during transportation, storage, and handling. To improve packaging performance, the design process must consider various influencing factors, each of which impacts specific packaging properties differently. Understanding the relative importance of these influencing factors for packaging design, or packaging design features, is crucial for enhancing packaging performance and maximizing cost savings. The evaluation of the relative importance of packaging design has traditionally relied on conventional physical testing, which involves numerous mechanical tests to obtain accurate measurements, making the process both resource-intensive and time-consuming. Consequently, several new methods for assessing the importance of packaging design features have emerged in recent years.
Techniques such as the Analytical Hierarchy Process (AHP) [64, 65] and Finite Element Analysis (FEA) [27, 66, 67] have been developed to optimize different packaging systems. Alicia Pérez et al. (2020) applied AHP to optimize a company's business strategy for corrugated cardboard boxes to support multicriteria decision-making, generating multiple improvements, such as reduction of the overall cost, optimal fill-rate operations, and the articulation of strategic and functional decisions in the organization [65]. Jongmin Park et al. (2020) investigated the edgewise compression behavior (load vs. displacement plot, ECT, and failure mechanism) of corrugated paperboard based on different types of testing standards and flute types using finite element analysis (FEA) and experimental analysis [67]. However, these methods are not consistently applied or fully integrated into industry practices. For instance, a primary disadvantage of AHP is expert subjectivity [68]. AHP relies on expert input for pairwise comparisons between options, where experts evaluate the relative importance or performance of one option over another. These judgments, being influenced by personal opinions, can introduce subjectivity [69]. On the other hand, FEA requires material properties that are challenging to obtain due to the anisotropic and non-linear mechanical behavior exhibited by paper fibrous material [70]. Additionally, there are limited applications of these methods for systematically evaluating the importance of packaging design features. Therefore, there is a substantial opportunity to develop a more efficient approach for assessing feature importance for packaging design.

3.3 ANN APPROACH FOR EVALUATING FEATURE IMPORTANCE

Despite the limited advancement in methods for evaluating packaging design feature importance, researchers have extensively explored various methods for evaluating the relative importance of input variables. In recent decades, Artificial Neural Networks (ANNs) have gained growing interest in various engineering and multidisciplinary research fields, such as the tourism industry [71], the financial sector [72], and complex engineering applications [73]. However, in the field of packaging, ANNs have been applied in only a few areas, such as estimating edge crush resistance and evaluating other packaging properties [74, 75], with limited applications beyond these. Tomasz Gajewski et al. (2024) predicted the crush resistance of corrugated packaging boxes with ventilation openings, packages with perforations, and typical flap boxes using different ANN models [76]. Siripong Malasri et al. (2016) trained a small data set of 74 box samples using an ANN to predict the compression strength of cubical RSC single-wall corrugated boxes [34]. ANNs have been a focal point due to their capability to generalize complex non-linear problems. ANN models have been implemented to evaluate the relative importance of input variables using many methods. Some commonly used methods include the weight connections method [77], the sensitivity analysis method, the gradient-based method [78], and SHAP (SHapley Additive exPlanations) values [70]. Within the weight connections method, the measures of input variable importance rely on the connection strengths (weights) within a trained neural network [79]. Garson
(1991) proposed a method to determine the relative impact of each input variable by calculating the percentage of output weight values associated with the contribution of a single input across the entire network [79, 80]. Yoon et al. (1994) provided a representation of the relative contribution of input i with respect to the overall behavior of the neural network [81]. Compared with Garson's method, Yoon's method considers the direction (positive/negative) of the contribution of an input. In 1999, Howes and Crook also proposed a formulation to determine the relative influence of input variables in neural networks [77, 79], which is similar to Yoon's method. However, it normalizes the effect of extreme weights connecting input and hidden nodes and is the only method measuring the importance of the variables across multiple hidden layers. Additionally, SH Tsaur et al. (2002) [82] defined the input importance scores by taking the sum of the weights connecting the input to the output layer. JD Olden et al. measured the relative importance of the input variables based on the product of the input-hidden and hidden-output connection weights, summing the products across all hidden neurons [77]. Within the gradient-based method, A Hill et al. (2020) [78] proposed a gradient-based approach to identify the relative importance of influencing factors for robotic control by obtaining the gradient of the output with respect to any component of the neural network using the chain rule. As for the sensitivity method, one of the most commonly used techniques is the permutation method. H Mandler et al. (2023) used the permutation method to measure the sensitivity of a model to the presence or absence of a feature to determine the importance scores of input features in fluid dynamics using a neural network-based turbulence model. Regarding SHAP values [83], SM Lundberg et al. [70] presented a unified framework for interpreting predictions, assigning each feature an importance value for a particular prediction in a deep learning model. These methods have found application in various domains for extracting the influence of inputs in machine learning models. For instance, they have been used to analyze the impact of design parameters on complex engineering systems, guest loyalty to hotels in the tourist industry, the relative importance of textual indexes in predicting the future performance of banks, and geographical phenomena visualization, among others. NL da Costa et al. (2021) [77] utilized Garson's method to evaluate the contribution of inputs to outputs in trained neural networks for both classification and regression problems. Goh, A.T.C. [84] employed Yoon's method to identify the relative importance of input factors influencing cone stresses and soil properties in an ANN model. HF Luoh et al. (2014) [85] applied Tsaur's method to identify moderating effects concerning tour leader age stereotypes, age in-group bias, and respondents' age on perceived roles played by tour leaders. J Iqbal et al. (2023) [86] utilized Olden's connection weights method to predict banks' future performance by identifying the relative importance of textual indexes representing management sentiments in banks' annual reports. K Fukumizu et al. (2012) [87] conducted dimension reduction for both feature extraction and variable selection based on the gradient-based method in supervised learning. A Altmann et al.
(2010) [88] estimated the distribution of measured importance for each variable in a non-informative setting based on the permutation method in Random Forest (RF) models. Ziqi Li (2022) [89] applied SHAP values to extract spatial effects for interpreting and visualizing complex geographical phenomena and processes in machine learning models. Based on the reliability of the packaging feature importance rankings for the synthetic dataset generated using the widely used mathematical model, four ANN-based methods were selected to evaluate the importance of the target packaging design features in this study. These four methods are the Connection Weights method, the Gradient-based method, the Permutation method, and SHAP values. The principles underlying each method for extracting input feature importance are explained in this section.

3.3.1 Connection Weights method

The Connection Weights method is a valuable technique for interpreting ANN models. An ANN comprises three types of layers: input, hidden, and output layers, as depicted in Figure 26. Neurons in each layer are connected by weights, which indicate the strength of the connections between the neurons. The Connection Weights method employs the weight matrix to determine the relative significance of each input feature in relation to the output [77]. The weights in the first layer, which connect the input neurons, can reflect the relative importance of each input feature. Thus, the Connection Weights method derives the relative importance of input features by extracting these weights from the first layer, as illustrated in equation (10) and Figure 26. The get_weights() function was employed to extract the weights from the first layer of the ANN model.

$I_{i} = \sigma_{i} \sum_{j=1}^{n_{hidden}} |w_{ij}|$   (10)

Where:
$\sigma_{i}$ - the standard deviation of the ith input.
$I_{i}$ - the ith input's importance.
$n_{hidden}$ - the number of hidden nodes in the first layer.
$w_{ij}$ - the weight connecting the ith input to the jth hidden node in the first layer.

Figure 26 The ANN structure & the Connection Weights method, which extracts the weights in the first layer as the input feature importance

3.3.2 Gradient-based method

The gradient-based method is a key technique for evaluating how a model's outputs are influenced by its input features. It involves calculating the partial derivative of the output with respect to the input, which measures sensitivity. This approach is applicable in Deep Neural Networks (DNNs), a specialized type of ANN [90]. Examining how variations in input features influence the predicted output can reveal insights into feature importance within ANN models. The gradient's magnitude indicates the extent of change in the predicted output due to an infinitesimal alteration in the input feature [91]. The gradient of output $\hat{y}$ with respect to input X is represented in equation (11),

$\nabla \hat{y} = \nabla F(X) = \left[ \frac{\partial F(X)}{\partial x_{1}} \cdots \frac{\partial F(X)}{\partial x_{d}} \right]^{T}$   (11)

The differentiability of deep neural networks is determined by the activation function used. Activation functions like sigmoid, ReLU, and Tanh are differentiable almost everywhere. In this study, where the sigmoid function is utilized, a central difference method was applied to numerically approximate the gradient of F(X) at X, as defined in equation (12),

$\frac{\partial F(X)}{\partial x_{k}} \triangleq \frac{F(X^{(k+)}) - F(X^{(k-)})}{2\delta_{x}}$   (12)

Where:
$X^{(k+)} \triangleq X + \delta_{x} \cdot e_{k}$ and $X^{(k-)} \triangleq X - \delta_{x} \cdot e_{k}$, where $\delta_{x} \in \mathbb{R}$ is the step size and, for all k = 1, . . . , d, $e_{k} \in \mathbb{R}^{d}$ is the standard basis vector. The terms $F(X^{(k+)})$ and $F(X^{(k-)})$ are obtained from two forward passes of the model.
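A minimal sketch of these two computations (equations (10) and (12)), assuming a trained Keras model like the one built earlier and a NumPy array X of input features; the layer indexing assumes the first Dense layer holds the input-to-hidden weights, and the averaging over samples anticipates equation (13) below.

```python
import numpy as np

def connection_weights_importance(model, X):
    # Equation (10): sigma_i * sum_j |w_ij| over the first-layer weights.
    # Assumes a Keras model whose first Dense layer holds the
    # input-to-hidden kernel of shape (n_inputs, n_hidden).
    w = model.layers[0].get_weights()[0]
    return X.std(axis=0) * np.abs(w).sum(axis=1)

def gradient_importance(model, X, dx=1e-3):
    # Equations (12)-(13): central-difference gradient per feature,
    # two forward passes per feature, averaged over all samples.
    n, d = X.shape
    imp = np.zeros(d)
    for k in range(d):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, k] += dx
        Xm[:, k] -= dx
        grad = (model.predict(Xp, verbose=0)
                - model.predict(Xm, verbose=0)) / (2.0 * dx)
        imp[k] = np.abs(grad).mean()
    return imp
```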
The importance of the kth feature is then defined as the absolute value of the partial derivative with respect to $x_{k}$. This gradient vector provides the feature importance for a single test sample. To determine the global feature importance, the feature importances for all samples in the test set $S_{N}$ were averaged, where N represents the number of samples in the test set [91], as outlined in equation (13),

Sample gradient imp$(x_{k}) \triangleq \left| \frac{\partial F(X)}{\partial x_{k}} \right|_{X}$ ;  Global gradient imp$(x_{k}) \triangleq \frac{1}{N} \sum_{X \in S_{N}} \left| \frac{\partial F(X)}{\partial x_{k}} \right|_{X}$   (13)

In this study, gradients were computed using tf.GradientTape(), a Python tool that allows for nesting to calculate higher-order derivatives.

3.3.3 Permutation method

The Permutation method measures a feature's importance by observing how model accuracy changes when the values of that feature are randomly shuffled while keeping other feature values intact. A feature with higher importance will have a greater effect on the model's accuracy when its values are shuffled [83]. As detailed in equation (14) and Table 2, to determine the importance of feature $f_{j}$, the column corresponding to $f_{j}$ is randomly shuffled to generate a corrupted data set $D_{kj}$. The ANN model is then evaluated on $D_{kj}$, and its score is compared with the score s of the original model. The difference in scores indicates the importance of feature $f_{j}$.

$i_{j} = s - \frac{1}{K} \sum_{k=1}^{K} s_{kj}$   (14)

Where:
$i_{j}$ - importance of feature $f_{j}$.
$s$ - reference score of the model m on data set D.
$K$ - number of repetitions of randomly shuffling column j of the dataset D to generate a corrupted version of the data named $D_{kj}$.
$s_{kj}$ - score of the model on the corrupted data $D_{kj}$.

Table 2 Principle of permutation feature importance

In this study, the np.random.permutation() function from the NumPy library in Python is utilized to randomly shuffle the input features in the ANN.

3.3.4 SHAP values

SHAP (SHapley Additive exPlanations) values provide a method for interpreting the outputs of machine learning models. This approach uses principles from game theory to measure the contribution of each feature to the final prediction [92, 93]. It aims to fairly allocate the contributions of each feature towards achieving the overall result [94]. SHAP values can be applied in machine learning to measure the contribution of each feature to the model's prediction, providing insights into how each feature collectively influences the final outcome [95]. In evaluating feature importance, SHAP values compare the model's output with and without a specific feature for a given data point, accounting for all possible combinations of the other features. The average difference in the output is then computed. The SHAP value for feature $X_{j}$ in a model is represented by equation (15):

Shapley$(X_{j}) = \sum_{S \subseteq N \setminus \{j\}} \frac{|S|!\,(p-|S|-1)!}{p!} \left( f(S \cup \{j\}) - f(S) \right)$   (15)

Where:
$p$ - the total number of features.
$N \setminus \{j\}$ - the set of all possible combinations of features excluding $X_{j}$.
$S$ - a feature set in $N \setminus \{j\}$.
$f(S)$ - the model prediction with the features in S.
$f(S \cup \{j\})$ - the model prediction with the features in S plus feature $X_{j}$.

According to equation (15), the SHAP value of a feature represents its marginal contribution to the model's prediction, averaged across all possible models with varying combinations of features [95]. Equation (15) determines feature importance for a single data point. In our study, which involves multiple data points, this process is repeated for each data point.
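A minimal sketch of the permutation computation (equation (14)) and a SHAP call, assuming the trained model and arrays from the earlier sketches. With an error metric such as MSE, the importance in equation (14) appears as the mean increase in error after shuffling; the choice of KernelExplainer is an assumption, since the text does not name the explainer used.

```python
import numpy as np
import shap  # the SHAP library referenced in the text

def permutation_importance(model, X, y, K=10, seed=0):
    # Equation (14) with an error metric: importance is the mean increase
    # in test MSE after shuffling column j, repeated K times.
    rng = np.random.default_rng(seed)
    base = np.mean((model.predict(X, verbose=0).ravel() - y) ** 2)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        scores = []
        for _ in range(K):
            Xc = X.copy()
            Xc[:, j] = rng.permutation(Xc[:, j])   # corrupt column j only
            scores.append(np.mean(
                (model.predict(Xc, verbose=0).ravel() - y) ** 2))
        imp[j] = np.mean(scores) - base
    return imp

# SHAP values; KernelExplainer is one model-agnostic choice (assumed here).
explainer = shap.KernelExplainer(
    lambda data: model.predict(data, verbose=0).ravel(),
    shap.sample(X_train, 100))              # background sample
shap_values = explainer.shap_values(X_test)
global_shap = np.abs(shap_values).mean(axis=0)   # mean |SHAP| per feature
```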
The average difference in output across all data points is then used as a metric to quantify the contribution and importance of a single feature in the model's predictions. This approach ensures that the impact of each feature is evaluated in a comprehensive manner, considering various interactions within the dataset. A practical example of how SHAP values are calculated for multiple data points in our dataset is illustrated in Figure 27, where the contributions of individual features are systematically analyzed to provide a clear interpretation of their influence on the model's output.

1. Calculate the SHAP value of one data point (take the length of the box, for example).
   a. Take all the combinations of different features.
   b. Calculate the difference between the model predictions with and without the feature length (for each combination).
   c. Average the differences over all combinations.
2. Calculate the absolute mean SHAP value of all data points.

Figure 27 Calculation process of SHAP values with multiple data points in a data set

In this study, the explainer.shap_values() function from the SHAP library in Python is employed to compute the SHAP values for each BCS feature.

3.4 FLOW OF FEATURE IMPORTANCE EVALUATION USING ANN

To evaluate input feature importance, the first step is to train an ANN model. In principle, training an ANN model involves building a function that recognizes the underlying relationships between input variables and output variables. By feeding the model with values of relevant input features and the corresponding output values of available data points, an ANN can be trained to learn the relationship between output(s) and their input features. Consequently, the trained ANN model can forecast the output values for new data points based on their input feature values as they become available. In this section, the process began by constructing an ANN model through the training of available data sets, analogous to the procedure used for predicting output values. Then the four aforementioned methods (the connection weights method, the permutation method, the gradient-based method, and SHAP values) were applied to assess the relative importance of various packaging design features within the trained ANN model. The results of the packaging design feature importance assessment were validated by comparing them with the theoretical feature importance calculated using the well-established mathematical model. Figure 28 illustrates the sequential procedure for mapping input feature importance within an ANN model utilizing the four selected methods. The development and training of the ANN model were carried out using Jupyter Notebook, an integrated development environment (IDE).

Figure 28 Flow of mapping feature importance using four ANN-based methods

3.4.1 Methods for hidden layer neuron number optimization

When it comes to methods for optimizing the hidden layer neuron numbers, the key is to balance model accuracy and computational efficiency. Recall from Chapter 2 that the exhaustive search method can identify the hidden layer neuron setting by locating the minimum error of the ANN model prediction. However, the exhaustive search method is very time-consuming. Therefore, researchers have investigated various computational techniques to achieve the output with minimum calculation while maintaining high model accuracy.
The Akaike information criterion (AIC) was introduced by Hirotugu Akaike [96]. It was originally developed to identify an optimal model from a class of competing models [96] but has been adapted to the detection of outlier gene expression and to model evaluation. By evaluating the model with different input features deleted, the input feature importance can be evaluated at the same time.

Hebb's rule, also known as Hebb's law or Hebbian learning, is a neuropsychological theory proposed by the Canadian psychologist Donald Hebb in 1949 [97]. Hebb's rule is based on the idea that the brain is capable of reorganizing itself in response to experience: when two neurons are activated simultaneously, the connection between them is strengthened.

The Bayesian information criterion (BIC), or Schwarz information criterion (also SIC, SBC, SBIC), was developed by Gideon E. Schwarz and published in a 1978 paper [98]. The BIC is a criterion for model selection among a finite set of models in statistics; it attempts to resolve the problem of overfitting by introducing a penalty term for the number of parameters in the model [99].

The Optimal Brain Damage (OBD) rule was proposed by Yann LeCun et al. in 1989; it removes unimportant weights from a network to reduce the number of training examples required and to improve the learning speed. OBD uses the second derivatives of the error function to determine which weights in the network are least important to the overall performance, making a trade-off between network complexity and training set error [100].

Bayesian Optimization was created by Jonas Mockus in the 1970s [101, 102]. Bayesian Optimization builds a probability model of the objective function and uses it to select a hyperparameter at which to evaluate the true objective function [103].

In this study, five computational methods (Information Criteria using the AIC method, Hebb's rule, Information Criteria using the BIC method, the Optimal Brain Damage rule, and the Bayesian Optimization method) were applied to optimize the hidden layer neuron number setting in order to reduce the computation time and achieve high efficiency in ANN model training.

3.4.1.1 Bayesian optimization method

Bayesian Optimization is designed for black-box, derivative-free global optimization [104]. It builds a probability model of the objective function and uses it to select hyperparameters at which to evaluate the true objective function. The true objective function is a fixed function, shown as the dotted line in Figure 29. Generally, for a derivative-free function, only some data points (or observations) are accessible, not all of them; these are shown as the black points in Figure 29. A surrogate model (surrogate function) can be built to approximate the true objective function; the surrogate function is represented as the black line in Figure 29.

Figure 29 Schematic of the Bayesian optimization process (source: http://haikufactory.com/files/bayopt.pdf); the blue shading represents the deviation

A surrogate function is, by definition, "the probability representation of the objective function," which is essentially a model trained on the (hyperparameter, true objective function score) pairs. Once some observations are known, it is possible to find new observations by trying different parameters, and this is where an acquisition function needs to be built. An acquisition function can be generated using the surrogate function, as detailed later in this chapter.
The way to identify the new observation is to locate the maximum point of the acquisition function and calculate the corresponding hyperparameter and its objective function value. After the new observation is found, the surrogate function and the acquisition function are updated. This process is repeated until the surrogate function is as close as possible to the objective function. The schematic of the Bayesian Optimization process is shown above in Figure 29.

In this study, a Gaussian Process (GP) model was used as the surrogate function. The acquisition function used with the Gaussian Process model is the Expected Improvement function, as shown in equation (16):

$$EI(x) = \int_{-\infty}^{\infty} \max\left(f(x)^{*} - f(x),\ 0\right) \, p_M\!\left(f(x) \mid x\right) \, df(x) \qquad (16)$$

Where:
p_M(f(x)|x) – The surrogate function; f(x) is the true objective function score, and x is the hyperparameter.
f(x)* – The minimum observed true objective function score so far.
f(x) – New scores.

The BayesianOptimization() function was used in Python to conduct the Bayesian Optimization for the ANN hidden layer optimization.

3.4.1.2 Information Criteria using the Akaike information criterion (AIC) method

The Akaike information criterion (AIC) is an estimator of prediction error and thereby of the relative quality of statistical models for a given set of data [126-128]. Given a collection of models for the data, AIC estimates the quality of each model relative to each of the other models; thus, AIC provides a means for model selection.

Let k be the number of estimated parameters in a statistical model of some data, and let L̂ be the maximized value of the likelihood function for the model. The AIC value of the model is then given by equation (17) [105, 106]:

$$AIC = 2k - 2\ln \hat{L} \qquad (17)$$

Where:
L̂ – The maximized value of the likelihood function for the model.
k – The number of estimated parameters in the statistical model.

This number estimates the amount of information that is lost when the model M is used to approximate reality; the model with the lowest AIC value is considered the one best fitting the data [107]. In this study, the mean squared error (MSE) was used as the L̂ in the equation above. Given a set of candidate models for the data, the preferred model is the one with the minimum AIC value. Thus, AIC rewards goodness of fit (as assessed by the likelihood function), but it also includes a penalty that is an increasing function of the number of estimated parameters. The penalty discourages overfitting, which is desired because increasing the number of parameters in the model almost always improves the goodness of the fit [108].

3.4.1.3 Information Criteria using the Bayesian information criterion (BIC) method

The Bayesian information criterion (BIC) (Stone, 1979) is another criterion for model selection that measures the trade-off between model fit and model complexity; a lower AIC or BIC value indicates a better fit [109]. In statistics, the BIC, or Schwarz information criterion (also SIC, SBC, SBIC), is a criterion for model selection among a finite set of models, and models with lower BIC are generally preferred. It is based, in part, on the likelihood function, and it is closely related to the Akaike information criterion (AIC). When fitting models, it is possible to increase the maximum likelihood by adding parameters, but doing so may result in overfitting.
Both BIC and AIC attempt to resolve this problem by introducing a penalty term for the number of parameters in the model; the penalty term is larger in BIC than in AIC for sample sizes greater than 7 [110]. The BIC is formally defined as shown in equation (18) [111, 112]:

$$BIC = k\ln(n) - 2\ln \hat{L} \qquad (18)$$

Where:
L̂ – The maximized value of the likelihood function of the model, i.e., L̂ = p(x | θ̂, M), where θ̂ are the parameter values that maximize the likelihood function and x is the observed data.
n – The number of data points in x, i.e., the sample size.
k – The number of parameters estimated by the model.

In this study, the mean squared error (MSE) was used as the L̂ in the equation above.

3.4.1.4 Hebb's rule

According to Hebb's rule, in a network, the more often two neurons are activated together, the more efficient the connection between them becomes. When we learn new information or skills, the connections between neurons in our brain are modified to facilitate the formation of new neural pathways. Hebb's rule suggests that this process of neural reorganization is driven by the repeated co-activation of neurons. From the point of view of artificial neurons and artificial neural networks, Hebb's principle can be described as a method of determining how to alter the weights between model neurons: the weight between two neurons increases if the two neurons activate simultaneously and decreases if they activate separately. Nodes that tend to be either both positive or both negative at the same time have strong positive weights, while those that tend to be opposite have strong negative weights. The implementation of Hebb's rule is:

a) Train the neural network using all of the neurons in the hidden layer.
b) Use the weights learned during training to calculate the Hessian matrix.
c) Use the Hessian matrix to calculate the sensitivity of the cost function with respect to each neuron.
d) Prune the neurons with the smallest sensitivity values.

This process can be repeated until the desired level of network complexity is reached. In this study, Hebb's rule was implemented using two empirical functions from the Python library.

3.4.1.5 Optimal Brain Damage (OBD) rule

The Optimal Brain Damage (OBD) rule uses the second derivatives of the error function to determine which weights in the network are least important to the overall performance, making a trade-off between network complexity and training set error [100]. The OBD procedure is carried out as follows:

a) Choose a reasonable network architecture.
b) Train the network until a reasonable solution is obtained.
c) Compute the second derivatives for each parameter.
d) Compute the saliencies for each parameter.
e) Sort the parameters by saliency and delete some low-saliency parameters.
f) Iterate to step b).

In this study, OBD was applied based on an empirical function in the Python library.

3.4.2 ANN neuron number determination in the hidden layer

To build an ANN model, the values of three key modeling factors need to be determined: the hidden neuron number setting, the epoch number, and the modeling cycle number. The dataset for training an ANN was randomly divided into two subsets: a training data set for training the model and a testing data set for assessing model accuracy. To construct the ANN model, the number of hidden layer(s) and the neuron numbers for each layer were determined by balancing model accuracy and computational efficiency.
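As an illustration of how the information criteria above can drive the neuron-count search, the sketch below trains candidate one-hidden-layer models and keeps the setting with the lowest criterion value. It is an assumption-laden sketch rather than the dissertation's code: the candidate list and the data arrays (X_train, y_train, X_test, y_test) are placeholders, and the standard Gaussian-likelihood substitution ln L̂ ∝ -(n/2) ln(MSE) is used to express AIC and BIC in terms of the test MSE:

```python
import numpy as np
import tensorflow as tf

def fit_and_evaluate(n_neurons):
    """Train a one-hidden-layer ANN; return (test MSE, trainable parameter count)."""
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(n_neurons, activation="relu",
                              input_shape=(X_train.shape[1],)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X_train, y_train, epochs=35, verbose=0)
    return model.evaluate(X_test, y_test, verbose=0), model.count_params()

n = len(X_test)  # sample size entering the criteria
results = {}
for setting in (5, 15, 34, 60, 120):           # illustrative candidates only
    mse, k = fit_and_evaluate(setting)
    aic = 2 * k + n * np.log(mse)              # AIC with ln L ~ -(n/2) ln(MSE)
    bic = k * np.log(n) + n * np.log(mse)      # BIC with the same substitution
    results[setting] = (aic, bic)

best_by_aic = min(results, key=lambda s: results[s][0])
best_by_bic = min(results, key=lambda s: results[s][1])
```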
In this study, five optimization methods were examined, each targeting the optimization of the hidden layer configuration: the Bayesian optimization method [113, 103], Information Criteria using the Akaike information criterion (AIC) method [96], Information Criteria using the Bayesian information criterion (BIC) method [98], Hebb's rule [114], and the Optimal Brain Damage (OBD) rule [100]. The final hidden layer configuration was determined by minimizing error and maximizing computational efficiency through a comprehensive comparison of model errors across the five optimization methods mentioned above. Figure 30 presents a conceptual diagram of the ANN structure developed in this study.

Figure 30 A conceptual diagram of the ANN structure used in this study

The number of epochs was determined based on the model loss (error) versus epoch plot. The epoch count was chosen as the point at which the model error had decreased and reached a stable, plateaued level.

3.4.3 Final relative feature importance determination

With the optimal hidden neuron setting and the optimal epoch number, the studied packaging design features were incorporated into the ANN model. The relative importance of these packaging design features was evaluated using the four previously mentioned methods. Each method provided importance scores for the studied packaging features, which were then normalized to a scale from 0 to 1 to account for differences in scaling. Referring to the mathematical model, the theoretical feature importance of 0 for features not included in the mathematical model was used to identify and eliminate unreliable methods. In practice, researchers and practitioners often average feature importance scores from various methods to obtain a more stable or general feature ranking. Therefore, in this study, to produce a comprehensive measurement, the feature importance results from the reliable methods were averaged to determine the final importance of the different packaging design features. To confirm the validity of these results, the averaged packaging design feature importance was compared with the theoretical feature importance calculated using the well-established mathematical model.

3.5 CASE STUDY FOR FEATURE IMPORTANCE ANALYSIS

As a critical parameter in the evaluation of shipping containers, BCS is determined by various factors, such as material properties, flute types, dimensions, and more. Each factor, or BCS feature, affects the BCS differently. Understanding how each BCS feature influences the BCS value and identifying the most impactful ones are crucial for packaging design. This knowledge enables designers to strategically prioritize adjustments to the most influential features, ultimately reducing material consumption and costs [115]. However, few systematic methods to evaluate the BCS features have been developed yet. Current analytical methods for BCS prediction indicate the dominant BCS features but require numerous mechanical tests covering the various BCS features. Existing numerical models based on finite element analysis (FEA) face difficulties in obtaining relevant parameters and in dealing with the anisotropic, non-linear properties of paper materials. Assessing BCS feature importance is thus a great challenge for the corrugated packaging industry. Therefore, in this section, BCS was used as a representative packaging property to validate the capability of the ANN approach in assessing the relative importance of packaging design features.
Two datasets, one synthetic and one real, were employed as case studies. Up to six BCS features were evaluated using the four selected ANN-based approaches: box perimeter, depth, ECT, thickness, and bending stiffness in the machine (EIx) and cross-machine (EIy) directions. The average feature importance of these BCS features, as determined by the ANN approach, was calculated to provide a comprehensive result. These values were then compared with the theoretical feature importance values derived from the McKee formula to verify the ANN assessment.

3.5.1 Case study 1: Relative feature importance of the synthetic data set

The first data set used was a synthetic dataset created by inputting the box perimeters, depths, ECTs, and thicknesses of 3,009 commonly used commercial boxes [116] into the simplified McKee formula, as detailed in equation (19) [117], to compute the BCS values:

$$BCS = 5.87 \times ECT \times \sqrt{Caliper \times P} \qquad (19)$$

Where:
ECT – Edge crush test value (lb/in).
Caliper – Thickness of the corrugated board (in).
P – Perimeter of the box (in).

Using the simplified McKee formula along with the concept of derivatives, we computed the theoretical relative importance of the four BCS features. The analysis, detailed in Figure 31, shows that the ECT feature has the highest relative importance, with a weight of 0.500, indicating that it is the most influential factor in the model. Both the perimeter and thickness features were found to have equal significance, each contributing a weight of 0.250, which underscores their moderate but noteworthy impact on the model's performance. In contrast, the depth feature was determined to have no influence in this analysis, receiving a weight of 0. This outcome, as illustrated in Figure 31, not only quantifies the contributions of each feature but also emphasizes the critical role of ECT in the model, while suggesting that the depth feature may be redundant or less relevant for this particular application.

Figure 31 Theoretical BCS feature importance calculated using the simplified McKee formula (ECT 0.500, P 0.250, Caliper 0.250, d 0.000)

3.5.1.1 ANN training using the synthetic data set

During the ANN model training process, the synthetic dataset with 3,009 data points was randomly divided into two subsets: 70% of the data (2,016 data points) for training the model and the remaining 30% (993 data points) for testing the model's accuracy.

As previously mentioned, the hidden layer neuron number setting was determined by comparing the model errors obtained by the five optimization methods mentioned in section 3.4.1 (including the Bayesian optimization method, the AIC method, the BIC method, Hebb's rule, and the OBD rule) to minimize the model error and maximize computational efficiency. Figure 32 presents the optimal neuron settings for the hidden layer as determined by each method, along with their corresponding model errors over 70 modeling cycles. The results showed that the AIC method achieved the lowest model error with one hidden layer and 120 neurons. The BIC method produced the second-lowest model error, with one hidden layer and 34 neurons. The error difference between these two methods was no more than 0.0032, considering their 95% confidence intervals (0.0032 ± 0.0015 and 0.0026 ± 0.0011). This indicates that the more complex configuration suggested by the AIC method was not necessary.
In contrast, the simpler configuration of 34 neurons provides nearly the same accuracy but is significantly more efficient in terms of computational resources. Consequently, the neuron configuration proposed by the BIC rule was applied throughout the study for this data set. Namely, the ANN model developed for this synthetic data set includes a single hidden layer with 34 neurons.

Figure 32 Optimal neuron numbers in the hidden layer(s) determined by different methods for the synthetic data set (average model error for test data over 70 modeling cycles: Information Criteria using AIC with two hidden layers, 41×6 neurons, 0.0067; Hebb's rule, 22 neurons, 0.0045; Optimal Brain Damage rule, 33 neurons, 0.0042; Bayesian optimization, 145 neurons, 0.0033; Information Criteria using BIC, 34 neurons, 0.0032; Information Criteria using AIC with one hidden layer, 120 neurons, 0.0026)

The number of epochs was determined based on the model loss (error) versus epoch plot (Figure 33), which showed that the model error reduction plateaued after 25 epochs. To maintain a conservative approach, the number of epochs was set to 35.

Figure 33 Model loss (error) versus epoch plot with 34 neurons in the hidden layer (train and test data)

The feature importance of 10 modeling cycles was averaged to obtain a reliable average feature importance.

3.5.1.2 Relative feature importance analysis of the synthetic data set

The importance of the BCS features in this dataset was assessed using the four methods previously discussed, each of which provided an independent measure of the relative significance of the four key BCS features: edge crush test (ECT), thickness of the corrugated board, box perimeter, and box depth. Since different methods may yield results on different scales, a normalization process was applied to ensure consistency and comparability across all approaches. Specifically, the feature importance values were scaled such that the sum of the importance scores for all four features equaled 1 within each method. This normalization allows for a direct comparison of how each method ranks the importance of individual features while eliminating potential discrepancies caused by variations in scale or magnitude. The results of this analysis, presented in Figure 34, provide a comprehensive overview of how the different methods evaluate feature importance and highlight any similarities or discrepancies in their assessments.

Figure 34 ANN-evaluated BCS feature importance of the synthetic data set generated using the simplified McKee formula (Connection Weights method, Gradient-based method, Permutation method, and SHAP values)

The ranking of BCS feature importance consistently identified by the four methods is ECT > Perimeter > Thickness > Depth. However, the connection weights method shows unusually high importance for the depth feature, which deviates significantly from the expected value of zero given the synthetic dataset's design. This discrepancy renders the results from the connection weights method unreliable. Therefore, the feature importance results of the other three approaches were averaged to provide a comprehensive estimate of BCS feature importance.
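The normalize-then-average step just described is simple to express in code; below is a minimal sketch with placeholder scores (the numbers are illustrative, not the dissertation's results):

```python
import numpy as np

# Hypothetical raw importance scores for (ECT, P, Caliper, d) from the three
# reliable methods; each method may report on its own scale.
raw_scores = {
    "gradient":    np.array([0.90, 0.50, 0.30, 0.010]),
    "permutation": np.array([1.20, 0.60, 0.40, 0.010]),
    "shap":        np.array([0.80, 0.45, 0.30, 0.005]),
}

# Normalize each method so its scores sum to 1, then average across methods.
normalized = {m: v / v.sum() for m, v in raw_scores.items()}
average_importance = np.mean(list(normalized.values()), axis=0)
```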
The average BCS feature importance from these three methods gives ECT a weight of 0.525, perimeter 0.295, thickness 0.180, and depth 0.004, as illustrated in Figure 35 (left). When compared with the theoretical BCS feature importance ranking calculated using the simplified McKee formula (depicted in Figure 35 (right)), the results from the three reliable methods are closely aligned. This indicates that the ANN approach can be a potential tool for evaluating the relative importance of packaging design features.

Figure 35 Comparison of the average feature importance assessed by the selected ANN-based methods (ECT 0.523, P 0.294, Caliper 0.179, d 0.004) and the theoretical BCS feature importance calculated using the simplified McKee formula (ECT 0.500, P 0.250, Caliper 0.250, d 0.000)

3.5.2 Case study 2: Relative feature importance of the real data set

The second data set used was a real data set comprising industry data on 429 commonly used commercial boxes, detailing five BCS features: box perimeters, depths, ECTs, and flexural stiffness in both the machine and cross-machine directions. These BCS values were obtained through actual testing, accurately reflecting real-world industry conditions. Since real-world data captures fluctuations in BCS feature values due to measurement inaccuracies and variations in material parameters, studying a real dataset from industry is highly meaningful for the feature importance assessment.

As with the first, synthetic dataset, the theoretical relative importance of these five BCS features was calculated using the improved McKee formula (equation (20)) [118], which gives ECT a weight of 0.500, perimeter 0.330, the stiffness terms EIx and EIy 0.085 each, and depth no importance (0), as shown in Figure 36:

$$BCS = 2.028 \times ECT^{0.746} \times \left(\sqrt{EI_x \times EI_y}\right)^{0.254} \times P^{0.492} \qquad (20)$$

Where:
ECT – Edge crush test value (lb/in).
EIx, EIy – Flexural stiffness in the machine direction and cross-machine direction of the corrugated board (lb·in).
P – Perimeter of the box (in).

Figure 36 Theoretical BCS feature importance calculated using the improved McKee formula (ECT 0.500, P 0.330, EIx 0.085, EIy 0.085, d 0.000)

3.5.2.1 ANN training using the real data set

During the ANN model training process, the real dataset with 429 data points was randomly divided into two subsets: 70% of the data (287 data points) for training the model and the remaining 30% (142 data points) for testing the model's accuracy.

As with the synthetic data set, the hidden layer neuron number setting was determined by comparing the model errors given by the five optimization methods mentioned above (including the Bayesian optimization method, the AIC method, the BIC method, Hebb's rule, and the OBD rule) to minimize the model error and maximize computational efficiency. Figure 37 presents the optimal neuron settings for the hidden layer as determined by each method, along with their corresponding model errors over 70 modeling cycles.
Figure 37 Optimal neuron numbers in the hidden layer(s) determined by different methods for the real data set (average model error for test data over 70 modeling cycles: Information Criteria using AIC with two hidden layers, 3×2 neurons, 0.147; Hebb's rule, 12×3 neurons, 0.132; Information Criteria using BIC, 5 neurons, 0.109; Information Criteria using AIC with one hidden layer, 15 neurons, 0.105; Optimal Brain Damage rule, 34 neurons, 0.101; Bayesian optimization, 103×103 neurons, 0.091)

The results showed that Bayesian optimization achieved the lowest model error with two hidden layers, each containing 103 neurons. The OBD rule produced the second-lowest model error, with one hidden layer and 34 neurons. The error difference between these two methods was no more than 0.0191, considering their 95% confidence intervals (0.101 ± 0.0048 and 0.091 ± 0.0043). This indicates that the more complex configuration suggested by Bayesian optimization was not necessary. In contrast, the simpler configuration of 34 neurons provides nearly the same accuracy but is significantly more efficient in terms of computational resources. Consequently, the neuron configuration proposed by the OBD rule was applied throughout the study for this data set. Namely, the ANN model developed for this real data set includes a single hidden layer with 34 neurons.

The number of epochs was determined based on the model loss (error) versus epoch plot (Figure 38), which showed that the model loss (error) reduction plateaued after 40 epochs. To ensure a conservative result, the number of epochs was set to 50.

Figure 38 Model loss (error) versus epoch plot with 34 neurons in the hidden layer (train and test data)

The feature importance of 10 modeling cycles was averaged to achieve a reliable average feature importance.

3.5.2.2 Relative feature importance analysis of the real data set

Given the unreliability of the connection weights method for assessing BCS feature importance, particularly for the depth feature, it was excluded from the analysis of the real dataset. Instead, the BCS feature importance for the real data was evaluated using the remaining three methods. Following the same procedure applied to the synthetic data set, the BCS feature importances were normalized across these methods to ensure that the sum of the five BCS feature importances equaled 1 for each method, as illustrated in Figure 39.

Figure 39 ANN-evaluated BCS feature importance of the real data set (Gradient-based method, Permutation method, SHAP values, and their average)

The results from the three methods demonstrate overall consistency and were averaged to establish a comprehensive ranking of the five BCS feature importances. The average BCS feature importance is ranked as ECT > Perimeter > EIy > EIx > Depth, which aligns well with the theoretical BCS feature importance calculated using the mathematical model, as shown in Figure 40. The average BCS feature importance from these three methods gives ECT a weight of 0.480, perimeter 0.235, EIy 0.116, EIx 0.099, and depth 0.070.
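The dissertation does not spell out the arithmetic behind the theoretical weights, so the following is a hedged reconstruction: for a power-law model such as equation (20), each feature's relative importance can be taken as its logarithmic derivative (i.e., its exponent), normalized so the weights sum to 1:

$$\frac{\partial \ln BCS}{\partial \ln ECT} = 0.746, \quad \frac{\partial \ln BCS}{\partial \ln P} = 0.492, \quad \frac{\partial \ln BCS}{\partial \ln EI_x} = \frac{\partial \ln BCS}{\partial \ln EI_y} = \frac{0.254}{2} = 0.127$$

The exponents sum to 0.746 + 0.492 + 0.127 + 0.127 = 1.492, giving normalized weights of 0.746/1.492 ≈ 0.500 for ECT, 0.492/1.492 ≈ 0.330 for perimeter, 0.127/1.492 ≈ 0.085 each for EIx and EIy, and 0 for depth, matching Figure 36. The same procedure applied to equation (19) reproduces the 0.500, 0.250, 0.250, and 0 weights of Figure 31.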
The analysis of the real dataset reveals that, although depth is ranked last, it still has a notable influence. In general, depth remains an important factor in determining BCS. As the depth value increases, buckling theory suggests that depth can significantly affect compression strength, making it a critical consideration. Despite the McKee equation [74] theoretically assigning a zero-importance value to depth, the real-world data demonstrates that the effect of depth should not be disregarded. Our results emphasize the influence of depth, validating that the proposed methods for measuring feature importance accurately reflect practical applications. Furthermore, although the theoretical BCS feature importance ranks EIx and EIy equally, the average importance ranking from the three methods shows a 0.017 difference between these two features. This small variation is understandable considering the fluctuations caused by measurement inaccuracies that mathematical models may not fully account for.

Figure 40 Comparison of the average ANN-evaluated BCS feature importance of the real data set (ECT 0.480, P 0.235, EIy 0.116, EIx 0.099, d 0.070) and the theoretical BCS feature importance calculated using the improved McKee formula (ECT 0.500, P 0.330, EIx 0.085, EIy 0.085, d 0.000)

In summary, the BCS feature importance ranking for the real dataset is consistent with the findings from the theoretical feature importance. The evaluation of the real dataset provides a real-world context for the findings and further demonstrates the capability of the ANN approach for feature importance evaluation in packaging design.

3.6 CONCLUSION

This study introduces a new method for evaluating the importance of packaging design features using four ANN-based approaches: the Connection Weights method, the Gradient-based method, the Permutation method, and SHAP values. Using BCS as a representative packaging design property, the relative importance of up to six BCS features was assessed through these ANN-based approaches. One synthetic dataset derived from the well-established mathematical model (the McKee formula) and one real dataset were used as two case studies for training the ANN model and obtaining the importance of the features influencing BCS. The feature importance rankings provided by the ANN approaches were consistent with the theoretical feature importance calculated using the mathematical model across both datasets. This result highlights the effectiveness of the ANN approach in evaluating feature importance in packaging design, allowing for a more efficient assessment of the relative impact of various design features. This allows designers to prioritize adjustments to the most influential features, ultimately reducing material consumption and costs. For instance, to increase BCS, designers can first consider increasing the box dimensions for a minimal design effort and reduced material waste, rather than modifying the thickness or flexural stiffness, which would require changes to the materials or production process. Overall, this study offers a novel approach to assessing packaging design feature importance through ANN techniques, providing practical insights for improving material efficiency and cost-effectiveness.
This method can be easily applied to evaluate the relative importance of other packaging properties beyond BCS, offering valuable insights for addressing various challenges in the packaging industry using ANN approaches.

CHAPTER 4: BUILDING A GENERALIZED ANN MODEL TO EVALUATE BCS

4.1 INTRODUCTION

The goal of this chapter is to build a generalized artificial neural network (ANN) model for box BCS evaluation. Based on the available data, a dataset extracted from a real data set containing the majority of BCS values used in the industry was utilized to train the model. The ANN modeling factors include the number of epochs, the number of modeling cycles, and the hidden layer neuron setting. The numbers of epochs and modeling cycles were set based on conservative results from the dataset with variation: specifically, the number of epochs was set to 140, and the number of modeling cycles was initially set to 70 to obtain a conservative result. The hidden layer neuron setting was optimized using the same five optimization methods as Chapter 3 while balancing model accuracy and computational efficiency; namely, the five optimization methods used in this chapter are the Information Criteria using the Akaike information criterion (AIC) method, Hebb's rule, the Information Criteria using the Bayesian information criterion (BIC) method, the Optimal Brain Damage rule, and the Bayesian Optimization method. To evaluate the performance of the ANN model, the model prediction error on the test data was calculated and compared. After comparing the model error given by each optimization method, the optimal hidden neuron configuration, determined by the Optimal Brain Damage rule, consists of a single hidden layer with 35 neurons. This configuration was selected for its ability to best balance model accuracy and computational efficiency. It resulted in a model error of 9.51% when evaluated on the test dataset, indicating that the model achieves a strong balance between accuracy and generalizability for practical industry applications. The observed error can primarily be attributed to the presence of boundary data points, which introduce variability and potential inconsistencies in the predictions, as well as the limited size of the dataset, which restricts the model's ability to learn from a broader range of patterns. Despite these challenges, the model demonstrates reliable performance, making it suitable for real-world implementation. Further refinements, such as expanding the dataset or employing advanced regularization techniques, could potentially enhance accuracy and reduce error margins. The overall structure and logical progression of this chapter are visually outlined in Figure 41, providing a clear roadmap of the analysis and methodology employed.

Figure 41 Flow of building a generalized ANN model for BCS prediction

4.2 EXTRACTING A REAL DATA SET TO COVER THE MAJORITY OF BCS VALUES IN THE INDUSTRY

To build a generalized ANN model for BCS evaluation, we trained the model on a data set extracted from real-world data containing the most commonly used box dimensions. Based on an investigation of box dimensions used in the industry, provided by the Packaging Corporation of America (PCA), we determined that 90% of commonly used boxes in the industry have the following dimensions: length between 8 and 25 inches, width between 5.75 and 19 inches, and depth between 4 and 28 inches. Therefore, our extracted dataset includes box dimensions within these ranges.
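A minimal sketch of this extraction step is shown below; the file and column names are hypothetical, and the raw industry data is assumed to sit in a pandas DataFrame with dimensions in inches:

```python
import pandas as pd

# Load the raw industry data (hypothetical file and column names).
df = pd.read_csv("industry_boxes.csv")

# Keep only boxes within the dimension ranges covering ~90% of industry use.
mask = (
    df["length"].between(8.0, 25.0)
    & df["width"].between(5.75, 19.0)
    & df["depth"].between(4.0, 28.0)
)
extracted = df[mask]
```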
The dataset comprises 395 data points in total, with BCS values ranging from 347 to 2,172 lbs.

4.3 DETERMINATION OF HIDDEN LAYER NEURON SETTING

The five methods mentioned above (the Information Criteria using the AIC method, Hebb's rule, the Information Criteria using the BIC method, the Optimal Brain Damage rule, and the Bayesian Optimization method) for optimizing the ANN hidden neuron number setting were applied. The model error on the test data obtained by each method was calculated and compared, as shown in Figure 42.

Figure 42 ANN model error with the optimized hidden neuron numbers from the five selected methods (average model error for test data over 70 modeling cycles: Information Criteria using BIC, 1 neuron, 0.164; Information Criteria using AIC with two hidden layers, 2×7 neurons, 0.1012; Hebb's rule, 12×3 neurons, 0.1005; Information Criteria using AIC with one hidden layer, 14 neurons, 0.098; Optimal Brain Damage rule, 35 neurons, 0.097; Bayesian optimization, 138×138 neurons, 0.086)

For the Information Criteria using the AIC method, both one hidden layer and two hidden layers were tested. The results show that the optimal hidden neuron settings from the five methods are: 1 neuron in a single hidden layer for the BIC method; 2 and 7 neurons in the first and second hidden layers for the AIC method (2 hidden layers); 12 and 3 neurons in the first and second hidden layers for Hebb's rule; 14 neurons in a single hidden layer for the AIC method (1 hidden layer); 35 neurons in a single hidden layer for the OBD rule; and 138 neurons in each of two hidden layers for the Bayesian Optimization method. Overall, the model error decreases as the hidden neuron number increases. The model errors on the test data using the Bayesian method and the OBD rule are the lowest and second lowest. However, the error reduction is not significant when the number of hidden neurons increases to 138 across two hidden layers (as determined by the Bayesian method), compared with 35 neurons in a single hidden layer (as determined by the OBD method). Therefore, 35 neurons in a single hidden layer, obtained by the OBD rule, was chosen as the optimal hidden neuron number setting, reducing the training computation time while maintaining good performance for the ANN prediction.

4.4 TRAINING THE ANN MODEL TO EVALUATE BCS IN THE INDUSTRY

With the optimal neuron number of 35 in the hidden layer, 140 epochs, 70 modeling cycles, and 395 data points from the real world, the ANN model was trained, and the model errors from 10 to 70 modeling cycles were calculated with 95% confidence intervals, as shown in Figure 43.

Figure 43 ANN model error (with 95% confidence intervals) of the samples covering 90% of the BCS values of commonly used box dimensions in the industry, plotted against the number of modeling cycles

Overall, the model error of both the train and test data converged at 30 modeling cycles. The ANN prediction error for BCS is below 9.60%. The average BCS error of the train and test data across 70 modeling cycles is 9.26% and 9.51%, respectively, as shown in Figure 44.
Figure 44 Average BCS error in the train (9.26%) and test (9.51%) data across the 70 modeling cycles, with 95% confidence intervals

To investigate the cause of the BCS error in the ANN model prediction, the BCS distributions of two randomly selected, distinct modeling cycles were also studied and plotted, as shown in Figure 45. The actual BCS distribution is represented by the blue columns, while the predicted BCS distributions for the two randomly selected modeling cycles are shown in orange and green. When comparing the actual and predicted BCS distributions, the predicted distributions for these cycles indicate that data points with BCS values between 347 lbs and 450 lbs, as well as values greater than 1,997 lbs, cannot be accurately predicted and consistently exhibit higher errors than other data points. This suggests that these boundary data points contribute significantly to the high prediction error of the ANN model for BCS evaluation on the current dataset. Another reason could be that the dataset is not large enough: the minimum number of data points for an idealized dataset, as mentioned in Chapter 2, was around 1,500, and the extracted data set has a smaller sample size than the minimum required for achieving reliable ANN prediction accuracy.

Figure 45 BCS distribution of the ANN model prediction for the extracted real data set (actual BCS versus the BCS predicted in modeling cycles 2 and 70)

The structure of the generalized ANN model built for BCS prediction is shown in Figure 46.

Figure 46 The structure of the generalized ANN model built from the real data set

The generalized ANN model contains six BCS features as inputs and one hidden layer with 35 neurons. The optimal epoch number is 50, and the optimal modeling cycle number is 30.

4.5 CONCLUSION

In this chapter, a generalized ANN model for BCS evaluation targeting industrial applications has been built. A generalized dataset derived from real-world data was used to cover 90% of the BCS values of commonly used box dimensions in the industry. Drawing from the training results of the previous data sets (the data set with variations), the ANN modeling factor values for the epoch number and the modeling cycles' number were set to 140 and 70, respectively, to ensure a conservative outcome. Five methods (the Information Criteria using the AIC method, Hebb's rule, the Information Criteria using the BIC method, the Optimal Brain Damage rule, and the Bayesian Optimization method) for optimizing the hidden layer neuron setting were investigated, and their training results with the corresponding model errors were compared. The optimal neuron number in the hidden layer was determined to be 35 to strike a balance between minimizing the model error of the test data and saving computational training time. Throughout the 70 modeling cycles, the average BCS error for the test data, with the corresponding neuron count in the hidden layer, was computed at 9.51%.
The BCS value distribution revealed that the data points whose BCS values fell between 347 lbs and 450 lbs, as well as those exceeding 1,997 lbs, exhibited higher errors in the ANN prediction. This observation suggests that the primary factor contributing to the high BCS error is the presence of boundary data points. These data points, situated at the edges of the dataset range, pose challenges for the ANN model in accurately predicting their corresponding BCS values. The small sample size of the extracted real dataset is another limiting factor that hinders higher ANN prediction accuracy. In conclusion, the current ANN model can predict the BCS of commonly used box dimensions at an industrially applicable level with an error of 9.51%. One possible strategy to improve the ANN prediction accuracy is to continually expand the current dataset's sample size using available resources. In summary, this study provides valuable insights into utilizing the ANN approach to evaluate the BCS of corrugated packages and to solve problems in the corrugated industry.

CHAPTER 5: RESEARCH SUMMARY AND FUTURE RESEARCH

5.1 RESEARCH SUMMARY

This dissertation research explores the feasibility of using ANNs to evaluate the BCS of corrugated packaging. The results demonstrate that employing ANNs for BCS prediction is both feasible and meaningful, offering substantial advantages over traditional evaluation methods. ANNs can effectively address several challenges inherent in current BCS evaluation methods, including enhancing efficiency, reducing costs, and ensuring the validity of model construction, among others. The intelligent and robust analytical capabilities of ANNs, grounded in data and mathematical methodologies, hold significant potential for enhancing efficiency, cost-effectiveness, and reliability in BCS evaluation. This study contributes to the exploration of ANNs' potential in predicting BCS and their application in addressing complex challenges within the corrugated industry.

To optimize the key modeling parameters of ANNs for BCS evaluation with reliable results (Chapter 2), both a data set from the literature with a small data population and synthetic data sets with large data populations were trained to interpret the performance of ANNs for BCS estimation. Four key modeling parameters (the combination of neuron numbers in the hidden layers, the number of epochs, the number of modeling cycles, and the size of the data set) can significantly influence the ANN prediction accuracy and can be optimized based on the reduction of the ANN model error. These four ANN modeling parameters were identified for the data set from the literature and for two synthetic data sets. The results show that the values of these ANN modeling factors vary as the data sets' internal noise changes. The small data set with 63 data points needs a relatively larger hidden neuron number setting of 160 and 36 neurons in the first and second hidden layers, with 100 epochs and 60 modeling cycles. For a large data set with 3,009 data points and variations of ±0.4% and ±5.4%, the neuron number settings needed in the hidden layers were 45 and 142 in the first and second hidden layers, the number of epochs needed was 140, the numbers of modeling cycles needed were 50 and 70, and the minimum numbers of data points required to achieve a reliable ANN prediction were around 1,500 and 2,500, respectively. The results highlighted that the optimal values of these ANN modeling factors varied depending on the characteristics and size of the dataset, particularly in response to internal noise levels.
This variability underscores the importance of carefully tuning these parameters to achieve robust and accurate BCS predictions across different data scenarios. The optimization goal is to strike a balance between model error minimization and model complexity, as well as training efficiency maximization.

To explore the feasibility of applying the ANN approach to evaluating the relative importance of packaging design features, BCS was used as a representative packaging property, and the relative importance of up to six BCS features was evaluated with ANNs to guide cost and material savings in packaging design (Chapter 3). A synthetic dataset (generated using the McKee formula) and a real dataset (from industry) were used to determine the relative feature importance influencing BCS. Four methods (Connection Weights, Gradient-based, Permutation, and SHAP values) were employed in this analysis, which identified the relative importance ranking of six BCS features (ECT, thickness, flexural stiffness in both the machine (EIx) and cross-machine (EIy) directions of the corrugated board, perimeter, and depth of the box). The results show that the ANN-estimated BCS relative importance ranking aligns with the theoretical relative feature importance ranking calculated using the McKee formula. Notably, the analysis of the real dataset reveals that, although depth is ranked last, it still has a notable influence. In general, depth remains an important factor in determining BCS. As the depth value increases, buckling theory suggests that depth can significantly affect compression strength, making it a critical consideration. Despite being theoretically assigned an importance value of zero in the McKee equation [74], implying it may not be a key factor in certain models or calculations, the real-world data demonstrates that its effect should not be disregarded. This result indicates that the ANN-predicted BCS feature importance is more reflective of real-world cases than the analytical method. This study demonstrates the capability of the ANN approach for feature importance evaluation in packaging design, helping designers prioritize adjustments to the most influential features and ultimately reducing material consumption and costs. The method can be easily applied to evaluate the relative importance of other packaging properties beyond BCS, offering valuable insights for addressing various challenges in the packaging industry using ANN approaches.

Based on the studies conducted above, a generalized ANN model for BCS evaluation was finally built using a data set derived from real data (Chapter 4). The ANN modeling factors of epochs and modeling cycles were conservatively set to 140 and 70 based on the training of the data set with variation. The hidden layer neuron number setting was optimized using the same five optimization methods as Chapter 3 (the Information Criteria using the AIC method, Hebb's rule, the Information Criteria using the BIC method, the Optimal Brain Damage rule, and the Bayesian Optimization method). The optimized hidden neuron setting was identified as 35, given by the Optimal Brain Damage rule, by achieving a balance between model error minimization and model training time savings. The epoch number and modeling cycle number were determined to be 50 and 30, at the point where the training error reduction reached a plateau.
With the corresponding ANN modeling parameters and 395 data points, a generalized ANN model was trained and achieved a BCS error of 9.51% at an industrially applicable level.

5.2 FUTURE RESEARCH

Great effort should continuously be put into improving the ANN model, as it can bring innovation to corrugated packaging design and optimization, achieving efficiency, sustainability, and cost-effectiveness in the corrugated board industry. First, BCS data obtained from real testing is the most reliable source; therefore, physical testing can be used to validate the ANN-predicted BCS values and assess the importance of features such as flexural stiffness in the machine and cross-machine directions. Second, additional parameters influencing BCS, such as the corrugated board layer and flute type, can be incorporated into this study. The current research focuses solely on single-wall boxes; however, double-wall boxes and C-flute are in high demand in the U.S. market. Third, other criteria for evaluating the accuracy of the ANN model, beyond MSE, can be considered to provide a more comprehensive understanding of its predictive performance. Additionally, the Finite Element Method (FEM) could be utilized to generate BCS data [119], replacing the synthetic data derived from the McKee formula. Furthermore, the ANN model prediction accuracy can be improved by trying other techniques that were not involved in this study, such as data transformation [120] to modify the distribution of input variables so that they better match the outputs, data augmentation [121, 122] to boost the robust accuracy of the ANN model, and weight decay and dropout to improve the generalization performance of the ANN model and further improve its accuracy [123, 124]. Last but not least, the current data set can be expanded as much as possible to cover more of the BCS data existing in the industry so that the generalization of the ANN model can be improved. The more BCS data reflecting industry applications is collected, the more accurately the ANN model predictions will fit the actual BCS values. Although the current data set covers 90% of the corrugated boxes commonly used in industry, it is still important to expand it to cover the remaining 10%, which is critical for the final generalization of the ANN model. Further, it is critical to keep the data set up to date so that it covers the large majority of dimensions of the corrugated boxes used in the industry, considering the changing needs of the market, and so that the developed ANN model can keep pace with the needs of customers in modern life.

BIBLIOGRAPHY

1. B. Frank, “Corrugated box compression - A literature survey,” Packaging Technology and Science, vol. 27, no. 2, pp. 105-128, 2014.
2. J. Y.-l. Z. a. J. S. Chen, “An overview of the reducing principle of design of corrugated box used in goods packaging,” Procedia Environmental Sciences, vol. 10, pp. 992-998, 2011.
3. “Explore Custom Box Types,” [Online]. Available: https://customboxesnow.com/box-styles/.
4. Global Corrugated Packaging Market Outlook: Global Opportunity And Market Segmentation Based On Packaging Type, Based On End-User & By Region With Forecast 2017-2030, 2020.
5. W. Z. W. R. &. S. Q. Yi Y., “Life cycle assessment of delivery packages in China,” Energy Procedia, vol. 105, pp. 3711-3719, 2017.
6. B. Sharma, “Corrugation Trends,” Quarterly Journal of Indian Pulp and Paper Technical Association, vol. 33, no. E2, pp. 51-54, 2021.
7. Global Market Insights Inc., “Corrugated Packaging Market - By Product (Corrugated Box, Folding Boxboard), By Printing Technique (Lithography, Flexography, Digital Printing), By End-use Industry (Food & Beverages, Medical, Agriculture, Industrial, Paper & Carton), & Forecast,” Nov 2022. [Online]. Available: https://www.gminsights.com/industry-analysis/corrugated-packaging-market.
8. A. &. B. J. Clayton, “Investigation of the Effect of Corrugated Boxes on the Distribution of Compression Stresses on the Top Surface of Wooden Pallets,” Correira, M. (n.d.), Creasing Training Manual, vol. 5, pp. 5-6, 2018.
9. K. R. N. M. &. M. B. T. Ramdass, “Determining the Root Causes of Boxes Stacking Strength Failure and Find Possible Solutions.”
10. S. P. S. J. &. S. K. Singh, “Effect of palletized box offset on compression strength of unitized and stacked empty corrugated fiberboard boxes,” Journal of Applied Packaging Research, vol. 5, no. 3, p. 157, 2011.
11. G. T. T. &. Ö. S. Meng, “Stacking misalignment of corrugated boxes - a preliminary study,” in 23rd IAPRI Symposium on Packaging, Windsor, UK, 2007.
12. J. C. F. A. E. &. G. A. Gallo, “Mechanical behavior modeling of containers and octabins made of corrugated cardboard subjected to vertical stacking loads,” Materials, vol. 14, no. 9, p. 2392, 2021.
13. “PROPERTIES OF PAPER,” [Online]. Available: https://www.paperonweb.com/paperpro.htm.
14. D. G. T. &. K.-P. A. Mrówczyński, “Estimation of the compressive strength of corrugated board boxes with shifted creases on the flaps,” Materials, vol. 14, no. 18, p. 5181, 2021.
15. S. Manoj, “Performance And Appearance Of Packaging Grades Of Paper - Study On Quality Measurement Methods,” The Official International Journal of the Indian Pulp and Paper Technical Association, vol. 27, no. 4, pp. 29-39, 2015.
16. K. T. &. E. S. D. Ulrich, Product Design and Development (6th ed.), McGraw-Hill Education, 2016.
17. I. Chalmers, “Evaluating Corrugated Box Performance,” Corrugator Today, pp. 1-10, 2019.
18. “Evaluating Box Compression Strength with Compression Testing,” [Online]. Available: https://www.pacorr.com/blog/evaluating-box-compression-strength-with-compression-testing/.
19. “ASTM D4169 Transit Simulation,” [Online]. Available: https://pkgcompliance.com/test/astm-d4169-transit-distribution-simulation/.
20. “Practices of Science: Scientific Error,” [Online]. Available: https://manoa.hawaii.edu/exploringourfluidearth/physical/world-ocean/map-distortion/practices-science-scientific-error.
21. P. Group, “The Box Compression Test Procedure: A Comprehensive Overview,” 12 April 2024. [Online]. Available: https://medium.com/@prestogrouponline/the-box-compression-test-procedure-a-comprehensive-overview-99e0f0330b22.
22. R. C. G. J. W. &. W. J. R. McKee, “Compression strength formula for corrugated boxes,” Paperboard Packaging, vol. 48, no. 8, pp. 149-159, 1963.
23. B. a. K. K. Frank, “Assessing variation in package modeling,” TAPPI Journal, vol. 20, no. 4, pp. 231-238, 2021.
24. “Chalmers DST MD Torsional Stiffness,” [Online]. Available: http://www.rdmtest.com/p/Chalmers-DST-MD-Torsional-Stiffness/.
25. I. R. Chalmers, “The use of MD shear stiffness by the torsional stiffness technique to predict corrugated board properties and box performance,” Appita: Technology, Innovation, Manufacturing, Environment, vol. 60, no. 5, pp. 357-361, 2007.
26. T. J. a. B. F. Urbanik, “Box compression analysis of world-wide data spanning 46 years,” Wood and Fiber Science, pp. 399-416, 2006.
27. T. C. C. J. B. T. M. A. A. &. O. U. L. Fadiji, “The efficacy of finite element analysis (FEA) as a design tool for food packaging: A review,” Biosystems Engineering, vol. 174, pp. 20-40, 2018.
28. M. A. C. I. G. B. &. L. E. Jiménez-Caballero, “Design of different types of corrugated board packages using finite element tools,” in SIMULIA Customer Conference, 2009.
29. R. Anyoha, “The History of Artificial Intelligence,” [Online]. Available: https://sitn.hms.harvard.edu/flash/2017/history-artificial-intelligence/.
30. P. Lisboa, “A review of evidence of health benefit from artificial neural networks in medical intervention,” Neural Networks, vol. 15, no. 1, pp. 11-39, 2002.
31. J. Smith, “Advances in neural networks and potential for their application to steel metallurgy,” Materials Science and Technology, vol. 36, no. 17, pp. 1805-1819, 2020.
32. A. F. S. a. Z. H. D. Sheikhtaheri, “Developing and using expert systems and neural networks in medicine: a review on benefits and challenges,” Journal of Medical Systems, vol. 38, pp. 1-6, 2014.
33. S. K. A. R. E. &. B. D. Adamopoulos, “Predicting the properties of corrugated base papers using multiple linear regression and artificial neural networks,” Drewno: prace naukowe, doniesienia, komunikaty, vol. 59, 2016.
34. S. P. R. a. D. K. Malasri, “Predicting Corrugated Box Compression Strength Using an Artificial Neural Network,” International Journal, vol. 4, no. 1, pp. 169-176, 2016.
35. T. C. R. S. J. &. J. T. Archaviboonyobul, “An analysis of the influence of hand hole and ventilation hole design on compressive strength of corrugated fiberboard boxes by an artificial neural network model,” Packaging Technology and Science, vol. 33, no. 4-5, pp. 171-181, 2020.
36. A. Krogh, “What are artificial neural networks?,” Nature Biotechnology, vol. 26, no. 2, pp. 195-197, 2008.
37. N. C. Steven Walczak, “Artificial Neural Networks,” in Encyclopedia of Physical Science and Technology (Third Edition), 2003.
38. A. Lheureux, “Feed-forward vs feedback neural networks,” [Online]. Available: https://blog.paperspace.com/feed-forward-vs-feedback-neural-networks/.
39. “Feed Forward Neural Network,” [Online]. Available: https://deepai.org/machine-learning-glossary-and-terms/feed-forward-neural-network.
40. “Recurrent Neural Network,” [Online]. Available: https://deepai.org/machine-learning-glossary-and-terms/recurrent-neural-network.
41. A. Singh, “Artificial Neural Network | Types | Feed Forward | Feedback | Structure | Perceptron | Machine Learning | Applications,” 21 May 2018. [Online]. Available: https://msatechnosoft.in/blog/artificial-neural-network-types-feed-forward-feedback-structure-perceptron-machine-learning-applications/.
43. “Reinforcement learning,” Wikimedia Foundation, Inc. [Online]. Available: https://en.wikipedia.org/wiki/Reinforcement_learning.
44. “Markov decision process,” Wikimedia Foundation, Inc. [Online]. Available: https://en.wikipedia.org/wiki/Markov_decision_process.
45. O. I. J. A. O. A. E. D. K. V. M. N. A. &. A. H. Abiodun, “State-of-the-art in artificial neural network applications: A survey,” Heliyon, vol. 4, no. 11, 2018.
46. E.-S. M. A. I. S. M. M. M. E. a. S. E. H. El-Kenawy, “Novel feature selection and voting classifier algorithms for COVID-19 classification in CT images,” IEEE Access, vol. 8, pp. 179317-179335, 2020.
47. P. M. S. a. A. K. Chhajer, “The applications of artificial neural networks, support vector machines, and long–short term memory for stock market prediction,” Decision Analytics Journal, vol. 2, p. 100015, 2022.
48. M. Qiu and Y. Song, “Predicting the direction of stock market index movement using an optimized artificial neural network model,” PLoS ONE, vol. 11, no. 5, p. e0155133, 2016.
49. D. V. K. a. A. M. Selvamuthu, “Indian stock market prediction using artificial neural networks on tick data,” Financial Innovation, vol. 5, no. 1, pp. 1-12, 2019.
50. A. D. K. M. S. a. L. M. G. Ziletti, “Insightful classification of crystal structures using deep learning,” Nature Communications, vol. 9, no. 1, p. 2775, 2018.
51. K. K. F. G. C. C. S. V. K. R. V. M. Z. a. F. T. Choudhary, “Computational scanning tunneling microscope image database,” Scientific Data, vol. 8, no. 1, p. 57, 2021.
52. C. B. E. V. Á. S. L. S. G. N. D. V. J. T. T. J. J. B. G. a. C. S. Cooper, “Design-to-device approach affords panchromatic co-sensitized solar cells,” Advanced Energy Materials, vol. 9, no. 5, p. 1802820, 2019.
53. V. J. D. L. W. A. D. Z. R. O. K. K. A. P. G. C. a. A. J. Tshitoyan, “Unsupervised word embeddings capture latent knowledge from materials science literature,” Nature, vol. 571, no. 7763, pp. 95-98, 2019.
54. A. M. L. a. C. H. D. Bahrami, “Intelligent design retrieval and packaging system: application of neural networks in design and manufacturing,” The International Journal of Production Research, vol. 33, no. 2, pp. 405-426, 1995.
55. S. Malasri, “Applications of Neural Networks in Transport Packaging,” in PACKCON 2015, Online, 2015.
56. Y. X. Y. C. Z. a. Z. W. Liang, “Application of neural networks to identification of nonlinear characteristics in cushioning packaging,” Mechanics Research Communications, vol. 23, no. 6, pp. 607-613, 1996.
57. A. M. S. S. Maleki, “Application of artificial neural networks for producing an estimation of high-density polyethylene,” Polymers, vol. 12, no. 10, p. 2319, 2020.
58. O. S. A. A. T.-C. J. a. I. D. Adeleke, “Application of artificial neural networks for predicting the physical composition of municipal solid waste: An assessment of the impact of seasonal variation,” Waste Management & Research, vol. 39, no. 8, pp. 1058-1068, 2021.
59. V. V. S. a. C. D.-F. Oliveira, “Artificial neural network modelling of the amount of separately-collected household packaging waste,” Journal of Cleaner Production, vol. 210, pp. 401-409, 2019.
60. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (Adaptive Computation and Machine Learning series), Cambridge, MA: The MIT Press (Illustrated edition), 2016.
61. M. Kubat, “Artificial neural networks,” in An Introduction to Machine Learning, pp. 117-143, 2021.
62. H. a. Y. C. Hang, “Motion Estimation for Image Sequence Compression,” in Handbook of Visual Communications, H.-M. Hang and J. W. Woods, Eds., vol. 17, San Diego: Academic Press, 1995, pp. 147-188.
63. R. J. Hyndman, “Moving Averages,” in International Encyclopedia of Statistical Science, 2nd ed., M. Lovric, Ed., Springer, 2011, pp. 866-869.
64. L. J. L. G. C. a. H. M. Zhang, “Research on Packaging Evaluation System of Fast Moving Consumer Goods Based on Analytical Hierarchy Process Method,” in Advanced Graphic Communications and Media Technologies, Singapore: Springer, 2017, pp. 711-718.
65. A. J. V. D. M. A. L. a. M. C. Pérez, “An Analytical Hierarchy Approach Applied in the Packaging Supply Chain,” in Supply Chain Management and Logistics in Emerging Markets, Emerald Publishing Limited, 2020, pp. 89-104.
66. B. J. G. M. a. D. S. Hicks, “A finite element-based approach for whole-system simulation of packaging systems for their improved design and operation,” Packaging Technology and Science, vol. 22, no. 4, 2009.
67. J. P. M. C. D. S. J. H. M. &. H. S. W. Park, “Finite element-based simulation for edgewise compression behavior of corrugated paperboard for packaging of agricultural products,” Applied Sciences, vol. 10, no. 19, p. 6716, 2020.
68. H. A. E. A. S. C. G. a. Z. A. Nefeslioglu, “A modified analytical hierarchy process (M-AHP) approach for decision support systems in natural hazard assessments,” Computers & Geosciences, pp. 1-8, 2013.
69. L. J. L. G. C. a. H. M. Zhang, “Research on Packaging Evaluation System of Fast Moving Consumer Goods Based on Analytical Hierarchy Process Method,” in Advanced Graphic Communications and Media Technologies, Singapore: Springer, 2017, pp. 711-718.
70. S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Advances in Neural Information Processing Systems, vol. 30, 2017.
71. S.-H. Tsaur, Y.-C. Chiu, and C.-H. Huang, “Determinants of guest loyalty to international tourist hotels—a neural network approach,” Tourism Management, vol. 23, no. 4, pp. 397-405, 2002.
72. J. A. S. a. R. A. K. Iqbal, “The relative importance of textual indexes in predicting the future performance of banks: A connection weight approach,” Borsa Istanbul Review, vol. 23, no. 1, pp. 240-253, 2023.
73. A. T. Goh, “Back-propagation neural networks for modeling complex systems,” Artificial Intelligence in Engineering, vol. 9, no. 3, pp. 143-151, 1995.
74. T. J. K. G. A. C. a. T. G. Gajewski, “On the use of artificial intelligence in predicting the compressive strength of various cardboard packaging,” Packaging Technology and Science, vol. 37, no. 2, pp. 97-105, 2024.
75. S. P. R. a. D. K. Malasri, “Predicting corrugated box compression strength using an artificial neural network,” International Journal, vol. 4, no. 1, pp. 169-176, 2016.
76. T. J. K. G. A. C. a. T. G. Gajewski, “On the use of artificial intelligence in predicting the compressive strength of various cardboard packaging,” Packaging Technology and Science, vol. 37, no. 2, pp. 97-105, 2024.
77. N. L. da Costa, M. D. de Lima, and R. Barbosa, “Evaluation of feature selection methods based on artificial neural network weights,” Expert Systems with Applications, vol. 168, p. 114312, 2021.
78. A. L. E. a. L. R. Hill, “A Novel Gradient Feature Importance Method for Neural Networks: An Application to Controller Gain Tuning for Mobile Robots,” in International Conference on Informatics in Control, Automation and Robotics, Cham: Springer International Publishing, 2020, pp. 124-141.
79. C. W. Zobel and D. F. Cook, “Evaluation of neural network variable influence measures for process control,” Engineering Applications of Artificial Intelligence, vol. 24, no. 5, pp. 803-812, 2011.
80. D. G. Garson, “Interpreting neural network connection weights,” pp. 47-51, 1991.
81. Y. T. G. a. G. S. Yoon, “Integrating artificial neural networks with rule-based expert systems,” Decision Support Systems, vol. 11, no. 5, pp. 497-507, 1994.
82. S.-H. Tsaur, Y.-C. Chiu, and C.-H. Huang, “Determinants of guest loyalty to international tourist hotels—a neural network approach,” Tourism Management, vol. 23, no. 4, pp. 397-405, 2002.
83. H. Mandler and B. Weigand, “Feature importance in neural networks as a means of interpretation for data-driven turbulence models,” Computers & Fluids, vol. 265, p. 105993, 2023.
84. A. T. Goh, “Back-propagation neural networks for modeling complex systems,” Artificial Intelligence in Engineering, vol. 9, no. 3, pp. 143-151, 1995.
85. H.-F. Luoh and S.-H. Tsaur, “The effects of age stereotypes on tour leader roles,” Journal of Travel Research, vol. 53, no. 1, pp. 111-123, 2014.
86. J. A. S. a. R. A. K. Iqbal, “The relative importance of textual indexes in predicting the future performance of banks: A connection weight approach,” Borsa Istanbul Review, vol. 23, no. 1, pp. 240-253, 2023.
87. K. Fukumizu and C. Leng, “Gradient-based kernel method for feature extraction and variable selection,” Advances in Neural Information Processing Systems, vol. 25, 2012.
88. A. Altmann, L. Toloşi, O. Sander, and T. Lengauer, “Permutation importance: a corrected feature importance measure,” Bioinformatics, vol. 26, no. 10, pp. 1340-1347, 2010.
89. Z. Li, “Extracting spatial effects from machine learning model using local interpretation method: An example of SHAP and XGBoost,” Computers, Environment and Urban Systems, vol. 96, p. 101845, 2022.
90. G. H. J. a. J. C. Jeon, “Distilled gradient aggregation: Purify features for input attribution in the deep neural network,” Advances in Neural Information Processing Systems, vol. 35, pp. 26478-26491, 2022.
91. M. a. A. M. A. Azmat, “Feature Importance Estimation Using Gradient Based Method for Multimodal Fused Neural Networks,” in IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2022.
92. H. Mandler and B. Weigand, “Feature importance in neural networks as a means of interpretation for data-driven turbulence models,” Computers & Fluids, vol. 265, p. 105993, 2023.
93. G. A. L. M. S. a. D. S. Van den Broeck, “On the tractability of SHAP explanations,” Journal of Artificial Intelligence Research, vol. 74, pp. 851-886, 2022.
94. Z. Li, “Extracting spatial effects from machine learning model using local interpretation method: An example of SHAP and XGBoost,” Computers, Environment and Urban Systems, vol. 96, p. 101845, 2022.
95. E. Štrumbelj and I. Kononenko, “Explaining prediction models and individual predictions with feature contributions,” Knowledge and Information Systems, vol. 41, pp. 647-665, 2014.
96. H. Akaike, “Information theory and an extension of the maximum likelihood principle,” New York, NY: Springer New York, 1998.
97. “Hebb's rule,” [Online]. Available: https://search.brave.com/search?q=Hebb%27s+rule&source=desktop&summary=1&summary_og=7726c1f501903b41a133a1.
98. G. Schwarz, “Estimating the dimension of a model,” The Annals of Statistics, vol. 6, no. 2, pp. 461-464, 1978.
99. G. Schwarz, “Estimating the dimension of a model,” The Annals of Statistics, vol. 6, no. 2, pp. 461-464, 1978.
100. Y. LeCun, J. S. Denker, and S. A. Solla, “Optimal brain damage,” Advances in Neural Information Processing Systems, vol. 2, 1989.
101. J. Močkus, “On Bayesian methods for seeking the extremum,” in Optimization Techniques IFIP Technical Conference, Novosibirsk, Springer Berlin Heidelberg, pp. 400-404, 1974.
102. J. Mockus, “The Bayesian approach to local optimization,” Springer Netherlands, 1989.
103. W. Koehrsen, “A Conceptual Explanation of Bayesian Hyperparameter Optimization for Machine Learning,” Medium, 24 June 2018. [Online]. Available: https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f.
104. P. I. Frazier, “A tutorial on Bayesian optimization,” arXiv preprint arXiv:1807.02811, 2018.
105. E. Wit, E. van den Heuvel, and J.-W. Romeijn, “‘All models are wrong...’: an introduction to model uncertainty,” Statistica Neerlandica, vol. 66, no. 3, pp. 217-236, 2012.
106. G. Claeskens and N. L. Hjort, Model Selection and Model Averaging, Cambridge Books, 2008.
107. M. Weiß and M. Göker, “Molecular Phylogenetic Reconstruction,” in The Yeasts (Fifth Edition), Elsevier Science, 2011, pp. 159-174.
108. “Akaike information criterion,” Wikimedia Foundation, Inc., 25 May 2024. [Online]. Available: https://en.wikipedia.org/wiki/Akaike_information_criterion.
109. C. N. B. H. F. Emad A. Mohammed, “Emerging Business Intelligence Framework for a Clinical Laboratory Through Big Data Analytics,” in Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology, 2015, pp. 577-602.
110. P. Stoica and Y. Selén, “Model-order selection: a review of information criterion rules,” IEEE Signal Processing Magazine, vol. 21, no. 4, pp. 36-47, 2004.
111. K. P. Burnham and D. R. Anderson, Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed., New York, NY: Springer New York, 2002.
112. H. Akaike, “A new look at the statistical model identification,” IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716-723, 1974.
113. P. I. Frazier, “A tutorial on Bayesian optimization,” arXiv preprint arXiv:1807.02811, 2018.
114. “Hebbian theory,” Wikimedia Foundation, Inc., 16 May 2024. [Online]. Available: https://en.wikipedia.org/wiki/Hebbian_theory.
115. Y. E. K. L. G. K. V. a. I. V. Pyr’yev, “Empirical models for prediction compression strength of paperboard carton,” Wood and Fiber Science, vol. 54, no. 1, 2022.
116. J. Gu, B. Frank, and E. Lee, “A Comparative Analysis of Artificial Neural Network (ANN) Architectures for Box Compression Strength Estimation,” Korean J Packag Sci Technol, vol. 29, no. 3, pp. 163-174, 2023.
117. R. C. McKee, J. W. Gander, and J. R. Wachuta, “Compression strength formula for corrugated boxes,” Paperboard Packaging, vol. 48, no. 8, pp. 149-159, 1963.
118. Y. E. K. L. G. K. V. a. I. V. Pyr’yev, “Empirical models for prediction compression strength of paperboard carton,” Wood and Fiber Science, vol. 54, no. 1, 2022.
119. T. A. A. C. C. B. T. a. O. U. Fadiji, “Application of finite element analysis to predict the mechanical strength of ventilated corrugated paperboard packaging for handling fresh produce,” Biosystems Engineering, vol. 174, pp. 260-281, 2018.
120. J. J. Shi, “Reducing prediction error by transforming input data for neural networks,” Journal of Computing in Civil Engineering, vol. 14, no. 2, pp. 109-116, 2000.
121. S.-A. Rebuffi, S. Gowal, D. A. Calian, F. Stimberg, O. Wiles, and T. A. Mann, “Data augmentation can improve robustness,” Advances in Neural Information Processing Systems, vol. 34, pp. 29935-29948, 2021.
122. L. Z. J. P. N. C. L. D. M. A. N. a. N. S. Khan, “Data augmentation to improve performance of neural networks for failure management in optical networks,” Journal of Optical Communications and Networking, vol. 15, pp. 57-67, 2023.
123. A. Krogh and J. A. Hertz, “A simple weight decay can improve generalization,” Advances in Neural Information Processing Systems, vol. 4, 1991.
124. N. Srivastava, “Improving neural networks with dropout,” University of Toronto, Toronto, 2013.
125. J. S. C. a. H. J. Park, “Numerical prediction of equivalent mechanical properties of corrugated paperboard by 3D finite element analysis,” Applied Sciences, vol. 10, no. 22, p. 7973, 2020.
126. J.-M. T.-Y. P. a. H.-M. J. Park, “Prediction of Deflection Due to Multistage Loading of a Corrugated Package,” Applied Sciences, vol. 13, no. 7, p. 4236, 2023.
127. E. Molina and L. Horvath, “Development of a Gaussian Process Model as a Surrogate to Study Load Bridging Performance in Racked Pallets,” Applied Sciences, vol. 11, no. 24, p. 11865, 2021.
128. R. C. J. W. B. S. P. R. &. S. M. Haj-Ali, “Refined nonlinear finite element models for corrugated fiberboards,” Composite Structures, vol. 87, no. 4, pp. 321-333, 2009.
129. G. J. S. Y. Z. D. &. X. Y. Hua, “Experimental and numerical analysis of the edge effect for corrugated and honeycomb fiberboard,” Strength of Materials, vol. 49, no. 1, pp. 188-197, 2017.
130. T. Garbowski, T. Gajewski, and J. K. Grabski, “Estimation of the compressive strength of corrugated cardboard boxes with various perforations,” Energies, vol. 14, no. 4, p. 1095, 2021.
131. T. Garbowski, T. Gajewski, and J. K. Grabski, “Estimation of the compressive strength of corrugated cardboard boxes with various openings,” Energies, vol. 14, no. 1, p. 155, 2020.
132. M. Biancolini and C. Brutti, “Numerical and experimental investigation of the strength of corrugated board packages,” Packaging Technology and Science, vol. 16, no. 2, pp. 47-60, 2003.
133. J. Han and J. Park, “Finite element analysis of vent/hand hole designs for corrugated fibreboard boxes,” Packaging Technology and Science, vol. 20, no. 1, pp. 39-47, 2007.
134. T. G. T. M. D. &. J. R. Garbowski, “Crushing of single-walled corrugated board during converting: Experimental and numerical study,” Energies, vol. 14, no. 11, p. 3203, 2021.
135. G. S. P. N. M. &. Ö. S. Marin, “Experimental and finite element simulated box compression tests on paperboard packages at different moisture levels,” Packaging Technology and Science, vol. 34, no. 4, pp. 229-243, 2021.
136. T. Kobayashi, “Numerical Simulation for Compressive Strength of Corrugated Fiberboard Box,” Japan TAPPI Journal, vol. 73, no. 8, pp. 793-800, 2019.
137. S. Kleene, “Representation of events in nerve nets and finite automata,” Automata Studies, vol. 34, pp. 3-41, 1956.
138. V. Mitić, “Benefits of artificial intelligence and machine learning in marketing,” in Sinteza 2019: International Scientific Conference on Information Technology and Data Related Research, Singidunum University, 2019.
139. A. Sun and B. Scanlon, “How can Big Data and machine learning benefit environment and water management: a survey of methods, applications, and future directions,” Environmental Research Letters, vol. 14, no. 7, p. 073001, 2019.
140. Y. Pi, “Machine learning in governments: Benefits, challenges and future directions,” JeDEM-eJournal of eDemocracy and Open Government, vol. 13, no. 1, pp. 203-219, 2021.
141. S. Kalogirou, “Artificial neural networks in renewable energy systems applications: a review,” Renewable and Sustainable Energy Reviews, vol. 5, no. 4, pp. 373-401, 2001.
142. B. L. İ. M. O. O. S. G. A. N. S. M. &. S. B. Aylak, “Application of machine learning methods for pallet loading problem,” Applied Sciences, vol. 11, no. 18, p. 8304, 2021.
143. K.-P. A. G. J. Garbowski T, “Estimation of the Edge Crush Resistance of Corrugated Board Using Artificial Intelligence,” Materials, vol. 16, p. 1631, 2023.
144. T. Jacob, “Vanishing Gradient Problem: Causes, Consequences, and Solutions,” 15 June 2023. [Online]. Available: https://www.kdnuggets.com/2022/02/vanishing-gradient-problem.html#:~:text=When%20there%20are%20more%20layers,this%20the%20vanishing%20gradient%20problem.
145. “Tanh Activation,” [Online]. Available: https://paperswithcode.com/method/tanh-activation#:~:text=Tanh%20Activation%20is%20an%20activation,for%20multi%2Dlayer%20neural%20networks.
146. S. Sharma, “Activation Functions in Neural Networks,” Towards Data Science, 6 September 2017. [Online]. Available: https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6.
147. “Vanishing gradient problem,” Wikimedia Foundation, Inc. [Online]. Available: https://en.wikipedia.org/wiki/Vanishing_gradient_problem.
148. M. J. G. S. a. A. W. Roodschild, “A new approach for the vanishing gradient problem on sigmoid activation,” Progress in Artificial Intelligence, vol. 9, no. 4, pp. 351-360, 2020.
149. “Multi-Class Neural Networks: Softmax,” [Online]. Available: https://developers.google.com/machine-learning/crash-course/multi-class-neural-networks/softmax.
150. B. P. C, “Softmax Activation Function: Everything You Need to Know,” 30 June 2023. [Online]. Available: https://www.pinecone.io/learn/softmax-activation/.
151. T. Wood, “Softmax Function,” [Online]. Available: https://deepai.org/machine-learning-glossary-and-terms/softmax-layer.
152. S. Polamuri, “Difference Between Softmax Function and Sigmoid Function,” [Online]. Available: https://dataaspirant.com/difference-between-softmax-function-and-sigmoid-function/.
153. S. Shah, “Cost Function is No Rocket Science!,” 20 March 2024. [Online]. Available: https://www.analyticsvidhya.com/blog/2021/02/cost-function-is-no-rocket-science/.
154. “Cost Function in Machine Learning,” [Online]. Available: https://www.javatpoint.com/cost-function-in-machine-learning.
155. MILK, “Dummies guide to Cost Functions in Machine Learning with Animation,” 14 November 2020. [Online]. Available: https://machinelearningknowledge.ai/cost-functions-in-machine-learning/.
156. A. A. &. N. L. A. Faisal, “Simulation of ammonia nitrogen removal from simulated wastewater by sorption onto waste foundry sand using artificial neural network,” Association of Arab Universities Journal of Engineering Sciences, vol. 26, no. 1, pp. 28-34, 2019.
157. N.-T. a. K.-U. D. Vu, “Prediction of ammonium removal by biochar produced from agricultural wastes using artificial neural networks: Prospects and bottlenecks,” in Soft Computing Techniques in Solid Waste and Wastewater Management, pp. 455-467, 2021.
158. Y.-S. Park and S. Lek, “Artificial neural networks: Multilayer perceptron for ecological modeling,” Developments in Environmental Modelling, vol. 28, pp. 123-140, 2016.
159. E. V. A. K. S. A. K. V. S. S. V. P. K. K. T. B. &. P. A. Antunes, “Application of biochar for emerging contaminant mitigation,” Advances in Chemical Pollution, Environmental Management and Protection, vol. 7, pp. 65-91, 2021.
160. H. Y. R. R. A. &. N. P. A. Kang, “Artificial neural network modeling of phytoplankton blooms and its application to sampling sites within the same estuary,” Elsevier, pp. 161-172, 2011.
161. M. M. S. H. B. I. M. a. M. I. H. S. Hussain, “Application of different artificial neural network for streamflow forecasting,” in Advances in Streamflow Forecasting, pp. 149-170, 2021.
162. M. Z. F. B. D. P. N. a. K. K. Mohseni-Dargah, “Machine learning in surface plasmon resonance for environmental monitoring,” in Artificial Intelligence and Data Science in Environmental Sensing, pp. 269-298, 2022.
163. Z. R. H. a. W. H. Zhang, “Application of Artificial Neural Network Algorithm in Facial Biological Image Information Scanning and Recognition,” Contrast Media & Molecular Imaging, 2022.
164. M. R. a. R. Z. Jabłońska, “Artificial neural networks for predicting social comparison effects among female Instagram users,” PLoS ONE, vol. 15, no. 2, 2020.
165. L. S. N. P. a. P. L. N. M. Berke, “Optimum design of aerospace structural components using neural networks,” Computers & Structures, vol. 48, no. 6, pp. 1001-1010, 1993.
166. S. K. K. D. J. R. D. V. B. G. a. T. R. N. Paul, “Application of artificial neural networks in aircraft maintenance, repair and overhaul solutions,” arXiv preprint arXiv:1001.3741, 2010.
167. Y. Y. L. a. Y. W. Li, “New Algorithm of Traditional Chinese Medicine and Protection of Intangible Cultural Heritage Based on Big Data Deep Learning,” BioMed Research International, 2022.
168. G. G. S. V. N. V. K. a. T. S. Gopichand, “Digital signature verification using artificial neural networks,” International Journal of Recent Technology and Engineering (IJRTE), Blue Eyes Intelligence Engineering, vol. 7, p. 552, 2019.
169. M. Kumar, “Signature verification using neural network,” International Journal on Computer Science and Engineering, vol. 4, no. 9, p. 1498, 2012.
170. A. B. D. a. S. B. Karouni, “Offline signature recognition using neural networks approach,” Procedia Computer Science, vol. 3, pp. 155-161, 2011.
171. K. M. P. S. S. G. a. A. A. Abhishek, “Weather forecasting model using artificial neural network,” Procedia Technology, vol. 4, pp. 311-318, 2012.
172. D. N. a. D. K. S. Fente, “Weather forecasting using artificial neural network,” in Second International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 1757-1761, 2018.
173. M. Hayati and Z. Mohebi, “Application of artificial neural networks for temperature forecasting,” International Journal of Electrical and Computer Engineering, vol. 1, no. 4, pp. 662-666, 2007.
174. K. D. B. C. C. J. A. T. F. C. R. P. C. C. A. A. A. B. S. a. H. E. Choudhary, “Recent advances and applications of deep learning methods in materials science,” npj Computational Materials, vol. 8, no. 1, p. 59, 2022.
175. T. Xie and J. C. Grossman, “Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties,” Physical Review Letters, vol. 120, no. 14, p. 145301, 2018.
176. T. Xie and J. C. Grossman, “Hierarchical visualization of materials space with graph convolutional neural networks,” The Journal of Chemical Physics, vol. 149, no. 17, 2018.
177. L. Ward, A. Agrawal, A. Choudhary, and C. Wolverton, “A general-purpose machine learning framework for predicting properties of inorganic materials,” npj Computational Materials, vol. 2, no. 1, pp. 1-7, 2016.
178. W. B. J. C. J. J. K. S. S. P. S. M. P. N. S. a. K.-S. S. Park, “Classification of crystal structure using a convolutional neural network,” IUCrJ, vol. 4, no. 4, pp. 486-494, 2017.
179. M. Hellenbrandt, “The inorganic crystal structure database (ICSD)—present and future,” Crystallography Reviews, vol. 10, no. 1, pp. 17-22, 2004.
180. V. G. H. P. G. a. B. G. S. Fung, “Machine learned features from density of states for accurate adsorption energy prediction,” Nature Communications, vol. 12, no. 1, p. 88, 2021.
181. Y. E. K. L. G. K. V. a. I. V. Pyr’yev, “Empirical models for prediction compression strength of paperboard carton,” Wood and Fiber Science, vol. 54, no. 1, 2022.
182. L. Fehér, R. Pidl, and P. Böröcz, “Compression strength estimation of corrugated board boxes for a reduction in sidewall surface cutouts—experimental and numerical approaches,” Materials, vol. 16, no. 2, p. 597, 2023.
183. S. P. R. a. D. K. Malasri, “Predicting corrugated box compression strength using an artificial neural network,” International Journal, vol. 4, no. 1, pp. 169-176, 2016.
184. S. a. A. C. Chakravorty, “Hidden layer optimization of neural network using computational technique,” in International Conference on Advances in Computing, Communication and Control, 2009.
185. “Bayesian optimization,” Wikimedia Foundation, Inc., 13 May 2024. [Online]. Available: https://en.wikipedia.org/wiki/Bayesian_optimization.
186. “Hebbian theory,” Wikimedia Foundation, Inc., 16 May 2024. [Online]. Available: https://en.wikipedia.org/wiki/Hebbian_theory.
187. Y. LeCun, J. S. Denker, and S. A. Solla, “Optimal brain damage,” Advances in Neural Information Processing Systems, vol. 2, 1989.
188. P. Stoica and Y. Selén, “Model-order selection: a review of information criterion rules,” IEEE Signal Processing Magazine, vol. 21, no. 4, pp. 36-47, 2004.
189. R. McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan, CRC Press, 2016, p. 189.
190. M. Taddy, Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions, New York: McGraw-Hill, 2019, p. 90.
191. H. Akaike, “Information theory and an extension of the maximum likelihood principle,” New York, NY: Springer New York, 1998, pp. 199-213.
192. H. Akaike, “Information theory and an extension of the maximum likelihood principle,” in Proc. 2nd Int. Symp. Information Theory, Budapest, pp. 267-281, 1973.
193. J. Močkus, “On Bayesian methods for seeking the extremum,” Optimization Techniques IFIP Technical Conference, Novosibirsk, vol. 27, pp. 400-404, 1975.
194. W. Koehrsen, “A Conceptual Explanation of Bayesian Hyperparameter Optimization for Machine Learning,” Medium, 24 June 2018. [Online]. Available: https://towardsdatascience.com/a-conceptual-explanation-of-bayesian-model-based-hyperparameter-optimization-for-machine-learning-b8172278050f.
195. W. Li and D. R. Nyholt, “Marker selection by Akaike information criterion and Bayesian information criterion,” Genetic Epidemiology, vol. 21, no. S1, pp. S272-S277, 2001.
196. H. a. M. H. Wang, “Supervised Hebb rule based feature selection for text classification,” Information Processing & Management, vol. 56, no. 1, pp. 167-191, 2019.
197. D. X. X. a. F. P. Lin, “Bayesian Information Criterion Based Feature Filtering for the Fusion of Multiple Features in High-Spatial-Resolution Satellite Scene Classification,” Journal of Sensors, vol. 2015, p. 142612, 2015.
198. Y. a. N. S. Mate, “Hybrid feature selection and Bayesian optimization with machine learning for breast cancer prediction,” in 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), vol. 1, pp. 612-619, 2021.
199. C. Z. Z. a. D. W. Liu, “Pruning deep neural networks by optimal brain damage,” in Interspeech, pp. 1092-1095, 2014.
200. T. a. R. A. S. Maheshwari, “Study of the Effect of Squareness of the Corrugated Box on its Box Compression Strength,” Int. J. Latest Technol. Eng. Manag. Appl. Sci., vol. 6, pp. 26-28, 2017.
201. L. J. L. G. C. a. H. M. Zhang, “Research on Packaging Evaluation System of Fast Moving Consumer Goods Based on Analytical Hierarchy Process Method,” in Advanced Graphic Communications and Media Technologies, Singapore: Springer, 2017, pp. 711-718.
202. S. S. S. M. R. M. S. H. a. S. S. I. Beg, “Application of design of experiments (DoE) in pharmaceutical product and process optimization,” Pharmaceutical Quality by Design, pp. 43-64, 2019.
203. J. a. T. M. Hron, “Application of design of experiments to welding process of food packaging,” Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, vol. 61, no. 4, pp. 909-915, 2013.
204. T. C. J. C. T. M. B. A. A. a. U. L. O. Fadiji, “The efficacy of finite element analysis (FEA) as a design tool for food packaging: A review,” Biosystems Engineering, vol. 174, pp. 20-40, 2018.
205. B. J. G. M. a. D. S. Hicks, “A finite element-based approach for whole-system simulation of packaging systems for their improved design and operation,” Packaging Technology and Science, vol. 22, no. 4, 2009.
206. J. M. P. D. S. C. H. M. J. a. S. W. H. Park, “Finite element-based simulation for edgewise compression behavior of corrugated paperboard for packaging of agricultural products,” Applied Sciences, vol. 10, no. 19, p. 6716, 2020.
207. H. N. J. M. T. J. Nygårds M, “A finite element model for simulations of creasing and folding of paperboard,” in Abaqus Users’ Conference, 2005.
208. N. L. da Costa, M. D. de Lima, and R. Barbosa, “Evaluation of feature selection methods based on artificial neural network weights,” Expert Systems with Applications, vol. 168, p. 114312, 2021.
209. B. Iooss and A. Saltelli, “Introduction to sensitivity analysis,” in Handbook of Uncertainty Quantification, 2017, pp. 1103-1122.
210. A. E. L. a. R. L. Hill, “A Novel Gradient Feature Importance Method for Neural Networks: An Application to Controller Gain Tuning for Mobile Robots,” in International Conference on Informatics in Control, Automation and Robotics, 2020.
211. S. M. Lundberg and S.-I. Lee, “A unified approach to interpreting model predictions,” Advances in Neural Information Processing Systems, vol. 30, 2017.
212. C. W. Zobel and D. F. Cook, “Evaluation of neural network variable influence measures for process control,” Engineering Applications of Artificial Intelligence, vol. 24, no. 5, pp. 803-812, 2011.
213. D. G. Garson, “Interpreting neural network connection weights,” pp. 47-51, 1991.
214. K. Fukumizu and C. Leng, “Gradient-based kernel method for feature extraction and variable selection,” Advances in Neural Information Processing Systems, vol. 25, 2012.
215. A. Altmann, L. Toloşi, O. Sander, and T. Lengauer, “Permutation importance: a corrected feature importance measure,” Bioinformatics, vol. 26, no. 10, pp. 1340-1347, 2010.
216. T. J. Urbanik and B. Frank, “Box compression analysis of world-wide data spanning 46 years,” Wood and Fiber Science, pp. 399-416, 2006.
217. S. P. R. a. D. K. Malasri, “Predicting corrugated box compression strength using an artificial neural network,” International Journal, vol. 4, no. 1, pp. 169-176, 2016.
218. A. J. V. D. M. A. L. a. M. C. Pérez, “An Analytical Hierarchy Approach Applied in the Packaging Supply Chain,” in Supply Chain Management and Logistics in Emerging Markets, Emerald Publishing Limited, 2020, pp. 89-104.
219. S. S. S. M. R. M. S. H. a. S. S. I. Beg, “Application of design of experiments (DoE) in pharmaceutical product and process optimization,” Pharmaceutical Quality by Design, pp. 43-64, 2019.
220. J. a. T. M. Hron, “Application of design of experiments to welding process of food packaging,” Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, vol. 61, no. 4, pp. 909-915, 2013.