BEYOND FINITE ELEMENT: PHYSICS INFORMED NEURAL NETWORK FOR STRESS
PREDICTION

By

Hamed Bolandi

A DISSERTATION

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

Civil Engineering—Doctor of Philosophy
Computer Science—Dual Major

2023

ABSTRACT

This multidisciplinary research proposes deep neural networks to bypass Finite Element

Analysis (FEA) and predict high-resolution stress distributions on loaded steel plates with variable

loading, geometries, and boundary conditions. FEA for structures has been broadly used to conduct

stress analysis of various civil and mechanical engineering structures. Conventional methods,

such as FEA, provide high-fidelity solutions but require solving large linear systems that can be

computationally intensive. The existing workflow for FEM applications includes: (i) modeling

the geometry and its components, (ii) specifying material properties, boundary conditions, and

loading, (iii) applying a meshing strategy, and (iv) running the stress analysis, which may be time-consuming depending

on the complexity of the model. In contrast, deep learning (DL) techniques can generate solutions

significantly faster than conventional run-time analysis. This can prove extremely valuable in real-time

structural assessment applications. In this work, a convolutional neural network (CNN)

was designed and trained to use the geometry, boundary conditions, and static load as input to

predict the stress contours in intact steel plates. Furthermore, we predict high-resolution stress

distributions on damaged steel plates using CNNs augmented with custom loss functions that use

physics rules to bypass the need for Finite Element Analysis. We embedded physics constraints into

the loss function to guide model training toward precisely capturing stress concentrations around

the tips of various structural damage configurations. The proposed technique’s performance was

compared to Finite-Element simulations using partial differential equation (PDE) solvers. Neuro-

DynaStress is also proposed to predict the entire sequence of stress distribution based on Finite

Element simulations using a PDE solver. More specifically, a CNN,

along with the multi-head attention transformer and feature alignment, is used to extract features and

capture the data’s temporal dependence. The model was designed and trained to use the geometry,

boundary conditions, and sequence of loads as input and predict the sequences of high-resolution

von Mises stress contours. Moreover, to increase the accuracy of dynamic stress prediction, we

propose a Physics Informed Neural Network (PINN). The PINN-Stress model can predict the entire

sequence of stress distribution based on finite element simulations using a PDE solver. In order

to force our model to learn the physical constraints, we minimize the violation of the equation of

motion and also minimize the boundary condition violation to fully enforce the underlying PDE.

The PINN-Stress model can predict the sequence of normal and shear stress distribution in almost

real-time and can generalize better than the model without PINN. Our model is also able to predict

von Mises stress using the von Mises equation.

Copyright by
HAMED BOLANDI
2023

ACKNOWLEDGMENTS

Words cannot express my gratitude to my advisors, Dr. Nizar Lajnef and Dr. Vishnu

Boddeti, for their invaluable patience and feedback. I could not have undertaken this journey

without them, who generously provided knowledge and expertise. Additionally, this endeavor

would not have been possible without the support of my committee, Dr. Wolfgang Banzhaf and

Dr. Weiyi Lu.

I am also grateful to my friends and lab mates, Xuyang Li and Bashir Sadeghi, for their

valuable comments and especially to Gautam Sreekumar, for his valuable guidance and help. I

would like to thank Laura Post for making the paperwork process so easy and for her help with the

administrative process.

Last but not least, I would be remiss in not mentioning my family; thanks to my wife Samane

for her understanding and love during the past few years. Her support and encouragement were in

the end what made this dissertation possible. My parents, Alireza and Zohre, receive my deepest

gratitude and love for their dedication and the many years of support during my undergraduate

studies that provided the foundation for this work.

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION
  1.1 Background and State of Knowledge
  1.2 Research Objectives
  1.3 Research Significance
  1.4 Contributions
  1.5 Chapter Overview

CHAPTER 2 HIGH-RESOLUTION STATIC STRESS DISTRIBUTION PREDICTION IN INTACT STRUCTURAL COMPONENTS
  2.1 Methodology
  2.2 Loss function and performance metrics
  2.3 Results and discussion

CHAPTER 3 HIGH-RESOLUTION STATIC STRESS DISTRIBUTION PREDICTION IN DAMAGED STRUCTURAL COMPONENTS
  3.1 Predicting physical responses
  3.2 Methodology
  3.3 Loss function and performance metrics
  3.4 Implementation details
  3.5 Results and discussions

CHAPTER 4 NEURO-DYNASTRESS: PREDICTING DYNAMIC STRESS DISTRIBUTIONS IN STRUCTURAL COMPONENTS
  4.1 Need for fast dynamic analysis
  4.2 Methods
  4.3 Proposed Methodology
  4.4 Loss Function and Performance Metrics
  4.5 Implementation and Computational Performance
  4.6 Results and Discussions

CHAPTER 5 PHYSICS INFORMED NEURAL NETWORK FOR DYNAMIC STRESS PREDICTION
  5.1 Physics Informed Neural Network
  5.2 Background
  5.3 Method
  5.4 Experiments and Results
  5.5 Ablation Studies

CHAPTER 6 SUMMARY AND CONCLUSION

CHAPTER 7 FUTURE WORKS

BIBLIOGRAPHY

CHAPTER 1

INTRODUCTION

1.1 Background and State of Knowledge

Stress analysis is an essential part of engineering and design. The development of various design

systems continuously imposes higher demands on computational costs while preserving accuracy.

Numerical analysis methods, such as structural Finite Element Analysis (FEA), are typically used to

conduct stress analysis of various structures. Researchers commonly use FEA methods to evaluate

the design, safety, and maintenance of different structures in various fields, including aerospace,

automotive, architecture, civil, and structural systems. The current workflow for FEA applications

includes: a) modeling geometry and its components, which can be time-consuming based on the

system complexity; b) specifying material properties, boundary conditions, and loading; c) applying

a meshing strategy for the geometry; and d) running the analysis, whose duration depends on the complexity of all

the previous steps. The time consumption and complexity of current FEA workflows make them

impractical in real-time or near real-time applications, such as in the aftermath of a disaster or during

extreme disruptive events that require immediate corrections to avoid catastrophic failures. Based

on the steps of FEA described above, performing a complete stress analysis with conventional FEM

has high computational costs. To resolve this issue, we propose Deep Learning (DL) methods to

construct deep neural networks (DNNs), which, once trained, allow FEA to be bypassed. These methods

may enable real-time stress analysis by leveraging machine learning (ML) algorithms. DNNs can

model complicated, nonlinear relationships between input and output data. Thus, these models

help us acquire adequate knowledge for predictions of unseen problems.

Data-driven approaches that model physical phenomena have been lauded for their significant and

growing successes. Recent works include design and topology optimization [1, 2],

data-driven approaches in fluid dynamics [3, 4], molecular dynamics simulation [5, 6], and

material property prediction [7, 8, 9, 10]. Atalla et al. [11] and Levin et al. [12] used neural

regression for FEA model updating. Recently, DL has shown promise in solving conventional

mechanics problems. Some researchers used DL for structural damage detection, a promising

alternative to conventional structural health monitoring methods [13, 14]. Javadi et al. [15] used a

typical neural network in FEA as a surrogate for the traditional constitutive material model. They

simplified the geometry into a feature vector, an approach that is hard to generalize to complicated

cases. The numerical quadrature of the element stiffness matrix in the FEA on a per-element basis

was optimized by Oishi et al. [16] using deep learning. Their approach helps to accelerate the

calculation of the element stiffness matrix. Convolutional Neural Networks (CNN) are commonly

used in tasks involving 2D information due to the design of their architecture. Recently, Madani

et al. [17] developed a CNN architecture for stress prediction of arterial walls in arteriosclerosis.

Also, Liang et al. [18] proposed a CNN model for aortic wall stress prediction. Their method is

expected to allow real-time stress analysis of human organs for a wide range of clinical applications.

Gulgec et al. [19] proposed a CNN architecture to classify simulated damaged and intact samples

and localize the damage in steel gusset plates. Modares et al. [20] conducted a study on composite

materials to identify the presence and type of structural damage using convolutional neural networks.

Also, for detecting concrete cracks without calculating the defect features, Cha et al. [21] proposed

a vision-based method based on convolutional neural networks (CNNs). Do et al. [22] proposed a

method for forecasting the crack propagation in risk assessment of engineering structures based on

“long short-term memory” and “multi-layer neural network”. Truong et al. [23] proposed a deep

feedforward neural network method to detect the location and severity of damaged elements in the bar

planar truss and bar dome-like space truss using noisy, incomplete modal data. Lieu et al. [24]

presented a deep neural network-based adaptive surrogate model for structural reliability analysis.

Zhuang et al. [25] developed a technique for bending, vibration, and buckling analysis of Kirchhoff

plates based on deep autoencoders. Samaniego et al. [26] proposed a deep neural network to solve

boundary value problems. They demonstrated it on relevant examples from computational mechanics, using

DNNs to build the approximation space. A deep feed-forward artificial neural network has been

developed by Berg et al. [27] to approximate partial differential equations with complex geometry.

Truong et al. [28] proposed a method for the safety evaluation of steel trusses using the gradient

tree boosting algorithm.

An approach for predicting stress distribution on all layers of non-uniform 3D parts was presented

by Khadilkar et al. [29]. More recently, Nie et al. [30] developed a CNN-based method to predict the

low-resolution stress field in a 2D linear cantilever beam. Jiang et al. [31] developed a conditional

generative adversarial network for low-resolution von Mises stress distribution prediction in solid

structures. Some studies have been conducted to develop methods of predicting structural response

using ML models. Dong et al. [32] proposed a support vector machine approach to predict nonlinear

structural responses. Wu et al. [33] utilized deep convolutional neural networks to estimate the

structural dynamic responses. Long short-term memory (LSTM) [34] was used by Zhang et

al. [35] to predict nonlinear structural response under earthquake loading. Fang et al. [36] proposed

a deep-learning-based structural health monitoring (SHM) framework capable of predicting a dam’s

structural dynamic responses once explosions are experienced using LSTM. Kohar et al. [37] used

3D-CNN-autoencoder and LSTM to predict the force-displacement response and deformation of

the mesh in vehicle crash-worthiness. Schwarzer et al. [38] constructed a neural network architecture

that combines a graph convolutional neural network (GCN) with a recurrent neural network (RNN)

to predict fracture propagation in brittle materials. Lazzara et al. [39] proposed a dual-phase

LSTM Auto-encoder-based surrogate model to predict aircraft dynamic landing response over

time. Jahanbakht et al. [40] presented an FEA-inspired DNN using an attention transformer to

predict the sediment distribution in the wide coral reef.

The few models that studied stress prediction suffer from low-resolution

predictions and poor generalizability, making them unsuitable for decision-making after a catastrophic

failure. To the best of our knowledge, this is the first work to predict stress distribution in the specific

domain of steel plates with high accuracy and low latency.

1.2 Research Objectives

Based on the steps of FEA described in the last section, performing a complete stress analysis

with conventional FEA has a high computational cost. The main objective of this work is to bypass

FEA and provide faster stress analysis for engineering communities. Our approach could, in a sense, be viewed

as a surrogate for FEA software that avoids the computational bottlenecks in FEA. In particular, our

model predicts stress distributions and stress concentrations in both static and dynamic models of

the most common gusset plates used in infrastructures such as bridges and buildings. The proposed

deep learning models are established through the integration of artificial intelligence (AI) and FE

methods. Advanced AI algorithms are developed and evaluated during this research. The main idea

here is to train a generalized model that can later be used in situations where real-time estimations

are needed, such as in the aftermath of extreme disruptive events. For example, focusing on critical

structural components, there is a need for immediate assessment following a disaster or during

extremely disruptive events to guide corrective actions. Engineers could rely on the proposed

computationally efficient algorithms to determine stress distributions over gusset plates and apply

the proper rehabilitation actions. It is important for them to be able to analyze gusset plates quickly

and accurately, which is exactly what our models can provide.

1.3 Research Significance

We propose deep learning models that can act as a surrogate for FEA solvers for dynamic FEA

while avoiding the computational bottlenecks involved. To demonstrate its utility, we model the

stress distribution in gusset plates under static and dynamic loading. Bridges and buildings rely

heavily on gusset plates as one of their most critical components. Gusset plates are designed to

withstand lateral loads such as earthquakes and winds, which makes fast dynamic models valuable

in avoiding catastrophic failures. The main idea here is to train a model that can later be used

when real-time estimations are needed, such as in the aftermath of extreme disruptive events. For

example, focusing on critical structural components, there is a need for immediate assessment

following a disaster or during extremely disruptive events to guide corrective actions. Engineers

could rely on the proposed computationally efficient algorithms to determine stress distributions

over damaged gusset plates and apply the proper rehabilitation actions. They need to be able to

analyze gusset plates quickly and accurately, which is what our model can provide.

1.4 Contributions

1.4.1 Structural Engineering

We have made a major advancement in the field of Structural Engineering. For the first time,

we have developed a deep learning model for the prediction of high-resolution stress distribution in

intact structural components. Our model has achieved a Percentage Mean Absolute Error (PMAE)

of 0.9% and a Percentage Peak Absolute Error (PPAE) of 0.46%. We also proposed a convolutional

neural network (CNN) augmented with a custom loss function, inspired by the stress

concentration physics equation to predict high-resolution von Mises stress distribution in the specific

domain of damaged steel plates. The custom loss helped to localize damages accurately and capture

stress concentration around crack tips, which is not possible with other ML methods. Moreover, we

built a novel deep neural network equipped with a Convolutional Neural Network (CNN) and Long

Short Term Memory (LSTM) to predict the entire sequence of dynamic stress distribution. The

model can predict dynamic stress distribution with a mean relative percentage error of 2.3%, which

is considered an acceptable error rate in engineering communities. Furthermore, we presented

a new model, called PINN-Stress, that can accurately predict dynamic stress distribution. Our

model utilizes a unique architecture and incorporates a physics-informed loss function, resulting in

a computational speed that is 600,000 times faster than traditional finite element solvers.

1.4.2 Computer Science

We present a novel custom loss function using dynamic binary masks. The custom loss function

is formulated by taking into account the stress concentration at the crack tips, which is the highest

compared to other locations in the section. The proposed approach is able to localize the cracks

and capture the high-resolution stress concentration at the crack tips in structural components,

even when the crack length and width are very small. We also propose, for the first time, a state-of-the-art

physics-informed loss function that uses the governing equation of motion as a

soft constraint to be minimized during training. This loss function is computationally efficient and

facilitates real-time analysis for dynamic stress distribution prediction. We also propose a novel

method to calculate gradients of stresses on a surrogate grid created using kernel density estimation

(KDE). This method helped us to estimate the gradients of stress output along the x and y directions.

1.5 Chapter Overview

The study is guided by four major tasks for bypassing FEA by introducing novel deep-learning

models. An overview of each chapter is given below, delineating the specific objective,

body, and conclusion.

Chapter II deals with developing deep neural networks in the form of convolutional neural

networks (CNN) to bypass the FEA and predict high-resolution static stress distributions on loaded

steel plates with variable geometry, loading, and boundary conditions. The CNN was designed and

trained to use the geometry, boundary conditions, and load as input to predict the stress contours.

The proposed technique’s performance was compared to Finite-Element simulations using a partial

differential equation (PDE) solver.

Chapter III is focused on integrating physics knowledge into a convolutional neural network

(CNN) to boost learning within a feasible solution space in a specific domain. Our proposed method

uses deep neural networks in the form of CNNs augmented with custom loss functions that use

physics rules to bypass the need for Finite Element Analysis and predict high-resolution stress

distributions on damaged steel plates with variable loading and boundary conditions. We embedded

physics constraints into the loss function to guide model training toward precisely capturing stress

concentrations around the tips of various structural damage configurations. The CNN was designed

and trained to use the geometry, boundary conditions, and load as input and predict the stress

contours. The proposed framework’s performance is compared to Finite-Element simulations

using a partial differential equation (PDE) solver.

Chapter IV outlines the development of a deep learning model, Neuro-DynaStress, which is

proposed to predict the entire sequence of dynamic stress distribution based on finite element

simulations using a partial differential equation (PDE) solver. The model was designed and trained

to use the geometry, boundary conditions, and sequence of loads as input and predict the sequences

of high-resolution stress contours. The proposed framework’s performance is compared to finite

element simulations using a PDE solver. The goal is to train a model that can later be used when

real-time estimations are needed, such as in the aftermath of extreme disruptive events.

Chapter V presents a deep learning model that reduces computational cost while maintaining

accuracy. A Physics Informed Neural Network (PINN), the PINN-Stress model, is proposed to

predict the entire sequence of stress distribution based on Finite Element simulations using a partial

differential equation (PDE) solver. Using automatic differentiation, we embed a PDE into a deep

neural network’s loss function to incorporate information from measurements and PDEs.

Finally, Chapter VI summarizes the work performed under this project, outlines the main

research products developed and presents the main findings of the study. Some directions for future

research are also presented.

CHAPTER 2

HIGH-RESOLUTION STATIC STRESS DISTRIBUTION PREDICTION

IN INTACT STRUCTURAL COMPONENTS

2.1 Methodology

Deep Learning (DL) techniques can generate results significantly faster than conventional run-

time analysis. This can prove extremely valuable in real-time structural assessment applications.

Our proposed method uses deep neural networks in the form of convolutional neural networks

(CNN) to bypass the FEA and predict high-resolution static stress distributions on loaded steel

plates with variable loading and boundary conditions. The CNN was designed and trained to use

the geometry, boundary conditions, and load as input to predict the stress contours. An overview

of our approach is shown in Fig. 2.1.

2.1.1 Data generation

Two-dimensional steel plate structures with five edges, E1 to E5 denoting edges 1 to 5, as shown

in Fig. 2.2, are considered to be made of homogeneous and isotropic linear elastic material. The 2D

steel plates have similar geometry to that of gusset plates, as used for connecting beams and columns

to braces in steel structures. The boundary conditions and loading angles simulate conditions that

are similar to those affecting common gusset plate structures under external loading. Analysis

of the behavior of these components is essential since various reports have observed failures of

gusset plates subject to lateral loads [41, 42, 43, 44]. The distributed static loads applied to the

plates in this study range from 1 to 5 kN with intervals of 1 kN. Moreover, loads are applied with

three angles 𝜋/6, 𝜋/4, and 𝜋/3, on either one or two edges of the plate. The load is decomposed

into its horizontal and vertical components. The boundary conditions and load cases are

considered to simulate similar conditions in common gusset plate structures under external loading.

Some of the most common gusset plate configurations in practice are shown in Fig. 2.3. Four

boundary conditions are considered, as shown in Fig. 2.4, based on the boundary conditions of real gusset plates.

Figure 2.1 Overview: Unlike FEM, our proposed model is computationally efficient and facilitates
real-time analysis. The existing workflow for FEM applications includes: (i) modeling the geometry
and its components, (ii) specifying material properties, boundary conditions, meshing, and loading,
(iii) analysis, which may be time-consuming based on the complexity of the model. Our model
takes geometry, boundary condition, and load as input and predicts the static stress distribution.
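The load decomposition described above can be illustrated with a short calculation. This is a minimal sketch, not the data-generation code used in this work: it assumes the load angle is measured from the horizontal axis, and the names `decompose`, `MAGNITUDES_KN`, and `ANGLES_RAD` are hypothetical.

```python
import math

# Load magnitudes (kN) and angles (radians) as listed in the text.
MAGNITUDES_KN = [1, 2, 3, 4, 5]
ANGLES_RAD = [math.pi / 6, math.pi / 4, math.pi / 3]


def decompose(magnitude_kn, angle_rad):
    """Return the (horizontal, vertical) components of a load.

    Assumption: the angle is measured from the horizontal axis.
    """
    return (magnitude_kn * math.cos(angle_rad),
            magnitude_kn * math.sin(angle_rad))


# Every magnitude/angle combination, each as a (horizontal, vertical) pair.
components = [decompose(p, a) for p in MAGNITUDES_KN for a in ANGLES_RAD]
all_values = [value for pair in components for value in pair]

smallest = min(all_values)  # 1 kN at 30 degrees: 1 * sin(pi/6)
largest = max(all_values)   # 5 kN at 30 degrees: 5 * cos(pi/6)
```

Under this assumption the components span roughly 0.5 kN to 4.33 kN, which agrees with the component range quoted in Section 2.1.1.1.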

Figure 2.2 Basic schematic topology for initializing the steel plate geometries.

All translational and rotational displacements are fixed along the constrained edges. All

input variables used to initialize the population are shown in Table 2.1. The width and height

of the plate range from 30 to 60 cm. Various geometries are generated

by changing the position of each node in horizontal and vertical directions, as shown in Fig. 2.2,

which leads to 1024 unique pentagons. The material properties remain unchanged and isotropic for

all samples.

Figure 2.3 Some of the most common gusset plates in practice.

Figure 2.4 Different types of boundary conditions for initializing population.

2.1.1.1 Input Data

The geometry is encoded into a 600 × 600 matrix as a single-channel binary image, in which 0 (black)

and 1 (white) denote the outside and inside of the geometry, respectively, as shown in Fig. 2.5(a). The boundary

condition is also represented by another 600 × 600 pixel binary image, where the constrained edges

are defined by 1 (white) (Fig. 2.5 (b)). Moreover, each horizontal and vertical component of the

load is encoded as one 600 × 600-pixel single-channel colored image, as shown in Fig. 2.5(c) and

2.5(d). Each row of Fig. 2.5 represents one simulated combination of geometry, boundary conditions, and

load position, as described in Table 2.1. The magnitude of the horizontal and vertical components

of the loads, after decomposition, varies between 0.5 kN and 4.33 kN. These loads are mapped

to RGB colors between (100,0,0) and (255,0,0) to create a color image where the colored part

represents the location and magnitude of the load (Figs. 2.5(c) and 2.5(d)).
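The load encoding described above can be sketched as follows. This is an illustration under stated assumptions rather than the encoding code used in this work: the mapping from load to red-channel intensity is assumed to be linear with rounding to the nearest integer, and `load_to_red` and `encode_load_image` are hypothetical helper names.

```python
LOAD_MIN_KN, LOAD_MAX_KN = 0.5, 4.33  # component range quoted in the text
RED_MIN, RED_MAX = 100, 255           # red-channel range quoted in the text


def load_to_red(component_kn):
    """Map a load component (kN) to a red-channel intensity.

    Assumption: the mapping is linear and rounded to the nearest integer.
    """
    fraction = (component_kn - LOAD_MIN_KN) / (LOAD_MAX_KN - LOAD_MIN_KN)
    return round(RED_MIN + fraction * (RED_MAX - RED_MIN))


def encode_load_image(size, loaded_pixels, component_kn):
    """Build a size x size RGB image; only loaded pixels carry the load color."""
    color = (load_to_red(component_kn), 0, 0)
    image = [[(0, 0, 0) for _ in range(size)] for _ in range(size)]
    for row, col in loaded_pixels:
        image[row][col] = color
    return image


# Hypothetical example: a 2.5 kN component applied along part of the top edge
# of a small 8 x 8 image (the images in this work are 600 x 600).
image = encode_load_image(8, [(0, col) for col in range(2, 6)], 2.5)
```

Pixels that carry no load stay black, so the position of the colored pixels gives the load location and their red intensity gives the load magnitude.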

Figure 2.5 Input and output representation for static stress distribution prediction: (a) geometry, (b)
boundary condition, (c) horizontal load, (d) vertical load, (e) output.

Table 2.1 Input variables

Geometry   Boundary conditions   Load position    Load angle (degree)   Load magnitude (kN)
pentagon   E2                    E5, E4, E4E5     30, 45, 60            1, 2, 3, 4, 5
pentagon   E2E3                  E5               30, 45, 60            1, 2, 3, 4, 5
pentagon   E1E2                  E4               30, 45, 60            1, 2, 3, 4, 5
pentagon   E3                    E2E5, E1E2E5     30, 45, 60, 90        1, 2, 3, 4

2.1.1.2 Output Data

FEA is performed using the Partial Differential Equation (PDE) solver in the MATLAB toolbox

to obtain the stress distributions of each sample. The MATLAB PDE toolbox mesh generator only

generates unstructured triangular meshes, which are incompatible with CNNs. Since each element should be

represented by one pixel in an image, we develop a 600 × 600 grid surface equal to the dimensions

of the largest possible geometry. The stress values are then interpolated from the

triangular elements onto the grid to obtain a stress distribution compatible with our CNN.

The stress values of all the elements outside the material geometry are set to zero, as shown

in Fig. 2.5(e). The dimension of the largest sample is 600 mm × 600 mm, and the smallest is 300

mm × 300 mm. Therefore, the size of each element is 1 mm × 1 mm, which means that each image

has 360,000 pixels. This high-resolution dataset offers significant accuracy. The maximum and

minimum von Mises stress values for elements among the entire dataset are 96,366 MPa and –0.73

MPa, respectively. We normalized all the output data between 0 and 1 to ensure faster convergence

and encoded it to 600 × 600 matrices.
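The mesh-to-grid step and the normalization described above can be sketched as follows. This is a simplified illustration, not the MATLAB procedure used in this work: it assumes linear (barycentric) interpolation of nodal stresses inside each triangle, samples at pixel centers, and uses the hypothetical helpers `barycentric`, `rasterize`, and `min_max_normalize`.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def rasterize(nodes, triangles, nodal_stress, size):
    """Sample nodal stresses on a size x size grid; pixels outside stay zero."""
    grid = [[0.0] * size for _ in range(size)]
    for row in range(size):
        for col in range(size):
            p = (col + 0.5, row + 0.5)  # pixel center
            for tri in triangles:
                w = barycentric(p, *(nodes[i] for i in tri))
                if min(w) >= -1e-12:  # the point lies inside this triangle
                    grid[row][col] = sum(
                        wi * nodal_stress[i] for wi, i in zip(w, tri))
                    break
    return grid


def min_max_normalize(grid, lo, hi):
    """Scale values to [0, 1] using dataset-wide extrema lo and hi."""
    return [[(v - lo) / (hi - lo) for v in row] for row in grid]


# Hypothetical one-triangle mesh on a 4 x 4 grid.
nodes = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
triangles = [(0, 1, 2)]
nodal_stress = [10.0, 30.0, 50.0]
grid = rasterize(nodes, triangles, nodal_stress, 4)
normalized = min_max_normalize(grid, 0.0, 50.0)
```

The extrema passed to `min_max_normalize` are meant to be taken over the entire dataset, so that every sample shares the same scale.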

2.1.2 Model Architecture

The CNN can be built using a sequence of convolutional layers. The convolutional layers learn

to encode the input in simple signals and reconstruct the input [45]. We conducted an ablation

study to determine the best network architecture for our convolutional and deconvolutional layers,

as well as the number of Squeeze-Excitation and Residual blocks. We tested a range of different

architectures, varying the number of convolutional and deconvolutional layers, and the number of

Squeeze-Excitation and Residual blocks. Through our tests, we were able to identify the most

accurate network architecture for our needs. As shown in Table 2.2, arch 1 has the lowest

error (PMAE) compared to its counterparts. This network architecture was then used for the rest of our

experiments. This architecture has three stages of layers. The first stage is downsampling, which

consists of seven convolutional layers (E1, E2, E3, E4, E5, E6, E7), and the second stage has three

layers (RS1, RS2, and RS3) of Squeeze-Excitation and Residual blocks (SE-ResNet). In addition,

the Inception and MobileNetV2 blocks are swapped with the SE-ResNet block to check if these

modules can further enhance the network’s performance. The third stage is upsampling, consisting

of six deconvolutional layers (D1, D2, D3, D4, D5, D6), as illustrated in Fig. 2.6.
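The feature-map sizes labelled in Fig. 2.6 (600, 300, 150, 75, 38, 19, and 10 pixels per side) can be reproduced with a short calculation. This is a sketch under assumptions that the text does not state: the first encoder layer is taken to keep the input resolution, and each later layer is taken to halve it with stride 2 and "same" padding. The function name `encoder_sizes` is hypothetical.

```python
import math


def encoder_sizes(input_size, num_layers):
    """Spatial size after each encoder layer.

    Assumptions (not stated in the text): the first layer keeps the input
    resolution, and every later layer halves it with stride 2 and 'same'
    padding, so that out = ceil(in / 2).
    """
    sizes = [input_size]
    for _ in range(num_layers - 1):
        sizes.append(math.ceil(sizes[-1] / 2))
    return sizes


sizes = encoder_sizes(600, 7)  # seven convolutional layers, E1 to E7
```

Under these assumptions, the six deconvolutional layers would mirror the sequence from 10 back up to 600 pixels per side.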

Table 2.2 Choice of architecture

                   Architecture
Item               arch 1  arch 2  arch 3  arch 4  arch 5  arch 6  arch 7  arch 8  arch 9
Conv layers        7       6       8       7       6       8       7       6       8
SE-ResNet layers   3       3       3       2       2       2       4       4       4
ConvT layers       6       5       7       6       5       7       6       5       7
PMAE (%)           0.9     1.02    1.21    1.13    1.35    1.49    1.08    1.27    1.40

Figure 2.6 Proposed CNN architecture.

2.1.2.1 Residual Block

We used residual blocks to address the vanishing gradient problem. In addition, SE blocks

are computationally lightweight and result in only very small increases in model complexity. As

illustrated in Fig. 2.7, the formulation of F(x)+x can be realized by feedforward neural networks

with shortcut connections. The shortcut connection simply performs identity mapping, and its

output is added to the output of the stacked layers [46].
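The F(x)+x formulation can be written in a few lines. This is a minimal sketch on plain Python lists, not the layer used in our network: `toy_layers` is a hypothetical stand-in for the stacked layers, and the identity shortcut is assumed to require matching input and output shapes.

```python
def residual_block(stacked_layers, x):
    """Return F(x) + x, where F is the mapping learned by the stacked layers."""
    fx = stacked_layers(x)
    if len(fx) != len(x):
        raise ValueError("identity shortcut requires matching shapes")
    return [f + xi for f, xi in zip(fx, x)]


def toy_layers(x):
    """Hypothetical stand-in for the stacked layers: scale by 0.5, then ReLU."""
    return [max(0.0, 0.5 * v) for v in x]


y = residual_block(toy_layers, [1.0, -2.0, 4.0])
```

Because the shortcut adds x unchanged, the derivative of the block output with respect to x contains an identity term, which gives the gradient a direct path through the block even when the contribution of F is small.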

Figure 2.7 The building block of residual learning [46].

2.1.2.2 Squeeze-and-Excitation Blocks

As depicted in Fig. 2.8, Squeeze-and-Excitation (SE) blocks improve the representational capacity of the network by enabling dynamic channel-wise feature recalibration. An SE block can be implemented in five steps. First, we feed the input x as a convolutional block, together with the current number of channels, to the SE function, where 𝐹𝑡𝑟 in Fig. 2.8 is the convolutional operator that transforms X to U. Second, each channel is squeezed into a single numeric value by average pooling. Third, a fully connected layer followed by a ReLU function applies a nonlinearity and reduces the output channel complexity. Fourth, a second fully connected layer followed by a sigmoid function gives each channel a smooth gating value. Finally, each feature map of the convolutional block is weighted by its gate value [47].

SE blocks can be used directly with residual networks. Fig. 2.9 depicts an SE-ResNet module, in which the SE block transformation 𝐹𝑡𝑟 is regarded as the non-identity branch of a residual module, and both squeeze and excitation act before the summation with the identity branch. Using both SE and ResNet in the network outperforms using ResNet alone [47].
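The squeeze and excitation steps can be sketched in a few lines of plain Python. This is an illustrative sketch that assumes the standard SE formulation of [47] (global average pooling, a fully connected layer with ReLU, a second fully connected layer with sigmoid, then channel-wise scaling); the weight matrices `w1` and `w2` are hypothetical toy values, not trained parameters from this work.

```python
import math


def se_block(u, w1, w2):
    """u: list of C channels, each an H x W list of lists.
    w1: (C/r) x C weights of the first fully connected layer (assumed).
    w2: C x (C/r) weights of the second fully connected layer (assumed)."""
    # Squeeze: global average pooling gives one value per channel
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in u]
    # Excitation, part 1: fully connected layer + ReLU (channel reduction)
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    # Excitation, part 2: fully connected layer + sigmoid (one gate per channel)
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: weight every feature map by its gate
    scaled = [[[g * v for v in row] for row in ch] for g, ch in zip(gates, u)]
    return scaled, gates


# Two 2 x 2 channels; zero weights in w2 make every gate equal to sigmoid(0)
u = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
scaled, gates = se_block(u, w1=[[1.0, 1.0]], w2=[[0.0], [0.0]])
print(gates, scaled[1][0][0])
```

The block adds only two small fully connected layers per module, which is why the increase in model complexity is small.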

Figure 2.8 The building block of Squeeze-and-Excitation [47].

Figure 2.9 SE-ResNet module.

2.1.2.3 Inception Block

Inception modules are used to reduce the computational cost of CNNs. Since neural networks have to deal with a vast array of images, each with different content, they must be carefully designed. The vanilla version of the inception module performs convolutions on the input with three different filter sizes (1 × 1, 3 × 3, 5 × 5) instead of one, and also performs max pooling. The outputs are then concatenated and sent to the next layer. The convolutions therefore occur at the same level, so the network gets wider, not deeper. Compared with shallower and less wide CNNs, this method offers significant quality gains at a modest increase in computational cost [48]. Fig. 2.10 depicts the inception module.

Figure 2.10 Inception module.

2.1.2.4 MobileNetV2 Block

MobileNetV2 is based on an inverted residual block with shortcut connections between thin bottleneck layers [49]. A lightweight depth-wise convolution technique is used in the intermediate layer to filter features as a source of nonlinearity. The nonlinearities must be removed in the narrow layers to maintain representational power. In general, in this model, the bottlenecks encode the intermediate inputs and outputs of the model, while the inner layer encodes how the model can transform from lower-level concepts such as pixels to higher-level features such as categories of images. Lastly, shortcuts can improve training speed and accuracy, just like traditional residual connections. Fig. 2.11 depicts the MobileNetV2 module.

Figure 2.11 MobileNetV2 module.

2.1.2.5 Network layers and hyperparameters

All the details of the network layers and hyperparameters can be found in Tables 2.3 and 2.4. As can be seen, the models consist of seven Conv layers, three different bottleneck blocks, and six ConvT layers. Since arch 1 shows the best performance among all architectures, we keep the network with 1024 channels as the primary model and swap the bottleneck each time among the SE-ResNet, Inception, and MobileNetV2 blocks. We keep the bottleneck dimension the same for all models to match the first ConvT layer. The batch size is set to 16, which led to the best accuracy compared to other batch sizes. We tested learning rates from 1e−3 to 1e−6, and 1e−5 led to the best convergence.

Table 2.3 Network layers

Item         Number of layers  First layer (H×W×C)  Last layer (H×W×C)  Activation
Conv         7                 600×600×12           10×10×1024          ReLU
SE-ResNet    3                 10×10×1024           10×10×1024          Sigmoid-ReLU
Inception    3                 10×10×1024           10×10×1024          ReLU
MobileNetV2  1                 10×10×1024           10×10×1024          ReLU6
ConvT        6                 19×19×512            600×600×12          ReLU
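The layer resolutions reported for the encoder and decoder (600 down to 10, then 19 back up to 600) are consistent with halving the spatial size at each downsampling step and rounding up. The sketch below reproduces that sequence under this assumption; the stride and padding settings are not stated in the text, so the rounding rule here is an illustrative guess rather than the implemented configuration.

```python
import math


def encoder_sizes(size, n_layers):
    """Spatial size after each encoder layer, assuming the first layer keeps
    the input resolution and every later layer halves it (rounding up)."""
    sizes = [size]
    for _ in range(n_layers - 1):
        size = math.ceil(size / 2)
        sizes.append(size)
    return sizes


enc = encoder_sizes(600, 7)                    # E1 ... E7
channels = [12, 32, 64, 128, 256, 512, 1024]   # channel counts from Fig. 2.6
dec = enc[-2::-1]                              # D1 ... D6 mirror the encoder

for name, s, c in zip(["E1", "E2", "E3", "E4", "E5", "E6", "E7"], enc, channels):
    print(f"{name}: {s}x{s}x{c}")
print("decoder sizes:", dec)
```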

Table 2.4 Network hyperparameters

Batch size  Learning rate  Weight decay  Expand ratio  Loss function
16          1e−5           1e−7          6             MSE-MSE

2.2 Loss function and performance metrics

We used MSE (mean squared error), defined in Eq. 2.1, as the training loss. MSE gives a more significant penalty to large errors than MAE (mean absolute error), and the errors are also normally distributed. MAE, MRPE (mean relative percentage error), PMAE (percentage mean absolute error), PAE (peak absolute error), and PPAE (peak percentage absolute error) help evaluate the overall quality of the predicted stress distribution. These metrics are defined in Equations 2.2 to 2.6, respectively.

MSE = \frac{1}{n} \sum_{i=1}^{n} \left( S(i) - \hat{S}(i) \right)^2    (2.1)

MAE = \frac{1}{n} \sum_{i=1}^{n} \left| S(i) - \hat{S}(i) \right|    (2.2)

where S(i) is the stress value at node i computed by FEA as the ground truth, \hat{S}(i) is the corresponding stress predicted by the DL model, and n is the total number of elements in each sample, which is 360,000 in our work. The symbol | · | denotes the absolute value. Our model’s predictions and the ground truth are displayed as 600×600 resolution images.

We used MRPE for measurement of the relative error:

MRPE = \frac{MAE}{\max\left( |S(i)|, |\hat{S}(i)| \right)} \times 100    (2.3)

The percentage mean absolute error is defined as:

PMAE = \frac{MAE}{\max[S(i)] - \min[S(i)]} \times 100    (2.4)

where max [S(i)] is the maximum value in a set of ground truth stress values and min [S(i)] is the minimum value. PAE and PPAE measure the accuracy of the maximum stress, which is one of the most critical values in the predicted stress distribution. The maximum stress matters in the design phase, since it should be less than the yield strength to avoid permanent deformation. PAE and PPAE are defined as:

PAE = \max[S(i)] - \max[\hat{S}(i)]    (2.5)

PPAE = \frac{PAE}{\max[S(i)]} \times 100    (2.6)
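The metrics above can be sketched as plain-Python functions operating on flattened stress fields. This is an illustrative sketch, not the evaluation code used in this work; in particular, it assumes that PAE compares the maximum of the ground truth with the maximum of the prediction, as the surrounding text suggests, and the sample values are made up.

```python
def mse(s, s_hat):
    return sum((a - b) ** 2 for a, b in zip(s, s_hat)) / len(s)


def mae(s, s_hat):
    return sum(abs(a - b) for a, b in zip(s, s_hat)) / len(s)


def pmae(s, s_hat):
    # MAE as a percentage of the ground-truth stress range
    return 100.0 * mae(s, s_hat) / (max(s) - min(s))


def pae(s, s_hat):
    # Assumed reading: difference between the two peak stresses (signed)
    return max(s) - max(s_hat)


def ppae(s, s_hat):
    return 100.0 * pae(s, s_hat) / max(s)


# Toy ground truth and prediction (hypothetical values, in MPa)
truth = [0.0, 50.0, 100.0, 200.0]
pred = [0.0, 48.0, 104.0, 190.0]
print(mse(truth, pred), mae(truth, pred), pmae(truth, pred),
      pae(truth, pred), ppae(truth, pred))
```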

2.3 Results and discussion

All codes are written in PyTorch Lightning and run on two NVIDIA TITAN RTX 24G GPUs.

We use the AdamW (Adam algorithm with weight decay) optimizer to speed up the convergence of the models. We train and evaluate different models based on Table 2.3 to find the model with the best performance. The training data size of models 1 to 3 is 83,558, and the testing data size is 20,890, randomly divided with a train/test ratio of 80%–20%. Fig. 2.12 shows MSE and MAE losses as a function of epochs for model 1. Figs. 2.12a and 2.12b use linear and logarithmic scales, respectively. Fig. 2.12a shows that the MSE and MAE curves rapidly decline after a few epochs. However, Fig. 2.12b gives a more precise representation of the model’s behavior. Fig. 2.12b shows that MSE is smaller than MAE, but both have similar general trends.

Figure 2.12 MSE and MAE curves on training and testing data with two scales: (a) linear scale, (b) logarithmic scale.

We save the best checkpoint during validation, and all error metrics are based on the best

checkpoint. Models 4 to 6 are validated with K-fold cross-validation to ensure that the model is

generalizable. To reduce the computational cost, we divide the dataset into three folds. K-fold

cross-validation shows the best performance in all models based on most metrics, as can be seen in

Table 2.5, which means the model is generalizable.
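For illustration, a three-fold split of sample indices can be sketched as follows. This is a simplified sketch with contiguous folds; the helper name `kfold_indices` is hypothetical, and an actual run would typically shuffle the indices first, which is omitted here to keep the example deterministic.

```python
def kfold_indices(n_samples, k=3):
    """Return k (train_indices, test_indices) pairs with contiguous folds."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train_idx, test_idx))
        start += size
    return folds


folds = kfold_indices(10, k=3)
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Each sample appears in exactly one test fold, so every model is evaluated on data it did not see during training.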

We replace the SE-ResNet block in the bottleneck with the Inception and MobileNetV2 blocks in models 2 and 3, respectively. Model 1 has the best performance in terms of PPAE, with an error of 0.46%, and model 2 is the best model based on PMAE, with a 0.57% error. Fig. 2.13 depicts the performance of the different models in terms of MAE. As can be seen in Fig. 2.13, models 3 and 1, which have MobileNetV2 and SE-ResNet blocks in the bottleneck, have almost the same performance, and model 2, with the Inception block, is the best in terms of MAE. We deem these results satisfactory for stress distribution predictions, specifically the PPAE, which is the most critical value for stress distribution in engineering domain applications.

Table 2.5 Error metrics for models at best checkpoints (Units: MPa)

Model    Dataset  Bottleneck  MSE   MAE    MRPE(%)  PMAE(%)  PPAE(%)  PAE
Model 1  classic  SE-ResNet   0.21  58.8   3.80     0.9      0.46     30.00
Model 2  classic  Inception   0.62  31.5   1.54     0.57     2.66     150.55
Model 3  classic  MobileNet   0.38  51.99  2.83     0.93     4.36     246.50
Model 4  Fold 1   SE-ResNet   0.59  34.90  1.80     0.63     2.19     124.00
Model 5  Fold 2   SE-ResNet   0.64  25.04  1.10     0.45     0.12     6.93
Model 6  Fold 3   SE-ResNet   0.61  32.03  1.42     0.57     0.93     52.89
Mean                          0.61  30.65  1.44     0.55     1.08     61.27
STD                           0.02  5.14   0.28     0.07     0.85     58.15

Figure 2.13 Comparison of different models in terms of MAE.

The predictions for some randomly selected samples from the test dataset of model 1 are visualized in Fig. 2.14. Each row represents a sample. Columns (a) to (d) represent geometry, boundary conditions, and load in horizontal and vertical directions, respectively. Columns (e) and (f) represent the ground truth and predicted stress distributions. As can be seen, the predicted stress distributions closely match the ground truth in both the maximum stress and the overall distribution across the different samples. Some inaccurate predictions are also shown in Fig. 2.15; these predictions still provide information. Fig. 2.16 illustrates the cumulative distribution of PMAE and PPAE in the test dataset of model 1. Fig. 2.16a shows that about 80% of predicted samples have a PMAE of less than 0.9, which is the mean, and 50% of samples have a PMAE of less than 0.46, which is the median. Fig. 2.16b shows that about 99% of predicted samples have a PPAE of less than 0.46, and 50% of the predicted samples have a PPAE of less than 0.06.

2.3.1 Effect of dataset size on the performance of the network

We train model 1 on subsets of different sizes to evaluate the effect of data size on the network’s performance. Besides training with the entire dataset of 104,448 samples, we train the network with 10,000, 20,000, 30,000, 40,000, 50,000, and 70,000 samples. Fig. 2.17 demonstrates that training with just 10% of the dataset can achieve a mean error of 1.85%, which is acceptable in most engineering applications. It can also be seen that to achieve a mean error of less than 1%, we should train the network with at least 90% of the dataset. We also evaluate the effect of data size on the Gaussian distributions of PMAE and PPAE, illustrated in Figs. 2.18a and 2.18b. As shown in Fig. 2.18a, increasing the data size decreases the standard deviation of PMAE; however, a data size of 70,000 and the total data size have almost the same standard deviation. Fig. 2.18b shows that the standard deviation of PPAE decreases when the data size increases from 50,000 to 70,000. As a result, we should train the network with at least 70,000 examples, 67% of our dataset, to ensure an acceptable standard deviation of PPAE.

Figure 2.14 Predicted stress distribution and corresponding inputs with different load and boundary condition scenarios. Columns (a) to (d) represent (a) geometry, (b) boundary conditions, and load in the (c) horizontal and (d) vertical directions. Column (e) represents the ground truth, and column (f) shows the predicted stress distribution (Units = MPa).

Figure 2.15 Inaccurate predicted stress distribution and corresponding inputs with different load and boundary condition scenarios. Columns (a) to (d) represent (a) geometry, (b) boundary conditions, and load in the (c) horizontal and (d) vertical directions. Column (e) represents the ground truth, and column (f) shows the predicted stress distribution (Units = MPa).

Figure 2.16 Cumulative distribution of PMAE and PPAE: (a) PMAE of samples less than mean and median on the test dataset, (b) PPAE of samples less than mean and median on the test dataset.

Figure 2.17 PMAE at different data sizes (horizontal axis: data size (%); vertical axis: PMAE (%)).

Figure 2.18 Gaussian distribution of PMAE and PPAE. (a) PMAE, (b) PPAE.

CHAPTER 3

HIGH-RESOLUTION STATIC STRESS DISTRIBUTION PREDICTION

IN DAMAGED STRUCTURAL COMPONENTS

3.1 Predicting physical responses

Accurately and efficiently predicting physical responses is essential for different real-world

applications, such as the prediction of the remaining life of mechanical systems [50], earthquake

alarms [51], and weather forecasting [52]. Although data-driven and physics-based solutions allow

for solid predictions, both methods still suffer from several limitations. The drawbacks of data-driven methods are the requirement for large amounts of data, the inability to produce physically consistent results, and the lack of generalization to out-of-sample scenarios [53]. Physics-based models, such as Finite Element Analysis (FEA), are in turn computationally prohibitive.

Therefore, to achieve fast analysis of mechanical systems and address deficiencies of data-driven

models, we integrate conventional physics-based methods with state-of-the-art Deep Learning (DL)

methods to predict stress distributions in damaged steel components.

3.2 Methodology

3.2.1 Data generation

Two-dimensional steel plate structures with five edges, E1 to E5 as shown in Fig. 3.2, are modeled as a homogeneous, isotropic, linear elastic material. Various geometries are generated by changing the position of each node in the horizontal and vertical directions, as shown in Fig. 3.2, which led to 1024 unique pentagons. The material properties remain unchanged for all samples. In addition, 1024 crack scenarios with various widths, lengths, angles, and locations were created on the steel plates. Since the number of geometries is the same as the number of crack scenarios, each geometry has a unique crack.

The 2D steel plates approximate the geometry of gusset plates, which are used for connecting beams and columns to braces in steel structures. The boundary conditions and loading angles are chosen to simulate conditions similar to those of common gusset plate structures under external loading. Adding different loading and boundary conditions extended the population to 61,440 unique samples. All input variables used to initialize the population are shown in Table 3.1.

Figure 3.1 Overview: Phy loss facilitates localizing the cracks and capturing stress concentrations around the crack tips.

Table 3.1 Input variable

Geometry  Boundary conditions  Load position  Load angle (degree)  Load magnitude (kN)
pentagon  E2                   E4E5           30, 45, 60           1, 2, 3, 4, 5
pentagon  E2E3                 E5             30, 45, 60           1, 2, 3, 4, 5
pentagon  E1E2                 E4             30, 45, 60           1, 2, 3, 4, 5
pentagon  E1                   E2             30, 45, 60           1, 2, 3, 4, 5
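The population size follows directly from the input variables in Table 3.1. The short sketch below enumerates the boundary condition, angle, and magnitude combinations and multiplies them by the 1024 damaged geometries; the variable names are illustrative only.

```python
from itertools import product

# (boundary condition edges, load position edges) pairs from Table 3.1
boundary_load_cases = [("E2", "E4E5"), ("E2E3", "E5"), ("E1E2", "E4"), ("E1", "E2")]
angles_deg = [30, 45, 60]
magnitudes_kn = [1, 2, 3, 4, 5]
n_geometries = 1024  # one unique crack per pentagon geometry

scenarios = list(product(boundary_load_cases, angles_deg, magnitudes_kn))
total = n_geometries * len(scenarios)

# An 80%-20% split of the population (integer arithmetic avoids rounding)
n_train = total * 8 // 10
n_test = total - n_train
print(len(scenarios), total, n_train, n_test)
```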

Figure 3.2 Basic schematic topology for initializing the damaged steel plate geometries.

For crack initiation, we divided the steel plate into 15 different regions to create cracks with different lengths and ensure that the cracks would not cross the edges of the steel plate. Fig. 3.3 shows all damage locations in three categories. Every red point represents the center of a single damage location in the steel plate. Categories 1, 2, and 3 have 9, 4, and 2 subcategories, respectively. Each plate has just one crack; the red points only mark the possible crack locations. Details of crack initiation are shown in Table 3.2.

Figure 3.3 Different location of damages on steel plates: (a) category 1, (b) category 2, (c) category
3.

The distributed static loads applied to the gusset plates in this study ranged from 1 to 5 kN with intervals of 1 kN. Moreover, loads were applied at three different angles, π/6, π/4, and π/3, on either one or two edges of the plate. The load is decomposed into its horizontal and vertical components. Also, four types of boundary conditions are considered, as shown in Fig. 3.4, similar to the boundary conditions of real gusset plates. All translational and rotational displacements were fixed at the constrained edges. The width and height of the plates range from 30 cm to 60 cm.

Figure 3.4 Different types of boundary conditions for initializing population.

Table 3.2 Detail of damages in steel plates

Crack   Width  Length of        Length of        Length of        Angle
number  (mm)   category 1 (mm)  category 2 (mm)  category 3 (mm)  (degree)
1       1      10, 40, 80       120              160, 200         0, 30, 45, 90
2       2      10, 40, 80       120              160, 200         0, 30, 45, 90
3       3      10, 40, 80       120              160, 200         0, 30, 45, 90
4       4      10, 40, 80       120              160, 200         0, 30, 45, 90
5       5      10, 40, 80       120              160, 200         0, 30, 45, 90
6       6      10, 40, 80       120              160, 200         0, 30, 45, 90
7       7      10, 40, 80       120              160, 200         0, 30, 45, 90

3.2.1.1 Input Data

The geometry is encoded into a 600 × 600 matrix as a single-channel binary image, where 0 (black) and 1 (white) denote the outside and inside of the geometry, as shown in Fig. 3.5a. The boundary condition is represented by another 600 × 600-pixel binary image, where the constrained edges are defined by 1 (white), as shown in Fig. 3.5b. Moreover, each of the horizontal and vertical components of the load is encoded as one 600 × 600-pixel single-channel colored image, as shown in Figs. 3.5c and 3.5d. Each row of Fig. 3.5 represents one of the simulated boundary conditions and its load positions as described in Table 3.1. The magnitudes of the horizontal and vertical components of the loads, after decomposition, vary between 0.5 kN and 4.33 kN. These loads are normalized between (100,0,0) and (255,0,0) as RGB colors to create a color image in which the colored part represents the location and magnitude of the load, as shown in Figs. 3.5c and 3.5d.
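The 0.5 kN to 4.33 kN range quoted above follows from decomposing loads of 1 to 5 kN at 30, 45, and 60 degrees. The sketch below checks this and maps a component to a red-channel intensity between 100 and 255. The linear mapping and the helper names `decompose` and `to_red_level` are assumptions made for illustration; the text does not state the exact normalization formula.

```python
import math


def decompose(magnitude_kn, angle_deg):
    """Split a load into horizontal and vertical components (kN)."""
    theta = math.radians(angle_deg)
    return magnitude_kn * math.cos(theta), magnitude_kn * math.sin(theta)


def to_red_level(component_kn, lo=0.5, hi=4.33, out_lo=100, out_hi=255):
    """Map a load component to a red intensity (assumed linear mapping)."""
    t = (component_kn - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp against floating-point overshoot
    return round(out_lo + t * (out_hi - out_lo))


components = [c for mag in range(1, 6) for ang in (30, 45, 60)
              for c in decompose(mag, ang)]
print(round(min(components), 2), round(max(components), 2))
print(to_red_level(0.5), to_red_level(4.33))
```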


Figure 3.5 Input and output representation for stress distribution prediction: (a) damaged geometry,
(b) boundary condition, (c) horizontal load, (d) vertical load, (e) output.

3.2.1.2 Output Data

FEA was performed using the Partial Differential Equation (PDE) solver in the MATLAB toolbox to obtain the stress distribution of each sample. We did not use other FE approaches, such as the Carrera Unified Formulation [54], which allows FE matrices and vectors to be derived in terms of fundamental nuclei, due to computational cost. The MATLAB PDE toolbox mesh generator only generates unstructured triangulated meshes, which are not compatible with CNNs. The minimum and maximum triangulated mesh sizes are 5 mm and 10 mm, respectively. Since each element should be represented by one pixel in an image, we develop a 600 × 600 grid surface equal to the dimensions of the largest possible geometry. Figs. 3.6a and 3.6b show the unstructured mesh and the 600 × 600 grid surface on top of one random sample, respectively. The stress values are then interpolated from the triangular elements onto the grid to obtain a stress distribution compatible with our CNN. The stress values of all the elements outside the material geometry are set to zero, as shown in Fig. 3.5e. The dimensions of the largest sample are 600 mm × 600 mm, and those of the smallest are 300 mm × 300 mm. Therefore, the dimension of each element is 1 mm × 1 mm, which means that each image has 360,000 pixels. All the cracks are initialized within the smallest dimension (300 mm × 300 mm) to keep the full length of every crack inside the geometry. This high-resolution dataset led to significant accuracy. The maximum and minimum von Mises stress values for elements among the entire dataset are 362,687 MPa and -138.35 MPa, respectively. We normalized all the output data between 0 and 1 to ensure faster convergence and encoded it into 600 × 600 matrices.

Figure 3.6 A sample of mesh generation: (a) unstructured triangular mesh, (b) structured grid surface.

3.2.2 CNN Architecture

The CNN can be built in different ways using a sequence of convolutional layers. The convolu-

tional layers learn to encode the input in a set of simple signals and then reconstruct the input [45].

In Chapter 2, an ablation study was conducted to identify the most suitable architecture for the task. After careful consideration of the similarities between our dataset and the dataset used in Chapter 2, we decided to use an architecture that is almost identical to the one used in the ablation study. The only difference between the datasets is that ours includes cracks whereas the dataset in Chapter 2 does not. This was a conscious decision, as the architecture used in Chapter 2 has been shown to work well for similar datasets. However, given the addition of cracks to our dataset, we decided to add self-attention layers to increase the weight of the crack area, which is of particular interest to us. Our CNN architecture consists of three stages. The first stage is downsampling, which consists of seven convolutional layers (E1, E2, E3, E4, E5, E6, E7). The second stage has three layers (RS1, RS2, and RS3) of Squeeze-and-Excitation and Residual blocks (SE-ResNet). The last stage is upsampling, which consists of six deconvolutional layers (D1, D2, D3, D4, D5, D6) and two self-attention layers (SA1, SA2), as illustrated in Fig. 3.7. The layer sizes are listed in Table 3.3.

Figure 3.7 Model architecture.

Table 3.3 Size of network layers

Stage         Layer  H×W×C
Downsampling  E1     600×600×12
              E2     600×600×32
              E3     600×600×64
              E4     600×600×128
              E5     600×600×256
              E6     600×600×512
              E7     600×600×1024
SE-ResNet     RS1    10×10×1024
              RS2    10×10×1024
              RS3    10×10×1024
Upsampling    D1     19×19×512
              D2     38×38×256
              SA1    38×38×256
              D3     75×75×128
              SA2    75×75×128
              D4     150×150×64
              D5     300×300×32
              D6     600×600×12

3.2.2.1 Residual Block

We used residual blocks to address the vanishing gradient problem. In addition, SE blocks are computationally lightweight and result in only very small increases in model complexity. As illustrated in Fig. 3.8, the formulation of F(x)+x can be realized by feedforward neural networks with shortcut connections. The shortcut connection simply performs identity mapping, and its output is added to the output of the stacked layers [46].

Figure 3.8 The building block of residual learning [46].

3.2.2.2 Squeeze-and-Excitation Blocks

As depicted in Fig. 3.9, Squeeze-and-Excitation (SE) blocks improve the representational capacity of the network by enabling dynamic channel-wise feature recalibration. An SE block can be implemented in five steps. First, we feed the input x as a convolutional block, together with the current number of channels, to the SE function, where 𝐹𝑡𝑟 in Fig. 3.9 is the convolutional operator that transforms X to U. Second, each channel is squeezed into a single numeric value by average pooling. Third, a fully connected layer followed by a ReLU function applies a nonlinearity and reduces the output channel complexity. Fourth, a second fully connected layer followed by a sigmoid function gives each channel a smooth gating value. Finally, each feature map of the convolutional block is weighted by its gate value [47].

SE blocks can be used directly with residual networks. Fig. 3.10 depicts an SE-ResNet module, in which the SE block transformation 𝐹𝑡𝑟 is regarded as the non-identity branch of a residual module, and both squeeze and excitation act before the summation with the identity branch. Using both SE and ResNet in the network outperforms using ResNet alone [47].

Figure 3.9 The building block of Squeeze-and-Excitation [47].

Figure 3.10 SE-ResNet module.

3.2.2.3 Self-attention blocks

In DL, the attention mechanism is inspired by human vision. After we receive visual information from the outside, our brain transmits a signal via neurons. This process helps humans focus on the right areas and reduces the weight of unrelated areas in their attention. Likewise, as part of the feature extraction process of the input image, attention increases the weight of the area of interest and reduces the weight of unrelated regions. In this work, we use the self-attention module of the Self-Attention GAN (SAGAN) [55] to improve the prediction results. Convolution processes information in a local neighborhood; therefore, using convolutional layers alone is computationally inefficient for modeling long-range dependencies in images. The SAGAN self-attention module efficiently models relationships between widely separated spatial regions and can therefore capture global dependencies. In the self-attention mechanism, the convolutional image feature maps are branched into three copies, corresponding to the key, value, and query concepts in the transformer [56]: Key: f(x) = W_f x, Query: g(x) = W_g x, and Value: h(x) = W_h x. The image features from the previous hidden layer are first transformed into the two feature spaces f(x) and g(x) to calculate the attention, and dot-product attention is then applied to output the self-attention feature maps using Equations 3.1 and 3.2. The entire process of the self-attention mechanism in SAGAN is depicted in Fig. 3.11.

a_{i,j} = \mathrm{softmax}\left( f(x_i)^T g(x_j) \right)    (3.1)

o_j = w_v \left( \sum_{i=1}^{n} a_{i,j} \, h(x_i) \right)    (3.2)
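Equations 3.1 and 3.2 can be sketched in plain Python for a handful of feature vectors. This is an illustrative sketch only: the weight matrices are toy values rather than learned 1×1 convolutions, the softmax is taken over the index i for each output location j, and the learned scaling and residual addition that SAGAN [55] applies to the attention output are omitted.

```python
import math


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def self_attention(xs, w_f, w_g, w_h, w_v):
    """xs: list of n feature vectors (one per spatial location)."""
    f = [matvec(w_f, x) for x in xs]  # key
    g = [matvec(w_g, x) for x in xs]  # query
    h = [matvec(w_h, x) for x in xs]  # value
    n = len(xs)
    outputs = []
    for j in range(n):
        # Eq. 3.1: softmax over i of f(x_i)^T g(x_j)
        scores = [sum(a * b for a, b in zip(f[i], g[j])) for i in range(n)]
        top = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        a = [e / total for e in exps]
        # Eq. 3.2: attention-weighted sum of values, then the output map w_v
        ctx = [sum(a[i] * h[i][k] for i in range(n)) for k in range(len(h[0]))]
        outputs.append(matvec(w_v, ctx))
    return outputs


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
xs = [[1.0, 0.0], [3.0, 2.0]]
# Zero key weights give equal scores, so every location attends uniformly
out = self_attention(xs, w_f=zeros, w_g=identity, w_h=identity, w_v=identity)
print(out)
```

With uniform attention, each output is simply the mean of the value vectors, which shows how every location can draw on information from the whole feature map.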

Figure 3.11 The building block of self-attention module for the SAGAN [55].

3.3 Loss function and performance metrics

3.3.1 Custom loss function

Learning biases can be established by the proper choice of loss functions, constraints, and inference algorithms that regulate the training of the ML model to explicitly direct convergence towards solutions that adhere to the fundamentals of physics [57]. The underlying physical laws can be satisfied by using and tuning such a penalty constraint. In this thesis, we consider the stress concentration factor equation to improve the prediction of stress concentration around the crack tip in steel gusset plates. Mathematical analysis and experimental results show that the stress near changes in the section of a loaded structural component reaches greater magnitudes than the average stress in the section. This increase in peak stress near openings and other changes in the section is called stress concentration. Equation 3.3 defines the stress concentration factor as the peak stress relative to the nominal stress that would exist if the stress distribution remained uniform [58].

K_t = \frac{\sigma_{max}}{\sigma_{nom}}    (3.3)

where 𝐾𝑡 is the stress concentration factor, and 𝜎𝑚𝑎𝑥 and 𝜎𝑛𝑜𝑚 are the peak stress around the crack tip and the nominal stress in the remainder of the section, respectively. We create binary masks to apply the stress concentration factor equation to the loss function. Fig. 3.12 illustrates one of the possible mask scenarios. In Fig. 3.12a, the crack (gray rectangle) is surrounded by mask 1 (white rectangle), in which all pixel values are one (white), while all other pixel values are zero (black). Fig. 3.12b shows mask 2, in which all the zero pixel values of mask 1 are replaced with one and all the one pixel values are replaced with zero, so that mask 2 = 1 - mask 1. Mask 1 represents the area where the stress concentration factor should be applied to capture the peak stress, and mask 2 represents the area with the nominal stress distribution. Based on the above description, our custom loss function is defined as:

Loss = \frac{\lambda_{PHY}}{n} \sum_{i=1}^{n} \left( S(i) - \hat{S}(i) \right)^2 \cdot M_1 + \frac{1}{m} \sum_{i=1}^{m} \left( S(i) - \hat{S}(i) \right)^2 \cdot M_2    (3.4)

where 𝜆𝑃𝐻𝑌 is the stress concentration factor, and n and m are the number of white pixels in mask 1 and mask 2, respectively. 𝑀1 is mask 1 and 𝑀2 is mask 2. S(i) is the stress value at node i computed by FEA as the ground truth, \hat{S}(i) is the corresponding stress predicted by the DL model, and ‘·‘ denotes the Hadamard product.
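A minimal sketch of this masked loss on flattened stress fields is given below. It is written in plain Python for illustration and is not the training code used in this work: the function name `custom_loss` is hypothetical, the masks are flat binary lists, and the default weight of 15 is the value reported in the implementation details of this chapter.

```python
def custom_loss(s, s_hat, mask1, lambda_phy=15.0):
    """Weighted squared error: lambda_phy inside mask 1, unit weight in mask 2.

    s, s_hat: flattened ground-truth and predicted stresses.
    mask1: flat binary list, 1 around the crack and 0 elsewhere.
    """
    mask2 = [1 - v for v in mask1]          # mask 2 = 1 - mask 1
    n_white1, n_white2 = sum(mask1), sum(mask2)
    if n_white1 == 0 or n_white2 == 0:
        raise ValueError("both masks need at least one white pixel")
    sq_err = [(a - b) ** 2 for a, b in zip(s, s_hat)]
    # Hadamard products with the masks select the two regions
    crack_term = sum(e * v for e, v in zip(sq_err, mask1)) / n_white1
    nominal_term = sum(e * v for e, v in zip(sq_err, mask2)) / n_white2
    return lambda_phy * crack_term + nominal_term


# Toy example with four pixels; the second pixel lies inside mask 1
truth = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 1.0, 3.0, 1.0]
loss = custom_loss(truth, pred, mask1=[0, 1, 0, 0])
print(loss)
```

In this toy case a unit error inside mask 1 contributes 15 to the loss, while a three-times larger error outside it contributes only 3, which illustrates how the weighting steers training toward the crack region.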

Figure 3.12 Illustration of binary masks. (a) mask 1, (b) mask 2.

3.3.2 Performance metrics

We used the custom loss function defined in Equation 3.4 as the training loss; its error is denoted CLE (Custom Loss Error). We also used MAE (Mean Absolute Error), PMAE (Percentage Mean Absolute Error), PAE (Peak Absolute Error), and PPAE (Percentage Peak Absolute Error) to evaluate the overall quality of the predicted stress distribution. These metrics are defined in Equations 3.5, 3.6, 3.7, and 3.8, respectively.

MAE = \frac{1}{n} \sum_{i=1}^{n} \left| S(i) - \hat{S}(i) \right|    (3.5)

where 𝑆(𝑖) is the stress value at a node ‘i‘ computed by FEA as the ground truth, 𝑆∧(𝑖) is the

corresponding predicted stress by the DL model, and n is the total number of elements at each

sample which is 360,000 in our work. Symbol || denotes the absolute value. Our model’s prediction

and ground truth are displayed as 600×600 resolution images.

The percentage mean absolute error is defined as:

PMAE = \frac{MAE}{\max[S(i)] - \min[S(i)]} \times 100    (3.6)

where max [𝑆(𝑖)] is the maximum value in a set of ground truth stress values and min [𝑆(𝑖)]

is the minimum value.

PAE and PPAE measure the accuracy of the maximum stress, which is one of the most critical values in the predicted stress distribution. The maximum stress matters in the design phase, since it should be less than the yield strength to avoid permanent deformation. PAE and PPAE are defined as:

PAE = \max[S(i)] - \max[\hat{S}(i)]    (3.7)

PPAE = \frac{PAE}{\max[S(i)]} \times 100    (3.8)

3.4 Implementation details

All codes are written in PyTorch Lightning and run on two NVIDIA TITAN RTX 24G GPUs.

The AdamW optimizer [59] was used to speed up the convergence of our models with a learning rate of 1e−5. The batch size is set to 8, which led to the best accuracy compared to other batch sizes. The value of the stress concentration factor, which is applied as 𝜆𝑃𝐻𝑌 in the custom loss function, is 15, which led to the best results compared to the other values.

3.5 Results and discussions

3.5.1 Main results

We train and evaluate our model on the custom loss using the entire dataset for 200 epochs; the other metrics are plotted as independent metrics. The training data size is 49,152, and the separate testing data size is 12,288, randomly divided with a train/test ratio of 80%–20%. Fig. 3.13 shows the Custom Loss Error (CLE) and MAE as a function of epochs. Fig. 3.13a is in linear scale, and Fig. 3.13b is in logarithmic scale. Fig. 3.13a shows that the curves of both CLE and MAE rapidly declined after a few epochs. However, Fig. 3.13b gives a more precise representation of the model’s behavior. From Fig. 3.13b, it can be seen that CLE is less than MAE, which is due to penalization of the loss with the stress concentration factor.

Figure 3.13 CLE and MAE curves on training and testing data with two scales: (a) linear scale, (b)
logarithmic scale.

We saved the best checkpoint during training, epoch 181, and all error metrics are based on this checkpoint. The evaluation results of the network are shown in Table 3.4.

As can be seen, PMAE and PPAE for the testing dataset are just 0.22% and 1.50%, respectively. We consider these results satisfactory for stress distribution predictions of damaged structural components, specifically the PPAE, which is the most critical value for stress distribution and stress concentration in engineering domain applications.

Table 3.4 Error metrics at epoch 181 (Units: MPa)

Metrics  CLE  MAE    PMAE(%)  PAE   PPAE(%)
Testing  4.5  22.01  0.22     10.8  1.5

Fig. 3.14 illustrates the cumulative distribution of PMAE and PPAE in the test dataset. Fig. 3.14a shows that about 2% of predicted samples have a PMAE of less than 0.22, which is the mean; however, 80% of predicted samples have a PMAE of less than 3, and 50% of predicted samples have a PMAE of less than 1.21, which is the median. Fig. 3.14b shows that about 99% of predicted samples have a PPAE of less than 1.5, 80% of predicted samples have a PPAE of less than 0.35, and 50% of the predicted samples have a PPAE of less than 0.17.

Figure 3.14 Cumulative distribution of PMAE and PPAE: (a) PMAE of samples less than mean, median, and 80% of data on the test dataset, (b) PPAE of samples less than mean, median, and 80% of data on the test dataset.

The prediction results of some randomly selected samples from the test dataset are visualized

in Fig. 3.15. Each row represents a sample. Columns (a) to (d) represent geometry, boundary

conditions, and load in horizontal and vertical directions, respectively. Columns (e) and (f)

represent the ground truth and predicted stress distributions, respectively. Comparing columns (e)

and (f) shows that the predicted stress distributions closely match the ground truths.

Fig. 3.15 also demonstrates that the network can localize and quantify different sizes of damage,

even tiny cracks. The second row of Fig. 3.15 is an example of rectangle damage with a size of 10 ×


Figure 3.15 Predicted stress distribution and corresponding inputs with different load and boundary
condition scenarios. Columns (a) to (d) represent (a) geometry, (b) boundary conditions, and load
in the (c) horizontal and (d) vertical directions. Column (e) represents the ground truth, and column (f)
shows the predicted stress distribution (Units = MPa).

1 mm, which shows proper damage localization and stress distribution prediction for the remainder

of the section.


Fig. 3.16 shows two cases of Fig. 3.15 with larger scales to show details of ground truth

and predicted stress distribution. Moreover, comparing the last two columns of Fig. 3.15 shows

the efficacy of our novel custom loss function, which can accurately capture stress concentration

around crack tips. This means the learned algorithm can capture the underlying knowledge of

physics behind the stress concentration.

Figure 3.16 Enlarged views of cases from Fig. 3.15: (a) ground truth, (b) predicted (Units = MPa).

Some inaccurate predictions are also shown in Fig. 3.17. As can be seen, these predictions

can still capture damage locations and stress concentration around crack tips; however, mean and

peak stress distribution in some parts of the ground truth and predictions slightly vary. Fig. 3.18

shows two cases of Fig. 3.17 with larger scales to show details of ground truth and predicted stress

distribution.


Figure 3.17 Inaccurate predicted stress distribution and corresponding inputs with different loads
and boundary conditions scenarios. Columns (a) to (d) represent geometry, boundary conditions,
load in the horizontal and vertical direction, respectively. Columns (e) and (f) represent ground
truth and predicted stress distribution, respectively (Units = MPa).

3.5.2 Study on the effect of the custom loss function

We also trained our model using the torch.nn.MSELoss function and compared its results with our

custom loss model to evaluate how effective our proposed custom loss function is. We investigated


Figure 3.18 Enlarged views of cases from Fig. 3.17: (a) ground truth, (b) predicted (Units = MPa).

the performance of the custom loss model with different stress concentration factors and compared

it with the MSE model. Table 3.5 demonstrates that the performance of the custom loss model with

a stress concentration factor equal to 1 is almost the same as the MSE model, which is expected

based on equation 3.4.
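The equivalence between a stress concentration factor of 1 and a plain squared-error loss can be illustrated with a minimal sketch. It assumes, hypothetically, that the custom loss is a mask-weighted squared error in which pixels inside the damage mask are multiplied by the factor; the authoritative form is the one given in equation 3.4, and the function name, argument names, and toy crack mask below are illustrative only.

```python
import numpy as np

def weighted_squared_error(pred, target, damage_mask, factor):
    """Mean squared error with damaged pixels weighted by `factor` (assumed form)."""
    weights = np.where(damage_mask, float(factor), 1.0)
    return float(np.mean(weights * (pred - target) ** 2))

rng = np.random.default_rng(0)
target = rng.random((200, 200))                          # toy normalized stress map
pred = target + 0.01 * rng.standard_normal((200, 200))   # toy prediction

damage_mask = np.zeros((200, 200), dtype=bool)
damage_mask[95:105, 100:101] = True                      # hypothetical 10 x 1 pixel crack

plain_mse = float(np.mean((pred - target) ** 2))
loss_f1 = weighted_squared_error(pred, target, damage_mask, 1)
loss_f15 = weighted_squared_error(pred, target, damage_mask, 15)
```

Under this assumed form, a factor of 1 reduces every weight to 1, so the loss coincides with the mean squared error, while larger factors increase the contribution of errors inside the damaged area.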

Table 3.5 Error metrics with different stress concentration factors (Units: MPa)

Model          Stress concentration factor    PMAE(%)    PPAE(%)
Custom loss    1                              0.05       2.52
Custom loss    8                              0.12       2
Custom loss    15                             0.22       1.5
Custom loss    20                             0.27       2.3
Custom loss    30                             0.48       3.7
MSE loss       None                           0.053      2.5

However, for stress concentration factors either higher or lower than 15, the PPAE of the custom

loss model increased. It seems that lower stress concentration factors do not sufficiently emphasize the pixel values within the

damaged area, whereas higher stress concentration factors overemphasize them.

Therefore, we trained our custom loss model with a stress concentration factor of 15 to obtain the best

results. The error metrics of the MSE loss model and the custom loss model with a stress concentration factor of 15

are presented in Table 3.6.

Table 3.6 Error metrics in MSE and custom loss model (Units: MPa)

Metrics        CLE    MAE      PMAE(%)    PAE       PPAE(%)
Custom loss    4.5    22.01    0.22       10.8      1.5
MSE            -      5.07     0.053      109.03    2.5

As can be seen, the custom loss model’s performance is slightly better than the MSE model

in terms of PPAE, which is expected since we penalize the damaged areas 15 times more than the

remainder of the section. However, the MSE model performs better in terms of the MAE metric

than the custom loss model. This means the custom loss model is better in damage localization and

capturing stress concentration around the crack tips, and the MSE model is better in general stress

prediction. Although the PPAE of the custom loss model is only 1 percentage point lower than that of the MSE model, the custom

loss has a significant advantage in damage localization and capturing stress concentration around

crack tips.

Fig. 3.19 compares predicted stress distribution between MSE and custom loss model and the

masks used for the custom loss function. In Fig. 3.19, column (a) represents the mask used for

penalizing the damaged area, and column (b) represents the mask used for the remainder of the

section, which has no penalization. Columns (c), (d), and (e) represent the ground truth, the MSE

model prediction, and the custom loss model prediction, respectively. As can be seen in the first row of Fig. 3.19,

the MSE model cannot localize the crack; however, the custom loss model has completely localized

the tiny crack and its stress concentration around the crack tip. Fig. 3.19 also shows that the MSE

model can localize larger cracks but still is not as good as the custom loss model in capturing stress

concentration around crack tips. Fig. 3.20 shows two cases of Fig. 3.19 with larger scales to show


Figure 3.19 Comparison of predicted stress distribution in MSE and custom loss model and their
corresponding masks. Columns (a) and (b) represent the mask for the damaged area and the
remainder of the section, respectively. Columns (c) to (e) represent ground truth, and predicted
stress distribution in the MSE and custom loss models, respectively (Units = MPa).

details of ground truth and predicted stress distribution with the MSE loss model and custom loss

model.


Figure 3.20 Enlarged views of cases from Fig. 3.19: (a) ground truth, (b) MSE loss model, (c) custom
loss model (Units = MPa).


CHAPTER 4

NEURO-DYNASTRESS: PREDICTING DYNAMIC STRESS

DISTRIBUTIONS IN STRUCTURAL COMPONENTS

4.1 Need for fast dynamic analysis

Structural components are typically exposed to dynamic loadings, such as earthquakes, wind,

and explosions. Structural engineers should be able to conduct real-time Finite Element Analysis

(FEA) during or in the aftermath of extreme disaster events that require immediate corrections to avoid fatal

failures. For instance, if a crack occurs in a bridge after a disruptive event, fast FEA will help

engineers to understand what kind of immediate action should be taken before the failure of the

bridge. As a result, it is crucial to predict dynamic stress distributions during highly disruptive

events in real time. The main idea here is to train a model that can later be used when real-

time estimations are needed, such as in the aftermath of extreme disruptive events. For example,

focusing on critical structural components, there is a need for immediate assessment following a

disaster or during extremely disruptive events to guide corrective actions. Engineers could rely on

the proposed computationally efficient algorithms to determine stress distributions over damaged

gusset plates and apply the proper rehabilitation actions. They need to be able to analyze gusset

plates quickly and accurately, which is what our model can provide.

Some studies have been conducted to develop methods of predicting structural response using

ML models. Dong et al. [32] proposed a support vector machine approach to predict nonlinear

structural responses. Wu et al. [33] utilized deep convolutional neural networks to estimate the

structural dynamic responses. Long short-term memory (LSTM) [34] was used by Zhang et al. [35]

to predict nonlinear structural response under earthquake loading. The few models that studied

stress predictions suffer from the problem of low-resolution predictions, making them unsuitable

for decision-making after a catastrophic failure. To the best of our knowledge, this is the first work

to predict dynamic stress distribution in the specific domain of steel plates with high accuracy and


Figure 4.1 Overview: Unlike FEM, our proposed Neuro-DynaStress is computationally efficient and
facilitates real-time analysis. The existing workflow for FEM applications includes: (i) modeling
the geometry and its components, (ii) specifying material properties, boundary conditions, meshing,
and loading, (iii) dynamic analysis, which may be time-consuming based on the complexity of the
model. Our Neuro-DynaStress takes geometry, boundary condition, and load as input and predicts
the dynamic stress distribution at all time steps in one shot.

low latency. The algorithm takes the geometry, boundary conditions, and time histories as input

and renders the dynamic von Mises stress distribution as an output. We modeled the steel plates as

gusset plates with dynamic loading applied at different edges, different boundary conditions, and

varying complex geometries.

4.2 Methods

4.2.1 Data Generation

The data generation process is almost identical to its representation in chapter 2, and we have

included it in this chapter as well since each chapter discusses a different topic. Two-dimensional

steel plate structures with five edges, E1 to E5 denoting edges 1 to 5, as shown in Fig. 4.2, are


considered homogeneous and isotropic linear elastic materials. Various geometries are generated

by changing the position of each node in horizontal and vertical directions, as shown in Fig. 4.2,

which led to 1024 unique pentagons. The material properties remain unchanged, isotropic for all

samples. The 2D steel plates approach the geometry of gusset plates. Gusset plates connect beams

and columns to braces in steel structures. The behavior and analysis of these components are

critical since various reports have observed failures of gusset plates subject to lateral loads. The

boundary conditions and time-history load cases are considered to simulate similar conditions in

common gusset plate structures under external loading. We showed the most common gusset plates

in practice in chapter 2.

Figure 4.2 Basic schematic topology for initializing the steel plate geometries.

A total of 57,344 unique samples were created by combining 14 random time-history load cases

and the four most common boundary conditions in gusset plates. Boundary conditions are shown

in Fig. 4.3, mimicking the real gusset plates’ boundary conditions. All the translation and rotational

displacements were fixed at the boundary conditions. The range for width and height of the plates

is from 30 cm to 60 cm. Each time history consists of 100 time steps generated with random sine

and cosine frequencies. The frequencies range between 1 and 3 Hz, with amplitudes ranging from

2 to 10 kN at intervals of 2 kN. All time histories in horizontal and vertical directions are shown

in Fig. 4.4. Considering 100 time steps, each interval is 0.01 seconds, making the total time equal

to 1 second. All the details for the input variables used to initialize the population are shown in

Table 4.1.
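The sample count and the time discretization described above can be checked with a short sketch. The counts, step size, and parameter ranges come from the text; the way the sine and cosine terms are combined into a load history is not specified here, so `make_time_history` is a hypothetical illustration rather than the generator used for the dataset.

```python
import numpy as np

N_GEOMETRIES = 1024         # unique pentagons
N_LOAD_CASES = 14           # random time-history load cases
N_BOUNDARY_CONDITIONS = 4   # boundary condition types
n_samples = N_GEOMETRIES * N_LOAD_CASES * N_BOUNDARY_CONDITIONS

N_STEPS, DT = 100, 0.01     # 100 time steps of 0.01 s, 1 s in total

def make_time_history(amplitude_kn, f_sin_hz, f_cos_hz):
    """Hypothetical load history in newtons: a sine term plus a cosine term."""
    t = np.arange(1, N_STEPS + 1) * DT
    wave = np.sin(2 * np.pi * f_sin_hz * t) + np.cos(2 * np.pi * f_cos_hz * t)
    return amplitude_kn * 1e3 * wave

# Example with values taken from the ranges in Table 4.1.
history = make_time_history(amplitude_kn=6, f_sin_hz=1.5, f_cos_hz=2.5)
```

The product of the three counts reproduces the 57,344 samples reported above.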


Figure 4.3 Different types of boundary conditions for initializing population.

Figure 4.4 Time histories: (a) horizontal direction, (b) vertical direction.

Table 4.1 Input variables

Geometry    Boundary      Load        Frequencies      Load (kN)     Time     Total
            conditions    position    (Hz)                           steps    time (s)
pentagon    E2            E4E5        1,1.5,2,2.5,3    2,4,6,8,10    100      1
pentagon    E2E3          E5          1,1.5,2,2.5,3    2,4,6,8,10    100      1
pentagon    E1E2          E4          1,1.5,2,2.5,3    2,4,6,8,10    100      1
pentagon    E3            E2E5        1,1.5,2,2.5,3    2,4,6,8,10    100      1

4.2.2 Input Data

The geometry is encoded as a 200 × 200 matrix, that is, a binary image. 0 (black) and

1 (white) denote outside and inside of the geometry, as shown in Fig. 4.5a. The boundary condition


is also represented by another 200 × 200 pixel binary image, where the constrained edges are defined

by 1 (white) as shown in Fig. 4.5b. Moreover, each time step of time histories for horizontal and

vertical components is encoded in the load position of the corresponding frame. Load positions

in each time step have values between 0 and 1, corresponding to each time step of time histories,

and all remaining elements are zero. All the load frames of each sample in horizontal and vertical

directions are saved as tensors of dimension 100 × 200 × 200. Figs. 4.5c and 4.5d show loads in

the horizontal and vertical directions. The colored load positions in Figs. 4.5c and 4.5d are used

only for visualization. Each row of Fig. 4.5 represents one of the simulated samples. Details of

boundary conditions and their load positions are described in Table 4.1.
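The input encoding described above can be sketched as follows. The array shapes follow the text (200 × 200 binary images and 100 × 200 × 200 load tensors); the min-max scaling of the time history to [0, 1], the helper name `encode_load_frames`, and the square toy geometry are assumptions made only for illustration.

```python
import numpy as np

GRID, N_STEPS = 200, 100

def encode_load_frames(load_mask, history):
    """Return an (N_STEPS, GRID, GRID) tensor holding the scaled load at load pixels."""
    h = np.asarray(history, dtype=float)
    scaled = (h - h.min()) / (h.max() - h.min())   # assumed min-max scaling to [0, 1]
    frames = np.zeros((N_STEPS, GRID, GRID), dtype=np.float32)
    frames[:, load_mask] = scaled[:, None]         # same value on all load pixels per step
    return frames

# Toy square plate instead of a pentagon, for brevity.
geometry = np.zeros((GRID, GRID), dtype=np.uint8)
geometry[50:150, 50:150] = 1                       # 1 inside the plate, 0 outside
boundary = np.zeros((GRID, GRID), dtype=np.uint8)
boundary[50:150, 50] = 1                           # constrained edge marked with 1
load_mask = np.zeros((GRID, GRID), dtype=bool)
load_mask[50:150, 149] = True                      # loaded edge

t = np.arange(1, N_STEPS + 1) * 0.01
history = 6e3 * np.sin(2 * np.pi * 2.0 * t)        # hypothetical horizontal load in N
load_x = encode_load_frames(load_mask, history)
```

All pixels outside the load position remain zero in every frame, as stated in the text.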

Figure 4.5 Input and output representation for stress distribution prediction: (a) geometry, (b)
boundary condition, (c) horizontal load, (d) vertical load, (e) output.

4.2.3 Output Data

FEA was performed using the Partial Differential Equation (PDE) solver in the MATLAB

toolbox to obtain the stress distributions of each sample. We used the transient-planestress analysis type of the

MATLAB PDE solver to generate dynamic stress contours as the ground truth of our model. We

defined the geometry, boundary condition, material properties, and time histories as input, and the

PDE solver returns the sequence of stress distributions corresponding to the inputs. The MATLAB

PDE toolbox mesh generator only generates unstructured triangulated meshes incompatible with

CNN. The minimum and maximum triangulated mesh sizes are 5 and 10 mm, respectively. Since


each element should be represented by one pixel in an image, we develop a 200 × 200 grid

surface equal to the dimensions of the largest possible geometry. Figs. 4.6a and 4.6b show the

unstructured mesh and the 200 × 200 grid surface on top of a random sample. The stress values

are then interpolated between the triangular elements and grids to determine a stress distribution

compatible with our CNN network. The stress values of all the elements outside the material

geometry are assigned to zero, as shown in Fig. 4.5e.

Figure 4.6 A sample of mesh generation: (a) unstructured triangular mesh, (b) structured grid
surface.

The dimension of the largest sample is 600 × 600 mm, and the smallest is 300 × 300 mm.

Using a mesh grid of 200 × 200 on top of samples made each element 3 × 3 mm, which means that

each frame of output has 40,000 pixels. This high-resolution dataset led to achieving significant

accuracy. The maximum and minimum von Mises stress values for elements among the entire

dataset are 279,370 and -980 MPa, respectively. We normalized all the output data between 0 and

1 to ensure faster convergence and encoded it to 200 × 200 for each frame.
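A minimal sketch of the pixel-size arithmetic and the output normalization follows. The stress extremes are the dataset values quoted above; the use of a single global min-max scaling, and the helper names, are assumptions, since the text states only that the outputs are normalized between 0 and 1.

```python
import numpy as np

STRESS_MIN, STRESS_MAX = -980.0, 279_370.0   # dataset extremes quoted in the text (MPa)

def normalize(stress):
    """Map stress values to [0, 1] using the dataset-wide extremes (assumed scheme)."""
    return (np.asarray(stress, dtype=float) - STRESS_MIN) / (STRESS_MAX - STRESS_MIN)

def denormalize(scaled):
    """Inverse of `normalize`, returning values in MPa."""
    return np.asarray(scaled, dtype=float) * (STRESS_MAX - STRESS_MIN) + STRESS_MIN

pixel_mm = 600 / 200        # largest plate edge (mm) divided by grid resolution
pixels_per_frame = 200 * 200
```

Keeping the inverse mapping alongside the forward one makes it straightforward to report errors in MPa after prediction.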

4.2.4 Stress Calculation

The steps for linear finite element analysis’ stress calculation, which is part of phase (iii) of

FEA’s workflow elaborated in the introduction section, are as follows:

$$K Q = F \tag{4.1}$$

where 𝐾 denotes a global stiffness matrix, 𝐹 is the load vector applied at each node, and 𝑄

denotes the displacement. A stiffness matrix 𝐾 consists of elemental stiffness matrices 𝐾𝑒:

$$K_e = A_e B^T D B \tag{4.2}$$

where 𝐵 represents the strain-displacement matrix, 𝐷 represents the stress-strain matrix, and 𝐴𝑒 represents

the area of the element. Mesh geometry and material properties determine 𝐵 and 𝐷. This is

followed by adding the elemental stiffness matrix 𝐾𝑒 to the global stiffness matrix. The displacement

boundary conditions are encoded using the corresponding rows and columns in the global stiffness

matrix 𝐾. Solving 𝑄 can be achieved using direct factorization or iterative methods.

Having solved equation 4.1 for the global displacement 𝑄, we can extract the

nodal displacements 𝑞 of each element and then calculate the element stress tensor as follows:

$$\sigma = D B q \tag{4.3}$$

where 𝜎 specifies the stress tensor of an element. The 2-D von Mises Stress criterion is then used to

calculate each element’s von Mises Stress:

$$\sigma_{vm} = \sqrt{\sigma_x^2 + \sigma_y^2 - \sigma_x \sigma_y + 3\tau_{xy}^2} \tag{4.4}$$

where 𝜎𝑣𝑚 denotes von Mises Stress, 𝜎𝑥, 𝜎𝑦 are the normal stress components and 𝜏𝑥𝑦 is the

shear stress component.
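Equations 4.3 and 4.4 can be illustrated with a short numerical sketch. It uses the standard isotropic plane-stress form of the stress-strain matrix and applies it directly to a strain vector (the product of 𝐵 and 𝑞 in Eq. 4.3); the material constants (E = 200 GPa, ν = 0.3) are assumed typical steel values and are not taken from this chapter.

```python
import numpy as np

def plane_stress_D(E, nu):
    """Stress-strain matrix D for an isotropic material in plane stress."""
    return (E / (1.0 - nu**2)) * np.array([
        [1.0, nu, 0.0],
        [nu, 1.0, 0.0],
        [0.0, 0.0, (1.0 - nu) / 2.0],
    ])

def von_mises_2d(sx, sy, txy):
    """2-D von Mises stress, Eq. 4.4."""
    return np.sqrt(sx**2 + sy**2 - sx * sy + 3.0 * txy**2)

E, NU = 200e3, 0.3                            # assumed steel properties (MPa, dimensionless)
D = plane_stress_D(E, NU)

# Strain vector (eps_x, eps_y, gamma_xy) for a uniaxial stress state; in a full
# FE code this vector would come from B @ q for each element.
strain = np.array([1e-3, -NU * 1e-3, 0.0])
sx, sy, txy = D @ strain                      # Eq. 4.3 with strain = B q
sigma_vm = von_mises_2d(sx, sy, txy)
```

For this uniaxial case the von Mises stress equals the axial stress, E times the axial strain, which gives a quick sanity check of the implementation.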

4.3 Proposed Methodology

We use convolutional layers to encode the spatial information from the input. Our hypothesis is

that these layers will combine the information in geometry, boundary conditions, and load. A key

characteristic of dynamic structural systems is the temporal dependence of their states. LSTM is a

suitable architecture for modeling temporal information in sequence and hence is a good choice to

model structural dynamic systems in our experiments. For high-quality 2D reconstructions, we use

transposed convolutional layers in our decoder. To further improve training and performance, we

use modules from the recently proposed feature-aligned pyramid networks (FaPN) [60]. FaPN allows

the decoder to access information from the encoder directly. Overall, our network architecture


consists of four modules: encoder consisting of convolutional layers, temporal module made using

LSTM modules, decoder consisting of transposed convolutional layers, and alignment modules

acting as connections between encoder and decoder. The number of layers in each module, including

the LSTM module, was chosen based on performance. The architecture is

illustrated schematically in Fig. 4.7. The size of layers and hyper-parameters used in the network

are summarized in Table 4.2.
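The spatial sizes listed in Table 4.2 can be reproduced with the standard output-size formulas for convolution and transposed convolution. The sketch below assumes 3 × 3 kernels with padding 1, a stride of 1 for the first encoder layer, and a stride of 2 elsewhere; these hyper-parameters are illustrative assumptions chosen to match the tabulated sizes, not values stated in the text.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel=3, stride=2, padding=1, output_padding=0):
    """Output size of a transposed convolution along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Encoder: six layers, the first keeping the 200 x 200 resolution.
encoder_sizes = [conv_out(200, stride=1)]
for _ in range(5):
    encoder_sizes.append(conv_out(encoder_sizes[-1]))

# Decoder: five layers from the 7 x 7 latent map back to 200 x 200.
decoder_sizes = [conv_transpose_out(7)]
for target in (25, 50, 100, 200):
    prev = decoder_sizes[-1]
    output_padding = target - conv_transpose_out(prev)   # 0 or 1 under these assumptions
    decoder_sizes.append(conv_transpose_out(prev, output_padding=output_padding))
```

Under these assumptions the encoder passes through 200, 100, 50, 25, 13, and 7 pixels per side, and the decoder through 13, 25, 50, 100, and 200, consistent with the first and last layer sizes of the Conv and ConvT rows.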

Figure 4.7 Architecture for the proposed Neuro-DynaStress. The convolutional encoder maps the
raw input data to a latent space. LSTM layers process the information across different time
frames. The final output is obtained from the resulting latent representation using transposed
convolutional layers.

Table 4.2 Network layers and hyper-parameters

Type of layers    Number of layers    First layer (H×W×C)    Last layer (H×W×C)
Conv              6                   200×200×16             7×7×512
LSTM              4                   1×1×512                1×1×512
ConvT             5                   13×13×256              200×200×16
FaPN              4                   13×13×256              100×100×32

Batch size    Learning rate    Weight decay    Loss function
8             10^-4            10^-5           MAE

4.4 Loss Function and Performance Metrics

We use Mean Absolute Error (MAE), defined in Eq. 4.5, as the primary training loss and metric:

$$\mathrm{MAE} = \frac{1}{NT} \sum_{n,t}^{N,T} \left| S(n,t) - \hat{S}(n,t) \right| \tag{4.5}$$

To ensure that we do not overfit to a single metric, we also use Mean Relative Percentage Error (MRPE), defined in Eq. 4.6, to evaluate the overall

quality of predicted stress distribution:

$$\mathrm{MRPE} = \frac{\mathrm{MAE}}{\max\left(|S(n,t)|, |\hat{S}(n,t)|\right)} \times 100 \tag{4.6}$$

where 𝑆(𝑛, 𝑡) is the true stress value at a node 𝑛 at time step 𝑡, as computed by FEA, and ˆ𝑆(𝑛, 𝑡)

is the corresponding stress value predicted by our model, 𝑁 is the total number of mesh nodes in

each frame of a sample, and 𝑇 is a total number of time steps in each sample. As mentioned earlier,

we set 𝑇 = 100 in our experiments.
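The two metrics can be computed as in the following sketch. It assumes that the maximum in Eq. 4.6 is taken over all nodes and time steps of a sample, which is one reading of the equation; the toy arrays are placeholders with shape (N, T).

```python
import numpy as np

def mae(S, S_hat):
    """Mean absolute error over all nodes and time steps, Eq. 4.5."""
    return float(np.mean(np.abs(S - S_hat)))

def mrpe(S, S_hat):
    """Mean relative percentage error, Eq. 4.6 (max taken over the whole sample)."""
    denom = max(float(np.max(np.abs(S))), float(np.max(np.abs(S_hat))))
    return mae(S, S_hat) / denom * 100.0

S = np.array([[100.0, 200.0], [300.0, 400.0]])       # toy reference stresses
S_hat = np.array([[110.0, 190.0], [310.0, 390.0]])   # toy predictions

sample_mae = mae(S, S_hat)
sample_mrpe = mrpe(S, S_hat)
```

Because MRPE divides by the largest stress magnitude in the sample, it expresses the average error relative to the peak stress rather than to each local value.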

4.5 Implementation and Computational Performance

We implemented our model using PyTorch [61] and PyTorch Lightning. The AdamW optimizer [59]

was used with a learning rate of 10^-4. We found that a batch size of 8 gave the

best results. The computational performance of the model was evaluated on an AMD EPYC 7313

16-core processor and two NVIDIA A6000 48G GPUs. The time required during the training phase

for a single sample with 100 frames and a batch size of 8 was 10 seconds. In the training phase,

one forward and backward pass was considered. The inference time for one sample was less than 5

ms, which satisfies the real-time requirement. The most powerful FE solvers take between

10 minutes and an hour to solve the same problem. We consider the minimum time for all processes of modeling

geometry, meshing, and analysis of one sample in an FE solver to be about 10 minutes. Therefore, Neuro-DynaStress is roughly

12 × 10^4 (10 minutes versus 5 ms) to 72 × 10^4 (one hour versus 5 ms) times faster than conventional FE solvers. The MATLAB

PDE solver does not use GPU acceleration. This demonstrates that our proposed approach can

achieve the real-time requirement during the validation phase.

4.6 Results and Discussions

4.6.1 Quantitative Evaluation

Our model is trained on the training dataset for 45 epochs and evaluated on the validation

dataset using separate metrics. The training dataset consisted of 48,755 samples, while the validation

dataset contained 8,589 samples, together forming an approximately 85%-15% split of the whole dataset. The


model predicts five frames of output from a sequence of five previous inputs until all 100 frames are

predicted. The best validation performance was obtained when we sequenced five frames during

validation. The best checkpoint during validation, at epoch 40, is the basis for all error metrics.

MRPE for the validating dataset is just 2.3%.
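The windowed prediction scheme can be sketched as a simple loop. This is one possible reading of the procedure, in which the 100 frames are processed in consecutive windows of five; `dummy_model` is a hypothetical stand-in for the trained network, and the small 20 × 20 grid is used only to keep the example light.

```python
import numpy as np

N_FRAMES, WINDOW = 100, 5

def predict_sequence(model, inputs):
    """Apply `model` to consecutive windows of WINDOW frames and stitch the outputs."""
    outputs = []
    for start in range(0, N_FRAMES, WINDOW):
        window = inputs[start:start + WINDOW]    # shape (WINDOW, C, H, W)
        outputs.append(model(window))            # expected shape (WINDOW, H, W)
    return np.concatenate(outputs, axis=0)

def dummy_model(window):
    # Stand-in for the trained network: averages the input channels.
    return window.mean(axis=1)

# Four input channels: geometry, boundary condition, horizontal load, vertical load.
inputs = np.zeros((N_FRAMES, 4, 20, 20), dtype=np.float32)
prediction = predict_sequence(dummy_model, inputs)
n_windows = N_FRAMES // WINDOW
```

With a window of five frames, 20 passes cover the full 100-frame sequence.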

4.6.2 Qualitative Evaluation

The prediction results for a few randomly selected samples from the validation dataset are

visualized in Figs. 4.8a and 4.8b. The first row represents 5 frames out of 100 frames of one

reference sample. The second row illustrates the prediction corresponding to the frames in the first

row, and the last row represents the error in the corresponding predictions. The columns represent

the time steps 1, 25, 50, 75, and 100. We visualized frames at intervals of 25 time steps to

evaluate different ranges of dynamic stress prediction.

For visualization purposes, the references and predictions in Figs. 4.8a and 4.8b are scaled

to the same range using the maximum and the minimum of each sample. The errors are scaled

independently. As can be seen in Fig. 4.8a, the predicted frames are quite similar to their

corresponding references. Although the geometry contains sharp corners and edges, which are

areas that are hard for CNNs to reconstruct, our model can predict them. The errors, except for a small

part of the first frame, are in an acceptable range which shows the prediction accuracy of our model.

Fig. 4.8b shows another successful reconstruction. Comparing references with their corresponding

predicted frames demonstrates that our Neuro-DynaStress model can capture both load variations

and maximum stress values at the same time. Furthermore, these results demonstrate that our

model can predict a dynamic stress distribution with a high variation of distributed stress.

Fig. 4.9 shows a random failure sample. Despite the model’s success in predicting most parts

of the frames, it is not able to reconstruct high-stress concentrations at angles of 90 degrees. Since

CNNs typically struggle in handling sharp edges, smoothing the sharp corners using Gaussian

filters during data preprocessing may help the network to train better. Furthermore, as the loads in

frames 𝑡 = 25 and 𝑡 = 75 are lower than in other frames, the prediction in those frames is acceptable.

It is also important that the predictions are temporally consistent.

In order to qualitatively


Figure 4.8 Successfully predicted dynamic stress distributions and their corresponding errors in
different time sequences for two samples. The top row corresponds to reference frames and the
middle row shows the predictions. The bottom row shows the absolute error between corresponding
frames (Unit = MPa).


Figure 4.9 Failed predicted dynamic stress distribution and their corresponding errors in different
time sequences (Unit = MPa).

demonstrate the temporal consistency of the proposed method, Fig. 4.10a shows a comparison of

stress values across 100 frames for successful predictions in a randomly selected element. As can

be seen, the references and the predicted distributions are almost identical in most time sequences,

with errors close to zero, despite the stress varying widely with time. Fig. 4.10a illustrates how

prediction fits with reference more closely when there is more temporal smoothness at peak points.

For instance, a good match between prediction and reference can be seen in the rightmost graph

in Fig. 4.10a, where the stress variation follows a smooth Gaussian distribution in the last peak.

However, in the remaining graphs, the prediction has good correlation with the reference despite

a lack of smoothness in most peak stress values. Moreover, based on the graphs in Fig. 4.10a, we

can conclude that the model is better at predicting stress in valleys compared to peaks.

We have also illustrated some of the unsuccessful predictions in Fig. 4.10b to identify the

limitations of our proposed model.

It can be seen that in all graphs with non-Gaussian stress

distributions, the model finds it difficult to capture the peak stress values accurately. However, in

the first two graphs from the left in Fig. 4.10b, the predictions perfectly fit later peaks of the reference


Figure 4.10 Comparison of stress values across 100 frames for predictions, references, and errors
in a randomly selected element. (a) Successful predictions (b) Unsuccessful predictions (Units =
MPa-T).

since the stress values in the reference have Gaussian distributions at these points. Figs. 4.11a and

4.11b depict the MRPE of randomly selected samples across 100 frames and frames corresponding

to the minimum and maximum MRPE. As can be seen for both samples, the minimum errors are

around zero, with only a few frames having an error of more than 2%.

4.6.3 Ablation Study

The efficiency of our architecture can be attributed to several design choices we have made. Our

architecture models the temporal dependency between time frames and the relationship between


Figure 4.11 Relative errors across 100 frames in the randomly selected sample. Graphs in the
center represent the MRPE per frame. (a) and (c) in each figure represent the reference; (b) and (d)
refer to their corresponding predictions. Arrows refer to the MRPE of the presented frame (Units
= MPa-T).

different elements in an input. Even though self-attention has shown state-of-the-art performance

in sequence modeling, it is not suitable for tasks without large amounts of data. Hence, we use

LSTMs for sequence modeling. To support this claim, we compare our architecture against

three baseline architectures, as shown in Table 4.3. The

model with multi-head self-attention is very similar to our architecture, except that the LSTM modules in

our model are replaced with self-attention modules. The details of the other models are presented


(a)(b)(c)(d)(b)(c)(d)(a)in Table 4.3. We will refer to our architecture as Neuro-DynaStress. The results are shown in

Table 4.3, and the best results are highlighted in bold.

Table 4.3 Architecture comparison

Architecture for modeling temporal information    FaPN    Skip connection    MRPE(%)
Multi-headed self-attention                       ✓       ✓                  4.5
LSTM                                              ✓       ✓                  2.3
LSTM                                              ✓       ×                  6.6
LSTM                                              ×       ×                  9.7


CHAPTER 5

PHYSICS INFORMED NEURAL NETWORK FOR DYNAMIC STRESS

PREDICTION

5.1 Physics Informed Neural Network

Data collection and generation are common themes across various scientific fields, but data

assimilation is far more challenging. While many ML models have shown some initial success

and promise, the majority are unable to extract interpretable information and knowledge from this

huge amount of data. Moreover, purely data-driven models may fit observations very well, but

predictions may be physically inconsistent or implausible, owing to extrapolation or observational

biases that may lead to inappropriate generalization.

Therefore, to integrate fundamental physical laws and domain knowledge, it is imperative that

machine learning models learn about the rules governing physical behavior. Physical laws can

provide a strong theoretical constraint and inductive bias, in addition to observational ones, that

can serve as ’informative priors’. Consequently, it is necessary to implement physics-informed

learning, where prior knowledge derived from observation, empirical, physical, or mathematical

understanding of the world can enhance a learning algorithm’s performance [57]. Presas et al. [62]

proposed a neural network to estimate the magnitude of static and dynamic stresses based on

the measurements of stationary sensors in turbines. Raissi et al. [63, 64] used Gaussian process

regression to construct representations of linear operator functionals. Their model can accurately

infer the solution and provide uncertainty estimates for different physical problems; this was then

extended in [65, 66]. Raissi et al. [67] proposed a physics-informed neural network that can solve

supervised learning tasks while respecting any given law of physics described by general nonlinear

partial differential equations. For solving nonlinear PDEs, such as Schrödinger, Burgers, and


Allen–Cahn equations, Raissi et al. [68] introduced and illustrated the PINN approach. Vahab et

al. [69] developed Physics-Informed Neural Networks based on Airy stress functions and Fourier

series to find optimal solutions to a few reference biharmonic problems of elasticity and elastic plate

theory. Yan et al. [70] proposed an approach to solving linear elasticity problems in composite plates

and tubes using Physics-Informed Neural Networks. Chen et al. [71] proposed a PINN for fatigue life

prediction with a sparse amount of experimental data combined with physical models describing

the fatigue behavior of materials. Bai et al. [72] proposed an advanced PINN method based on a

modified loss function for computational 2D and 3D solid mechanics. Jeong et al. [73] introduced

a Physics-Informed Neural Network-based Topology Optimization (PINNTO) framework, which is

a combination of Topology Optimization and PINNs. PINNTO uses an energy-

based PINN to replace FEA in the conventional structural topology optimization and numerically

determine the deformation states. Zhang et al. [74] presented a PINN method for identifying

unknown geometric and material parameters. They parameterize the geometry of the material

using a mesh-free method and a differentiable and trainable technique that can identify multiple

structural features. Fallah et al. [75] proposed a PINN model for bending and free vibration analysis

of three-dimensional functionally graded porous beams. Bazmara et al. [76] built a PINN framework

using the Euler-Bernoulli beam theory and Hamilton principle to predict the nonlinear bending

of the beam system. Zhao et al. [77] presented a PINN model for temperature field prediction of

heat source layouts. Xu et al. [78] introduced a PINN model for predicting external loads of

diverse engineering structures based on limited displacement monitoring points. Zheng et al. [79]

reconstructed the solution of the displacement field after damage to predict crack propagation using

PINN. Yao et al. [80] proposed a physics-guided learning algorithm for predicting the mechanical

response of materials and structures. Das et al. [81] proposed a data-driven physics-informed

method for prognosis and applied it to predict cracking in a mortar cube specimen. Wang et

al. [82] proposed a hybrid DL model that unifies representation learning and turbulence simulation

techniques using physics-informed learning. Goswami et al. [83] proposed a physics-informed

variational formulation of DeepONet for brittle fracture analysis. Haghighat et

al. [84] applied physics-informed neural networks to inversion and surrogate modeling in solid

mechanics. Jin et al. [85] investigated the ability of PINNs to directly simulate incompressible

flows, ranging from laminar flows to turbulent channel flows. Li et al. [86] used the

Fourier transform to develop a Fourier neural operator to model turbulent flows.

The recently introduced models [87, 88] were designed to predict static stress distributions using

deep neural network (DNN)-based methods in both intact and damaged structural components.

The primary limitations of the above data-driven models are the incapability to produce physically

consistent results and the lack of generalizability to out-of-distribution scenarios. The concept

of physics-informed learning was introduced recently [64, 65, 89] to address the computational

cost of FEA and lack of generalizability to out-of-distribution scenarios. There is special interest

in Physics-informed Neural Networks (PINNs), which incorporate partial differential equations

(PDEs) into the training loss function directly. However, their applications have primarily been

limited to non-engineering toy simulations. Working with engineering problems such as those in

structural engineering will require these models to learn several factors of variation in addition to

the physical equations themselves, such as geometry. To overcome these issues, we propose a novel

model for dynamic stress prediction that runs in real time, generalizes, and can therefore be used

for stress prediction in seismic and blast-resistant design.

We augment PINN with a novel neural architecture for predicting dynamic stress distribution to

achieve fast dynamic analysis and address deficiencies of data-driven models. We model the stress

distribution in gusset plates under dynamic loading to demonstrate its utility. Gusset plates are one

of the most critical components in structural systems such as bridges and buildings. Since gusset

plates are designed for lateral loads such as earthquakes, wind, and explosions, real-time dynamic

models such as ours can help avoid catastrophic failures. In practice, the outputted stress maps

from our models can be used by downstream applications for detecting anomalies such as cracks

in the plates. In other words, it can act as a precursor to existing vision-based systems.

Figure 5.1 Overview: Unlike FEM, PINN-Stress is computationally efficient and facilitates real-time
analysis. PINN-Stress uses the governing equations of motion as a soft constraint
in the loss function that is minimized during training. The points with different colors in observations
correspond to the same nodes in the gusset plate. The gusset plate image is taken from [90].

An overview of our approach is shown in Fig. 5.1. To summarize our contributions, we introduce

NeuroStress and PINN-Stress, two novel deep learning models to learn dynamic stress distribution

for complex geometries, boundary conditions, and various load sequences. NeuroStress is trained

with the traditional MAE loss, whereas PINN-Stress uses the physics-informed loss function

described in Section 5.3.2. For real-life use, our models require input from sensors placed on the

plates. But since it is difficult to obtain such data for research purposes, we generate challenging

synthetic data emulating dynamic stress prediction. Through extensive experiments on simulated

data, we show that:

1. NeuroStress and PINN-Stress can predict dynamic stress distribution with complex geome-

tries, boundary conditions, and various load sequences faster than traditional FEA solvers.

Previous works only predict static stress distribution;

2. NeuroStress and PINN-Stress can learn the temporal information in the data to make accurate

predictions;

3. Spatiotemporal multiplexing, which we introduce to physics-informed learning, is useful

for dynamic stress prediction;

4. NeuroStress and PINN-Stress can predict von Mises stress distribution using the von Mises

equation. von Mises stress distribution is a primary diagnostic tool to predict the failure of a

structure;

5. To the best of our knowledge, PINN-Stress is the first model that learns the governing equations

of motion in structures. We attribute the generalization abilities of our architecture

on unseen load sequences and geometries to its loss function.

5.2 Background

To ensure that any component of an object is in equilibrium, the balance of forces and moments

acting on that component should be enforced. Stress components acting on the face of the element

can be written as equations of equilibrium. The stress equilibrium equation can be written as a

variation in each stress term within the body since stress changes from point to point. Considering

a two-dimensional case in which stress acts in the horizontal and vertical directions gives the

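following set of equations of motion:

\[
\frac{\partial \sigma_{xx}}{\partial x} + \frac{\partial \sigma_{xy}}{\partial y} + b_x - \rho a_x = 0 \tag{5.1}
\]

\[
\frac{\partial \sigma_{yy}}{\partial y} + \frac{\partial \sigma_{xy}}{\partial x} + b_y - \rho a_y = 0 \tag{5.2}
\]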

where 𝜎𝑥𝑥, 𝜎𝑦𝑦 and 𝜎𝑥𝑦 denote normal stress in horizontal and vertical directions, and shear

stress respectively. 𝑏𝑥 and 𝑏𝑦 represent body force in horizontal and vertical directions. 𝑎𝑥 and 𝑎𝑦

represent an acceleration in the horizontal and vertical directions and 𝜌 denotes the density of the

material.

5.2.1 von Mises equation

von Mises stress is a way of measuring whether a structure has begun to yield at any point.

Because it reduces the stress state to a single scalar, von Mises stress allows calculated stresses

to be compared directly with experimentally observed yield points. We also predict von Mises stress

since the engineering community relies heavily on it. von Mises stress can be calculated from the

predicted 𝜎𝑥𝑥, 𝜎𝑦𝑦, and 𝜎𝑥𝑦 through the von Mises stress equation:

\[
\sigma_{vm} = \sqrt{\sigma_{xx}^2 + \sigma_{yy}^2 - \sigma_{xx}\sigma_{yy} + 3\sigma_{xy}^2} \tag{5.3}
\]

5.3 Method

We introduce a novel architecture in this chapter and augment it with a physics-based loss function

to improve generalization.

5.3.1 Architecture

First, we use a two-layer MLP to encode the input into a higher-dimensional space. Then

we introduce our spatiotemporal multiplexing (STM) module to encode the spatial and temporal

information alternately. We treat both the temporal and the spatial dimensions as sequences,

which may be modeled using an appropriate deep neural architecture such as an RNN, an LSTM [34], or

self-attention [56]. LSTMs have demonstrated better performance than RNNs but generally perform

worse than self-attention. However, self-attention requires large amounts of data, which are not

available in our problem setting. Hence, as a middle ground, we use LSTMs to model both

temporal and spatial information.

Spatiotemporal multiplexing (STM): A single instance of our STM module consists of two

LSTM layers - one for temporal sequence modeling and another for spatial sequence modeling. The

input feature to an STM module is of shape 𝐵 × 𝑁 ×𝑇 × 𝑑 where 𝐵, 𝑁, 𝑇, 𝑑 are batch size, number of

spatial nodes, number of time frames, and feature dimension, respectively. We reshape this tensor

into 𝐵𝑁 × 𝑇 × 𝑑 and feed it as input to the first LSTM. Here, 𝑇 forms the index for sequence. The

output tensor from this LSTM is reshaped to 𝐵𝑇 × 𝑁 × 𝑑 before feeding it into the second LSTM for

spatial sequence modeling. We would like to point out that the idea of multiplexing is not novel in

deep learning literature [59, 91]. Our contribution is that we are the first to introduce multiplexing

in physics-informed learning and show its utility in dynamic prediction. Our whole architecture

consists of three STM modules, totaling six LSTM layers. The architecture is schematically shown

in Fig. 5.2.
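The reshaping that turns one tensor into alternating temporal and spatial sequences can be sketched as follows. This is a minimal NumPy illustration, not the implementation used in this work: the helper `run_sequence_model` is a hypothetical stand-in (a plain tanh recurrence) for the LSTM layers, and the tensor sizes and weights are arbitrary.

```python
import numpy as np

def run_sequence_model(seq_batch, weight):
    """Stand-in for an LSTM layer: h_k = tanh(x_k @ W + h_{k-1}).

    seq_batch has shape (batch, length, d); the output has the same shape.
    """
    batch, length, d = seq_batch.shape
    h = np.zeros((batch, d))
    outputs = np.empty_like(seq_batch)
    for k in range(length):
        h = np.tanh(seq_batch[:, k, :] @ weight + h)
        outputs[:, k, :] = h
    return outputs

def stm_block(x, w_time, w_space):
    """One spatiotemporal multiplexing step on a (B, N, T, d) tensor."""
    B, N, T, d = x.shape
    # Temporal pass: every node becomes its own sequence of T frames.
    h = run_sequence_model(x.reshape(B * N, T, d), w_time)
    # Swap the node and time axes so that each frame is a sequence of N nodes.
    h = h.reshape(B, N, T, d).transpose(0, 2, 1, 3).reshape(B * T, N, d)
    # Spatial pass over the nodes of each frame.
    h = run_sequence_model(h, w_space)
    # Restore the (B, N, T, d) layout so that blocks can be stacked.
    return h.reshape(B, T, N, d).transpose(0, 2, 1, 3)

rng = np.random.default_rng(0)
B, N, T, d = 2, 5, 4, 3
x = rng.normal(size=(B, N, T, d))
w_time = rng.normal(size=(d, d)) * 0.1
w_space = rng.normal(size=(d, d)) * 0.1
y = stm_block(x, w_time, w_space)
print(y.shape)
```

Note that a plain reshape from 𝐵𝑁 × 𝑇 × 𝑑 to 𝐵𝑇 × 𝑁 × 𝑑 would scramble nodes and frames; the axis swap before the second reshape is what keeps each spatial sequence within a single time frame.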

Figure 5.2 Model architecture: We introduce the novel spatiotemporal multiplexing (STM) to
physics-informed learning in order to learn both spatial and temporal information in the data. Our
architecture is lightweight and hence gives a real-time performance.

5.3.2 Physics Loss Function

In order to force our model to learn the physical constraints, we minimize the violation of the

physical equations shown in Eq. 5.1 and 5.2. We also minimize the boundary condition violation

to fully enforce the underlying PDE. Specifically, our loss function is a weighted sum of three loss

terms:

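\[
\mathcal{L} = w_{\mathrm{data}} \mathcal{L}_{\mathrm{data}} + w_{\mathrm{PDE}} \mathcal{L}_{\mathrm{PDE}} + w_{\mathrm{bc}} \mathcal{L}_{\mathrm{bc}} \tag{5.4}
\]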

where Ldata measures the mean absolute error (MAE) between true and predicted labels. LPDE

measures the violations of the physical equations by calculating the mean absolute error between

the LHS and the RHS. Lbc corresponds to boundary condition constraints. 𝑤data, 𝑤PDE and 𝑤bc are

the weights used to balance the interplay between the three loss terms. Lbc consists of the initial

and boundary conditions at each time step as below:

\[
\sigma(x, y, t = 0) = 0 \tag{5.5}
\]

\[
\sigma(x, y, (t_0 \ldots t_n)) = \sigma \tag{5.6}
\]

Equations 5.5 and 5.6 should be satisfied for 𝜎𝑥𝑥, 𝜎𝑦𝑦 and 𝜎𝑥𝑦. 𝑥 and 𝑦 are the coordinates of

the mesh nodes in each sample, and 𝑡 is the time at each time step.
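To make the structure of Eq. 5.4 concrete, the sketch below evaluates the residuals of Eq. 5.1 and 5.2 on a regular grid with finite differences and combines the three terms with placeholder weights. It is an illustration under stated assumptions rather than the training code: the function names, the unit weights, the averaging of the two residuals, and the test field are all hypothetical, and a training implementation would use a differentiable framework instead of NumPy.

```python
import numpy as np

def equilibrium_residuals(sxx, syy, sxy, bx, by, rho, ax, ay, dx, dy):
    """Left-hand sides of Eq. 5.1 and 5.2 on a regular grid indexed as [y, x]."""
    rx = np.gradient(sxx, dx, axis=1) + np.gradient(sxy, dy, axis=0) + bx - rho * ax
    ry = np.gradient(syy, dy, axis=0) + np.gradient(sxy, dx, axis=1) + by - rho * ay
    return rx, ry

def mean_abs(a):
    return float(np.mean(np.abs(a)))

def total_loss(pred, target, residuals, bc_pred, bc_target,
               w_data=1.0, w_pde=1.0, w_bc=1.0):
    """Weighted sum of Eq. 5.4; the default weights are placeholders, not tuned values."""
    l_data = mean_abs(pred - target)
    # The right-hand side of Eq. 5.1 and 5.2 is zero, so the PDE term is the mean |residual|.
    l_pde = sum(mean_abs(r) for r in residuals) / len(residuals)
    l_bc = mean_abs(bc_pred - bc_target)
    return w_data * l_data + w_pde * l_pde + w_bc * l_bc

# Hypothetical field that satisfies equilibrium: sxx = x, syy = y, sxy = -y, by = -1.
xs = np.linspace(0.0, 1.0, 11)
ys = np.linspace(0.0, 2.0, 21)
X, Y = np.meshgrid(xs, ys)
sxx, syy, sxy = X, Y, -Y
dx, dy = xs[1] - xs[0], ys[1] - ys[0]

rx, ry = equilibrium_residuals(sxx, syy, sxy, bx=0.0, by=-1.0,
                               rho=7850.0, ax=0.0, ay=0.0, dx=dx, dy=dy)
pred = np.stack([sxx, syy, sxy])
loss = total_loss(pred, pred, (rx, ry), pred[..., 0], pred[..., 0])
print(loss)
```

With a perfect prediction and a field in equilibrium, all three terms vanish; dropping the body force `by` in this example leaves a unit residual in Eq. 5.2, which the PDE term then penalizes.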

5.3.3 Differentiable grid from mesh

Our physics-based loss function requires us to estimate the gradients of stress output along 𝑥

and 𝑦 directions. But since our output is in the form of a triangular mesh, gradient computation

is not easy. Instead, we propose to calculate gradients on a surrogate grid created using kernel

density estimation (KDE). Specifically, we calculate the stress value at a grid vertex by adding

contributions from every mesh node, weighted by a Gaussian filter centered at this vertex and

having a specific variance. By tuning the variance of this filter, we can achieve a robust, accurate

reconstruction of the mesh along with a mask showing extrapolated regions. The original mesh,

the grid reconstructed from it, and the corresponding mask are shown in Fig. 5.3. To assess

the accuracy of the surrogate grid, we compare it against the reconstruction obtained through

the tricontourf function of the Matplotlib package in Python. As can be observed in Fig. 5.3c, the grid

is accurate within the mesh region. Now, we can estimate the gradients for the stress outputs from

these grids.
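The grid construction described above can be sketched with a Gaussian-weighted average of the mesh-node values. The normalization by the total weight and the threshold `min_weight` used to build the mask are assumptions made for this illustration, as are the function name and the example lattice of nodes; the text does not specify those details.

```python
import numpy as np

def mesh_to_grid(nodes_xy, values, grid_x, grid_y, sigma, min_weight=1e-3):
    """Gaussian-weighted reconstruction of nodal values on a regular grid.

    nodes_xy: (N, 2) node coordinates; values: (N,) nodal stress values.
    Returns the grid of shape (len(grid_y), len(grid_x)) and a boolean mask
    that is False where no node contributes meaningfully (extrapolated region).
    """
    gx, gy = np.meshgrid(grid_x, grid_y)
    # Squared distance from every grid vertex to every mesh node: (H, W, N).
    d2 = (gx[..., None] - nodes_xy[:, 0]) ** 2 + (gy[..., None] - nodes_xy[:, 1]) ** 2
    w = np.exp(-0.5 * d2 / sigma ** 2)
    total = w.sum(axis=-1)
    grid = (w * values).sum(axis=-1) / np.maximum(total, 1e-12)
    mask = total >= min_weight
    return grid, mask

# Hypothetical example: a 7 x 7 lattice of nodes carrying a constant stress of 3.0.
lattice = np.linspace(-0.3, 0.3, 7)
nx, ny = np.meshgrid(lattice, lattice)
nodes_xy = np.column_stack([nx.ravel(), ny.ravel()])
values = np.full(nodes_xy.shape[0], 3.0)

grid_axis = np.linspace(-0.6, 0.6, 25)
grid, mask = mesh_to_grid(nodes_xy, values, grid_axis, grid_axis, sigma=0.05)

# Gradients along y (axis 0) and x (axis 1) of the reconstructed field.
spacing = grid_axis[1] - grid_axis[0]
dgrid_dy, dgrid_dx = np.gradient(grid, spacing, spacing)
```

A smaller `sigma` follows the nodal values more closely but leaves gaps between sparse nodes, whereas a larger one smooths the field, which is the trade-off behind tuning the filter variance.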

(a)

(b)

(c)

(d)

Figure 5.3 Constructing grid values from mesh values (a) mesh nodes of a single output, the
color of each node represents the stress value at the corresponding node, (b) reconstruction from
Matplotlib tricontourf function, (c) our reconstruction on a 200 × 200 grid, (d) corresponding
mask showing interpolated regions.

5.4 Experiments and Results

5.4.1 Data Generation

Gusset plates connect beams and columns to braces in steel structures. The behavior and

analysis of these components are critical since various reports have observed failures of gusset

plates subject to lateral loads [41, 42, 43]. The boundary conditions and time-history load cases are


considered to simulate similar conditions in common gusset plate structures under external loading.

We create a dataset with 71,680 unique samples by combining 14 random time-history load

cases, 1024 different geometries, and 5 most commonly found boundary conditions in gusset plates.

Boundary conditions are shown in Fig. 5.4, mimicking the real gusset plates’ boundary conditions.

All translational and rotational displacements were fixed at the constrained edges. The range

for width and height of the plates is from 30 cm to 60 cm. Two-dimensional steel plate structures

with five edges, E1 to E5 denoting edges 1 to 5, as shown in Fig. 5.5, are considered to be made of

homogeneous and isotropic linear elastic materials. Various geometries are generated by changing

the position of each node in horizontal and vertical directions, as shown in Fig. 5.5, which leads to

1024 unique pentagons. The material properties remain unchanged and isotropic for all samples.

Figure 5.4 Different types of boundary conditions for initializing population.

Figure 5.5 Basic schematic topology for initializing the steel plate geometries.

Time histories consist of 100 time-steps generated with random sine and cosine frequencies.

The frequencies range between 1 and 3 Hz, with amplitudes ranging from 2 to 10 kN at intervals of

2 kN. All time histories in horizontal and vertical directions are shown in Fig. 5.6. Each time series

Table 5.1 Dataset splits

Split   Boundary condition   Load position   Load number   Geometry number
train   E2                   E4E5            1-8           1-614
train   E2E3                 E5              1-8           1-614
train   E1E2                 E4              1-8           1-614
val     E3                   E2E4            9-12          615-819
test    E1E5                 E2              12-14         820-1024

(a)

(b)

Figure 5.6 Various load sequences in (a) horizontal and (b) vertical directions.

lasts for 1 second, with each time step lasting 0.01 seconds. All the details of the input variables

used to initialize train-validation-test distribution of the population are shown in Table 5.1.
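As an illustration of the time histories described above, the sketch below draws one load sequence with 100 steps of 0.01 s, a frequency between 1 and 3 Hz, and an amplitude from 2 to 10 kN in 2 kN increments. How the sine and cosine components were actually combined is not specified in the text, so the single-harmonic form, the random choice between sine and cosine, and the function name are assumptions.

```python
import numpy as np

def make_load_history(rng, n_steps=100, dt=0.01):
    """Return time stamps (s) and one hypothetical harmonic load history (N)."""
    t = np.arange(n_steps) * dt
    freq = rng.uniform(1.0, 3.0)                         # Hz
    amp = rng.choice([2e3, 4e3, 6e3, 8e3, 10e3])         # N, i.e. 2 to 10 kN
    phase = rng.choice([0.0, np.pi / 2])                 # 0 gives a sine, pi/2 a cosine
    return t, amp * np.sin(2.0 * np.pi * freq * t + phase)

rng = np.random.default_rng(42)
t, load = make_load_history(rng)
print(t.shape, float(np.max(np.abs(load))))
```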

5.4.1.1 Input Data

Input parameters include geometry, boundary condition, and body force in horizontal and

vertical directions, each encoded as vectors in a three-dimensional matrix. The size of the input matrix

is 𝑁 × 𝑀 × 𝑇, where 𝑁, 𝑀, and 𝑇 represent mesh nodes, input parameters, and time, respectively.

For example, if a sample contains 200 mesh nodes, the size of the input matrix is 200 × 5 × 100.

Fig. 5.7 shows how we construct the input matrix based on the geometry, boundary conditions,

and body forces. This figure presents a sample with five mesh nodes. However, all real samples

in the trained model have more than 100 mesh nodes. The first and the second columns of the

input matrix are the 𝑥 and 𝑦 coordinates of the mesh nodes, respectively. The third column represents

the boundary constraint at each node using a Boolean value. If there is a boundary

constraint at the corresponding node, the value is one; otherwise, it is zero. The fourth and the

fifth columns represent body force sequences at each node along 𝑥 and 𝑦 directions. Details of

boundary conditions and their load positions are described in Table 5.1.
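The column layout described above can be sketched as follows. This is a minimal illustration under the assumption that the static columns (coordinates and boundary flag) are simply repeated along the time axis; the function name and the random example values are hypothetical.

```python
import numpy as np

def build_input(nodes_xy, bc_flags, bfx, bfy):
    """Assemble the N x 5 x T input: x, y, boundary flag, body force x, body force y.

    nodes_xy: (N, 2); bc_flags: (N,) of 0/1; bfx, bfy: (N, T) body-force histories.
    """
    n_nodes, n_steps = bfx.shape
    x = np.repeat(nodes_xy[:, 0:1], n_steps, axis=1)
    y = np.repeat(nodes_xy[:, 1:2], n_steps, axis=1)
    bc = np.repeat(bc_flags[:, None].astype(float), n_steps, axis=1)
    return np.stack([x, y, bc, bfx, bfy], axis=1)

rng = np.random.default_rng(0)
n_nodes, n_steps = 200, 100
nodes_xy = rng.uniform(-0.3, 0.3, size=(n_nodes, 2))
bc_flags = (rng.random(n_nodes) < 0.1).astype(int)
bfx = np.zeros((n_nodes, n_steps))
bfy = np.zeros((n_nodes, n_steps))
inputs = build_input(nodes_xy, bc_flags, bfx, bfy)
print(inputs.shape)
```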

Figure 5.7 Construction of input matrix (Unit: m, N).

5.4.1.2 Output Data

To obtain the stress distributions for each sample, we perform FEA using the Partial Differential

Equation (PDE) Toolbox in MATLAB. Specifically, we use the transient-planestress

analysis type of the MATLAB PDE solver to generate dynamic stress contours, which will act as the

ground truth for our model. We define geometry, boundary condition, material properties, and

time histories as input, and the PDE solver returns the sequence of stress distributions of 𝜎𝑥𝑥, 𝜎𝑦𝑦

and 𝜎𝑥𝑦 corresponding to the inputs. The size of each output is mesh nodes × load sequence. For

example, if a sample contains 200 mesh nodes, the size of the output matrix is 200 × 100. Each

of the three outputs is normalized separately between -1 and 1 to ensure faster convergence. The

input and the output representations of the model are shown in Fig. 5.8.
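One common way to map each output to the range from -1 to 1 is min-max scaling, sketched below. Whether the scaling statistics were computed per sample or over the whole dataset is not stated, so the per-array version here is an assumption, as are the function names.

```python
import numpy as np

def scale_to_unit_range(values):
    """Linearly map values to [-1, 1]; also return the bounds needed to invert the map."""
    lo, hi = float(values.min()), float(values.max())
    return 2.0 * (values - lo) / (hi - lo) - 1.0, (lo, hi)

def unscale(scaled, lo, hi):
    """Inverse of scale_to_unit_range."""
    return (scaled + 1.0) / 2.0 * (hi - lo) + lo

stress = np.array([[-2.0, 0.0], [6.0, 2.0]])
scaled, (lo, hi) = scale_to_unit_range(stress)
restored = unscale(scaled, lo, hi)
```

Keeping the bounds alongside the scaled data is what allows predictions to be mapped back to physical units before computing error metrics.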

Figure 5.8 Input and output representation for normal and shear stress distribution prediction: (a)
Input matrix, (b) Output (𝜎𝑥𝑥), (c) Output (𝜎𝑦𝑦), (d) Output (𝜎𝑥𝑦).

5.4.2 Metrics

We use Mean Absolute Error (MAE), defined in Eq. 5.7 as the primary training loss and

metric. To ensure that we do not overfit to a single metric, we also use Mean Relative Percentage

Error (MRPE) to evaluate the overall quality of predicted stress distribution.

\[
\mathrm{MAE} = \frac{1}{NT} \sum_{n,t}^{N,T} \left| S(n,t) - \hat{S}(n,t) \right| \tag{5.7}
\]

\[
\mathrm{MRPE} = \frac{\mathrm{MAE}}{\max\left( |S(n,t)|, |\hat{S}(n,t)| \right)} \times 100 \tag{5.8}
\]

where 𝑆(𝑛, 𝑡) is the true stress value at node 𝑛 at time step 𝑡, as computed by FEA, ˆ𝑆(𝑛, 𝑡)

is the corresponding stress value predicted by our model, 𝑁 is the total number of mesh nodes in

each frame of a sample, and 𝑇 is the total number of time steps in each sample. As mentioned earlier,

we set 𝑇 = 100 in our experiments.
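A small numerical sketch of Eq. 5.7 and 5.8 follows. It reads the maximum in Eq. 5.8 as the largest absolute value over all nodes and time steps of the reference and the prediction, which is an interpretation rather than something the text states explicitly; the arrays are toy values.

```python
import numpy as np

def mae(true, pred):
    """Mean absolute error over all nodes and time steps (Eq. 5.7)."""
    return float(np.mean(np.abs(true - pred)))

def mrpe(true, pred):
    """MAE relative to the largest absolute stress value, in percent (Eq. 5.8)."""
    denom = max(float(np.max(np.abs(true))), float(np.max(np.abs(pred))))
    return mae(true, pred) / denom * 100.0

# Toy example with N = 2 nodes and T = 2 time steps.
S = np.array([[1.0, 2.0], [3.0, 4.0]])
S_hat = np.array([[1.0, 2.0], [3.0, 5.0]])
print(mae(S, S_hat))   # 0.25
print(mrpe(S, S_hat))  # 5.0
```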

5.4.3 Implementation

We implemented our model using PyTorch [61] and PyTorch Lightning. AdamW optimizer [59]

was used with an initial learning rate of 10⁻³. We found that a batch size of 10 gives the best

results. The computational performance of the model was evaluated on an AMD EPYC 7313

16-core processor and one NVIDIA A6000 48GB GPU per experiment. The time required during

the training phase for a single batch with 100 frames and a batch size of 10 was 7 milliseconds for

NeuroStress and 20 milliseconds for PINN-Stress. The inference time of NeuroStress and

PINN-Stress for one sample was 1 millisecond, which satisfies the real-time requirement. The most

powerful FE solvers take between 10 minutes and an hour to solve the same problem. We use the MATLAB PDE

solver as the FE solver to compare the efficiency of our model. We consider the minimum time for

all processes of modeling geometry, meshing, and analysis of one sample in the FE solver to be

about 10 minutes. The MATLAB PDE solver does not use GPU acceleration. Therefore, NeuroStress

and PINN-Stress are about 6 × 10⁵ times faster than the MATLAB PDE solver.
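The speed-up factor follows directly from the two timings quoted above, taking the 10-minute lower bound for the FE solver and the 1-millisecond inference time:

```latex
\[
\frac{t_{\mathrm{FE}}}{t_{\mathrm{model}}}
\approx \frac{10\ \mathrm{min} \times 60\ \mathrm{s/min}}{10^{-3}\ \mathrm{s}}
= \frac{600\ \mathrm{s}}{10^{-3}\ \mathrm{s}}
= 6 \times 10^{5}
\]
```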

5.4.4 Results

We implement two main models, NeuroStress and PINN-Stress. Both models are trained on

the same train dataset for 300 epochs, evaluated on the validation dataset for fine-tuning, and

we report all metrics on the test dataset. The entire dataset contains 71,680 samples: the

train dataset contains 43,008 samples, and the validation and test datasets each contain 14,336 samples, forming a

60%-20%-20% split of the whole dataset. Error metrics are calculated using the checkpoint with

the least validation error. Fig. 5.9 shows stress distribution prediction for 𝜎𝑥𝑥, 𝜎𝑦𝑦, 𝜎𝑥𝑦 and 𝜎𝑣𝑚

of a randomly selected frame in a sample. PINN-Stress predictions are almost identical to their

corresponding references, and the errors in a PINN-Stress prediction are substantially lower than

those in a NeuroStress prediction. Particularly, PINN-Stress can capture peak stress better than

NeuroStress, which is of primary importance in structural design. Maximum

stress matters in the design phase because it should remain below the yield strength to avoid

permanent deformation.

Table 5.2 Data split for generalization experiments

                       Data split*                            MRPE (%)
Quantity    Train            Val       Test       NeuroStress   PINN-Stress
Geometry    1-614            615-819   820-1024   1.7           1.5
Load        1-8              9-11      12-14      4.8           4.2
BC          E2, E2E3, E1E2   E3        E1E5       18.3          16

* The values in the data split column refer to indices of the corresponding generalization quantity.


Figure 5.9 Comparison of NeuroStress and PINN-Stress predictions for 𝜎𝑥𝑥, 𝜎𝑦𝑦, 𝜎𝑥𝑦 and 𝜎𝑣𝑚
(Unit: MPa).

5.5 Ablation Studies

5.5.1 Generalization

We investigate and compare the generalization capabilities of NeuroStress and PINN-Stress

models for varying distributions of boundary conditions, load sequences, and geometries. To that

end, we split the entire dataset into train, validation, and test sets such that the

validation and test sets contain only unseen instances of the quantity being tested for generalization. For

example, for checking generalization on geometry, the training set will consist of 614 geometries

out of 1024, and validation and test sets will contain the remaining (205 each). We compare the

mean relative percent error (MRPE) of each method on von Mises stress prediction. As von Mises

stress identifies if a given material is likely to yield or fracture, we use its prediction error as the

sole criterion. Figs. 5.10 and 5.11 demonstrate the generalization capability of PINN-Stress and

NeuroStress to unseen load sequences and geometries, respectively. As can be seen, the 𝜎𝑥𝑥, 𝜎𝑦𝑦,

𝜎𝑥𝑦 and 𝜎𝑣𝑚 predictions by PINN-Stress are closer to the reference than those by NeuroStress.
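Since the comparison is scored on von Mises stress, the short sketch below shows how it follows from the three predicted in-plane components through Eq. 5.3. The function name and the sample values are illustrative only.

```python
import math

def von_mises_plane_stress(sxx, syy, sxy):
    """von Mises stress from in-plane normal and shear stresses (Eq. 5.3)."""
    return math.sqrt(sxx ** 2 + syy ** 2 - sxx * syy + 3.0 * sxy ** 2)

uniaxial = von_mises_plane_stress(100.0, 0.0, 0.0)     # equals the applied stress
pure_shear = von_mises_plane_stress(0.0, 0.0, 10.0)    # equals sqrt(3) times the shear stress
equal_biaxial = von_mises_plane_stress(50.0, 50.0, 0.0)
print(uniaxial, pure_shear, equal_biaxial)
```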

Figure 5.10 Predicting dynamic stress distribution for diverse load sequences: Augmenting
our novel architecture with a physics-based loss can induce generalization capabilities while still
remaining real-time (Unit: MPa). The overview of our method is given in Fig. 5.1.

Fig 5.12 shows the error of each frame for a random spatial node across all time frames for

unseen load sequences and structural geometries. As can be seen in both panels, the errors of

PINN-Stress are lower than those of NeuroStress, especially at extreme peaks, which demonstrates the ability

of PINN-Stress to predict the maximum stress values. We have also compared the generalization

capability of PINN-Stress and NeuroStress over unseen load sequences and geometries in a single


Figure 5.11 Predicting dynamic stress distribution for diverse geometries (Unit: MPa).

spatial node across all time frames in Figs 5.13 and 5.14. Figs 5.13 and 5.14 demonstrate the ability

of our models to capture the temporal dependencies over time frames. It can be seen that both

models’ predictions are almost identical to references in all the time frames. However, in extreme

peaks PINN-Stress outperforms NeuroStress. Table 5.2 shows the data split for each experiment

and the corresponding results. The lowest error in each experiment is highlighted in bold.

In every experiment, we can observe that PINN-Stress generalizes better than NeuroStress.

However, neither method generalizes satisfactorily across boundary conditions. Since we only

considered five different boundary conditions in total, we ran the same experiment for different

combinations of boundary conditions. The results were similar.

(a)

(b)

Figure 5.12 Comparison of NeuroStress and PINN-Stress errors for 𝜎𝑣𝑚 across 100 frames for a
random spatial node in a sample. (a) unseen load sequences and (b) unseen geometries.

Figure 5.13 Comparison of NeuroStress and PINN-Stress predictions for 𝜎𝑥𝑥, 𝜎𝑦𝑦, 𝜎𝑥𝑦 and 𝜎𝑣𝑚
across 100 frames for a sample with unseen load sequences.

5.5.2 Choice of architecture

The efficiency of our architecture can be attributed to several design choices we have made. Our

architecture models the temporal dependency between time frames and the relationship between

different nodes in an input via our spatiotemporal multiplexing mechanism. As mentioned earlier,

Figure 5.14 Comparison of NeuroStress and PINN-Stress predictions for 𝜎𝑥𝑥, 𝜎𝑦𝑦, 𝜎𝑥𝑦 and 𝜎𝑣𝑚
across 100 frames for a sample with unseen geometries.

we are the first to introduce such a design into PINNs to the best of our knowledge. We train

models on our dataset using both LSTM and self-attention for different dataset sizes to determine which of them

performs better. As illustrated in Fig. 5.15, LSTM consistently

demonstrated better performance than self-attention, with lower error rates. This suggests

that LSTM is more suitable for our dataset than self-attention. Even though self-attention

has shown state-of-the-art performance in sequence modeling, it is not suitable for tasks without

large amounts of data. Hence, we use LSTMs for sequence modeling. To support this claim,

we also compare our architecture against other baseline architectures.

Figure 5.15 Comparison of LSTM and self-attention for different training data sizes.

We compare against three architectures: Spatiotempo-Att, Tempo-LSTM, Spatio-MLP.

Table 5.3 Architecture comparison

Architecture   Spatiotempo-Att   Tempo-LSTM   Spatio-MLP   Spatiotempo-LSTM
#Params (K)    309               208          828          208
MRPE (%)       19.5              17.5         25.4         16.6

Spatiotempo-Att is very similar to our architecture, except that the LSTM modules in our model are

replaced with self-attention modules. Tempo-LSTM is also similar to our architecture, except that

the LSTMs act only along the temporal dimension. Spatio-MLP is a plain feedforward network

with six layers with LeakyReLU activation in between. It treats each time frame separately but

considers all the nodes simultaneously. We will refer to our architecture as Spatiotempo-LSTM.

To save time and resources, we train all the architectures on 10% of training data with MAE loss.

Similar to our experiments on generalization, we report the error on von Mises stress prediction.

The results are shown in Table 5.3, and the best results are highlighted in bold.


CHAPTER 6

SUMMARY AND CONCLUSION

This study presents a framework for stress distribution prediction in structural components utilizing

deep learning techniques. Our models can predict both static and dynamic stress distribution.

In the first project, we used end-to-end DL techniques. We developed a CNN to alleviate the

need for finite element methods for the prediction of high-resolution stress distributions in loaded

steel plates. The CNN was designed and trained to use the geometry, boundary conditions, and

load as input, providing high-resolution stress contours as the output. We used the PDE toolbox of

MATLAB to generate the output data for training, containing 104,448 FEM samples. We trained

and evaluated different models to find the model with the best performance. The best model can

predict the stress distributions with a mean absolute error of 0.9% and a maximum stress error of

0.46% in the von Mises stress distribution. The effects of dataset size on the model performance

were also studied. Training the network with just 10% of the dataset achieved a mean error of

1.85%, which can be considered acceptable in specific engineering applications. Moreover, we

evaluated the effect of dataset size on the Gaussian distribution of mean and maximum stress errors.

Increasing the data size decreased the standard deviation of both the mean error and the

maximum stress error; that is, a greater quantity of data induced less spread in PMAE and PPAE.

In the second project, we developed a convolutional neural network (CNN) augmented with a

custom loss function inspired by the physics of stress concentration to predict high-

resolution von Mises stress distribution in the specific domain of damaged steel plates. The proposed

network learns to predict the stress distribution given the damaged geometries, load, and boundary

conditions as input and high-resolution stress contours as the output. The dataset is composed

of 61,440 unique and complex cases of various geometries, boundary conditions, and loads. The

PDE toolbox of MATLAB was used to generate the output data for training as FEA samples. We

also built a CNN model using the torch.nn.MSELoss function as a baseline to measure how much our proposed custom

loss function helps. The CNN achieves high accuracy with both the custom loss and the MSE

loss, under multiple metrics, in the evaluations on the stress distribution datasets. The custom

loss model outperforms the MSE model in terms of peak stress value predictions, in addition to

accurately localizing damage and capturing stress concentration around crack tips, which is not

possible with other ML methods. The custom-loss model, which was trained with 49,152

FEA samples, can be used for future predictions of stress distributions of damaged steel plates, as evaluated on

the remaining 12,288 FEA samples. The custom-loss model has a mean absolute error of 0.22% and a

maximum stress error of 1.5% in the von Mises stress distribution.

In the third project, we propose the Neuro-DynaStress model, which combines a Convolutional Neural

Network (CNN) and Long Short-Term Memory (LSTM) to predict the entire sequence of dynamic

stress distribution. The model was designed and trained to use the geometry, boundary conditions,

and the sequence of loads as input and predicts the sequence of high-resolution dynamic stress

contours. The convolutional components are used to extract spatial features, and the LSTM captures

the temporal dependence between the frames. Feature alignment modules are used to improve the

training and performance of our model. The model is trained using synthetic data generated using

the PDE toolbox in MATLAB. Neuro-DynaStress can predict dynamic stress distribution with a

mean relative percentage error of 2.3%, which is considered an acceptable error rate in engineering

communities.

In the fourth project, we propose NeuroStress and PINN-Stress, two models for dynamic stress
prediction based on a novel architecture, with the latter augmented with a physics-informed loss
function. Our models explicitly learn both spatial and temporal information through our
spatiotemporal multiplexing (STM) module. Experiments on simulated gusset plates show that our
models are accurate and that adding the physics-informed loss function improves generalization
across varying load sequences and structural geometries. PINN-Stress is also better at estimating
high stress values, which are of greater importance to the structural engineering community.
However, collecting sufficient data from real gusset plates using sensors can be expensive, and
the measurements can be noisy. Therefore, our future efforts will be directed toward achieving
lower sample complexity under noisy conditions.

CHAPTER 7

FUTURE WORKS

Develop a deep learning model for 3-D geometries that can predict stress distributions in more
realistic and complex samples. Convolutional neural networks (CNNs) have been widely used for
image-based deep learning tasks, and they can also be applied to 3-D geometries. However, 3-D
CNNs can be computationally expensive; therefore, other architectures, such as graph neural
networks (GNNs), or matrix-based approaches like the one we used to embed our input and output
data in the last chapter, may be more suitable for 3-D geometries. Such a model could be
extremely useful in many fields, such as aerospace engineering, civil engineering, and
biomechanics, where accurate predictions of stress distribution are critical for ensuring the
safety and reliability of structures and systems.

Create a deep learning model to predict the vibration modes and natural frequencies of structural
components. Engineers usually perform a modal analysis before most dynamic analyses to calculate
the vibration modes and natural frequencies of the model, which gives them more intuition about
its global and local behavior. A deep learning model for predicting vibration modes and natural
frequencies can be more efficient than traditional methods: once trained, it can provide
predictions quickly, without the need for lengthy simulations or computations.
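For context, the conventional modal analysis that such a model would approximate solves the generalized eigenproblem K x = omega^2 M x. The sketch below does this in closed form for a two-degree-of-freedom spring-mass chain. The system layout, parameter names, and unit values are assumptions chosen only to illustrate the calculation; a real structure would require an FEA-assembled K and M and a numerical eigensolver.

```python
import math
from typing import List, Tuple


def two_dof_modes(m1: float, m2: float, k1: float, k2: float) -> List[Tuple[float, float]]:
    """Return [(omega, x2/x1), ...] sorted by frequency for a fixed-free chain.

    Layout (assumed): wall --k1-- m1 --k2-- m2, so
    M = diag(m1, m2) and K = [[k1 + k2, -k2], [-k2, k2]].
    """
    # det(K - lam * M) = 0 expands to a*lam^2 + b*lam + c = 0 with lam = omega^2.
    a = m1 * m2
    b = -(m1 * k2 + m2 * (k1 + k2))
    c = k1 * k2
    root = math.sqrt(b * b - 4.0 * a * c)
    eigenvalues = sorted(((-b - root) / (2.0 * a), (-b + root) / (2.0 * a)))
    # The first row of (K - lam * M) x = 0 gives the mode-shape ratio x2 / x1.
    return [(math.sqrt(lam), (k1 + k2 - lam * m1) / k2) for lam in eigenvalues]


# Unit masses and stiffnesses, chosen only for illustration.
modes = two_dof_modes(m1=1.0, m2=1.0, k1=1.0, k2=1.0)
for omega, ratio in modes:
    freq_hz = omega / (2.0 * math.pi)
    print(f"omega = {omega:.4f} rad/s, f = {freq_hz:.4f} Hz, x2/x1 = {ratio:+.4f}")
```

In the first mode the two masses move in phase (positive ratio), and in the second they move out of phase (negative ratio). A learned surrogate would have to reproduce both the frequencies and these shapes.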

Propose a model to predict temperature distribution, radial stress, and hoop stress for
real-world scenarios such as disc brakes and pipes. In disc brakes, heat is generated by friction
and then dissipated through the disc and the surrounding components. Since the lifetime of disc
brakes is important to automotive companies, predicting the temperature distribution can help
extend it. Such a model could be integrated with a digital twin to provide real-time monitoring
of disc brake temperature.
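For the pipe case, the classical Lamé solution for a pressurised thick-walled cylinder gives closed-form radial and hoop stresses, which could serve as reference data or as a physics term when training such a model. The sketch below evaluates that solution. It assumes a purely mechanical, isothermal, linear-elastic case with tension taken as positive; the thermal stresses in a disc brake would need additional terms, and the function name and pipe dimensions are hypothetical.

```python
from typing import Tuple


def lame_stresses(r: float, r_i: float, r_o: float,
                  p_i: float, p_o: float = 0.0) -> Tuple[float, float]:
    """Return (radial, hoop) stress at radius r; tension is positive."""
    if not (0.0 < r_i <= r <= r_o) or r_i == r_o:
        raise ValueError("require 0 < r_i <= r <= r_o and r_i < r_o")
    denom = r_o**2 - r_i**2
    # Lame constants from the boundary conditions
    # sigma_r(r_i) = -p_i and sigma_r(r_o) = -p_o.
    a = (p_i * r_i**2 - p_o * r_o**2) / denom
    b = (p_i - p_o) * r_i**2 * r_o**2 / denom
    return a - b / r**2, a + b / r**2


# Hypothetical pipe: 50 mm bore radius, 100 mm outer radius, 10 MPa internal pressure.
r_i, r_o, p_i = 0.05, 0.10, 10.0e6
for r in (r_i, 0.075, r_o):
    sigma_r, sigma_t = lame_stresses(r, r_i, r_o, p_i)
    print(f"r = {r:.3f} m: radial = {sigma_r / 1e6:+.2f} MPa, hoop = {sigma_t / 1e6:+.2f} MPa")
```

The hoop stress is largest at the bore, which is where a data-driven model would most need to be accurate.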

Predict crack propagation using sequential models such as LSTMs and transformers augmented with
PINNs. This would be better performed with mesh-free methods, which have been applied to a
variety of engineering problems involving complex geometries, large deformations, or
discontinuities. Because mesh-free methods do not require mesh updating to model crack
propagation, they could be an accurate and efficient approach for predicting it.

BIBLIOGRAPHY

[1] N. Umetani, “Exploring generative 3d shapes using autoencoder networks,” in SIGGRAPH

Asia 2017 Technical Briefs, pp. 1–4, 2017.

[2] Y. Yu, T. Hur, J. Jung, and I. G. Jang, “Deep learning for determining a near-optimal topological
design without any iteration,” Structural and Multidisciplinary Optimization, vol. 59, no. 3,
pp. 787–799, 2019.

[3] A. B. Farimani, J. Gomes, and V. S. Pande, “Deep learning the physics of transport phenom-

ena,” ArXiv Preprint ArXiv:1709.02432, 2017.

[4] B. Kim, V. C. Azevedo, N. Thuerey, T. Kim, M. Gross, and B. Solenthaler, “Deep fluids:
A generative network for parameterized fluid simulations,” in Computer Graphics Forum,
vol. 38, pp. 59–70, Wiley Online Library, 2019.

[5] G. B. Goh, N. O. Hodas, and A. Vishnu, “Deep learning for computational chemistry,” Journal

of Computational Chemistry, vol. 38, no. 16, pp. 1291–1307, 2017.

[6] A. Mardt, L. Pasquali, H. Wu, and F. Noé, “Vampnets for deep learning of molecular kinetics,”

Nature Communications, vol. 9, no. 1, pp. 1–11, 2018.

[7] A. Mohammadi Bayazidi, G.-G. Wang, H. Bolandi, A. H. Alavi, and A. H. Gandomi, “Multi-
gene genetic programming for estimation of elastic modulus of concrete,” Mathematical
Problems in Engineering, vol. 2014, 2014.

[8] M. Sarveghadi, A. H. Gandomi, H. Bolandi, and A. H. Alavi, “Development of prediction
models for shear strength of sfrcb using a machine learning approach,” Neural Computing
and Applications, vol. 31, no. 7, pp. 2085–2094, 2019.

[9] S. M. Mousavi, P. Aminian, A. H. Gandomi, A. H. Alavi, and H. Bolandi, “A new predictive
model for compressive strength of hpc using gene expression programming,” Advances in
Engineering Software, vol. 45, no. 1, pp. 105–114, 2012.

[10] H. Bolandi, W. Banzhaf, N. Lajnef, K. Barri, and A. H. Alavi, “An intelligent model for the
prediction of bond strength of frp bars in concrete: A soft computing approach,” Technologies,
vol. 7, no. 2, p. 42, 2019.

[11] M. J. Atalla and D. J. Inman, “On model updating using neural networks,” Mechanical Systems

and Signal Processing, vol. 12, no. 1, pp. 135–161, 1998.

[12] R. I. Levin and N. Lieven, “Dynamic finite element model updating using neural networks,”

Journal of Sound and Vibration, vol. 210, no. 5, pp. 593–607, 1998.

[13] Z. Fan, Y. Wu, J. Lu, and W. Li, “Automatic pavement crack detection based on structured
prediction with the convolutional neural network,” ArXiv Preprint ArXiv:1802.02208, 2018.

[14] C. V. Dung et al., “Autonomous concrete crack detection using deep fully convolutional neural

network,” Automation in Construction, vol. 99, pp. 52–58, 2019.

84

[15] A. Javadi, T. Tan, and M. Zhang, “Neural Network for Constitutive Modeling in Finite
Element Analysis,” Computer Assisted Mechanics and Engineering Sciences, vol. 10, no. 4,
pp. 523–530, 2003.

[16] A. Oishi and G. Yagawa, “Computational mechanics enhanced by deep learning,” Computer

Methods in Applied Mechanics and Engineering, vol. 327, pp. 327–351, 2017.

[17] A. Madani, A. Bakhaty, J. Kim, Y. Mubarak, and M. R. Mofrad, “Bridging finite element and
machine learning modeling: stress prediction of arterial walls in atherosclerosis,” Journal of
biomechanical engineering, vol. 141, no. 8, 2019.

[18] L. Liang, M. Liu, C. Martin, and W. Sun, “A deep learning approach to estimate stress
distribution: a fast and accurate surrogate of finite-element analysis,” Journal of The Royal
Society Interface, vol. 15, no. 138, p. 20170844, 2018.

[19] N. S. Gulgec, M. Takáč, and S. N. Pakzad, “Convolutional neural network approach for robust
structural damage detection and localization,” Journal of Computing in Civil Engineering,
vol. 33, no. 3, p. 04019005, 2019.

[20] C. Modarres, N. Astorga, E. L. Droguett, and V. Meruane, “Convolutional neural networks
for automated damage recognition and damage type identification,” Structural Control and
Health Monitoring, vol. 25, no. 10, p. e2230, 2018.

[21] Y.-J. Cha, W. Choi, and O. Büyüköztürk, “Deep learning-based crack damage detection
using convolutional neural networks,” Computer-Aided Civil and Infrastructure Engineering,
vol. 32, no. 5, pp. 361–378, 2017.

[22] D. T. Do, J. Lee, and H. Nguyen-Xuan, “Fast evaluation of crack growth path using time series

forecasting,” Engineering Fracture Mechanics, vol. 218, p. 106567, 2019.

[23] T. T. Truong, D. Dinh-Cong, J. Lee, and T. Nguyen-Thoi, “An effective deep feedforward neural
networks (dfnn) method for damage identification of truss structures using noisy incomplete
modal data,” Journal of Building Engineering, vol. 30, p. 101244, 2020.

[24] Q. X. Lieu, K. T. Nguyen, K. D. Dang, S. Lee, J. Kang, and J. Lee, “An adaptive surrogate
model to structural reliability analysis using deep neural network,” Expert Systems with
Applications, vol. 189, p. 116104, 2022.

[25] X. Zhuang, H. Guo, N. Alajlan, H. Zhu, and T. Rabczuk, “Deep autoencoder based energy
method for the bending, vibration, and buckling analysis of kirchhoff plates with transfer
learning,” European Journal of Mechanics-A/Solids, vol. 87, p. 104225, 2021.

[26] E. Samaniego, C. Anitescu, S. Goswami, V. M. Nguyen-Thanh, H. Guo, K. Hamdia,
X. Zhuang, and T. Rabczuk, “An energy approach to the solution of partial differential
equations in computational mechanics via machine learning: Concepts, implementation and
applications,” Computer Methods in Applied Mechanics and Engineering, vol. 362, p. 112790,
2020.

85

[27] J. Berg and K. Nyström, “A unified deep artificial neural network approach to partial differ-
ential equations in complex geometries,” Neurocomputing, vol. 317, pp. 28–41, 2018.

[28] V.-H. Truong, Q.-V. Vu, H.-T. Thai, and M.-H. Ha, “A robust method for safety evaluation
of steel trusses using gradient tree boosting algorithm,” Advances in Engineering Software,
vol. 147, p. 102825, 2020.

[29] A. Khadilkar, J. Wang, and R. Rai, “Deep learning–based stress prediction for bottom-up
sla 3d printing process,” The International Journal of Advanced Manufacturing Technology,
vol. 102, no. 5, pp. 2555–2569, 2019.

[30] Z. Nie, H. Jiang, and L. B. Kara, “Stress field prediction in cantilevered structures using con-
volutional neural networks,” Journal of Computing and Information Science in Engineering,
vol. 20, no. 1, p. 011002, 2020.

[31] H. Jiang, Z. Nie, R. Yeo, A. B. Farimani, and L. B. Kara, “Stressgan: A generative deep learn-
ing model for two-dimensional stress distribution prediction,” Journal of Applied Mechanics,
vol. 88, no. 5, 2021.

[32] D. Yinfeng, L. Yingmin, L. Ming, and X. Mingkui, “Nonlinear structural response prediction
based on support vector machines,” Journal of Sound and Vibration, vol. 311, no. 3-5,
pp. 886–897, 2008.

[33] R.-T. Wu and M. R. Jahanshahi, “Deep convolutional neural network for structural dynamic
response estimation and system identification,” Journal of Engineering Mechanics, vol. 145,
no. 1, p. 04018125, 2019.

[34] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9,

no. 8, pp. 1735–1780, 1997.

[35] R. Zhang, Z. Chen, S. Chen, J. Zheng, O. Büyüköztürk, and H. Sun, “Deep long short-
term memory networks for nonlinear structural seismic response prediction,” Computers &
Structures, vol. 220, pp. 55–68, 2019.

[36] X. Fang, H. Li, S.-r. Zhang, X.-h. Wang, C. Wang, and X.-c. Luo, “A combined finite element
and deep learning network for structural dynamic response estimation on concrete gravity
dam subjected to blast loads,” Defence Technology, 2022.

[37] C. P. Kohar, L. Greve, T. K. Eller, D. S. Connolly, and K. Inal, “A machine learning frame-
work for accelerating the design process using cae simulations: An application to finite
element analysis in structural crashworthiness,” Computer Methods in Applied Mechanics
and Engineering, vol. 385, p. 114008, 2021.

[38] M. Schwarzer, B. Rogan, Y. Ruan, Z. Song, D. Y. Lee, A. G. Percus, V. T. Chau, B. A. Moore,
E. Rougier, H. S. Viswanathan, et al., “Learning to fail: Predicting fracture evolution in
brittle material models using recurrent graph convolutional neural networks,” Computational
Materials Science, vol. 162, pp. 322–332, 2019.

86

[39] M. Lazzara, M. Chevalier, M. Colombo, J. G. Garcia, C. Lapeyre, and O. Teste, “Surrogate
modelling for an aircraft dynamic landing loads simulation using an lstm autoencoder-based
dimensionality reduction approach,” Aerospace Science and Technology, vol. 126, p. 107629,
2022.

[40] M. Jahanbakht, W. Xiang, and M. R. Azghadi, “Sediment prediction in the great barrier reef
using vision transformer with finite element analysis,” Neural Networks, vol. 152, pp. 311–321,
2022.

[41] S. M. Zahrai and M. Heidarzadeh, “Destructive effects of the 2003 bam earthquake on

structures,” Asian Journal of Civil Engineering (Building and Housing), 2007.

[42] S. M. Zahrai and H. Bolandi, “Towards lateral performance of cbf with unwanted eccentric
connection: A finite element modeling approach,” KSCE Journal of Civil Engineering, vol. 18,
no. 5, pp. 1421–1428, 2014.

[43] S. Zahrai and H. Bolandi, “Numerical study on the impact of out-of-plane eccentricity on
lateral behavior of concentrically braced frames,” International Journal of Steel Structures,
vol. 19, no. 2, pp. 341–350, 2019.

[44] H. Bolandi and S. Zahrai, “Influence of in-plane eccentricity in connection of brace members
to columns and beams on performance of steel frames,” Journal of Civil Engineering (Journal
of School of Engineering), 2013.

[45] J. Masci, U. Meier, D. Cireşan, and J. Schmidhuber, “Stacked convolutional auto-encoders for
hierarchical feature extraction,” in International Conference on Artificial Neural Networks,
pp. 52–59, Springer, 2011.

[46] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–
778, 2016.

[47] J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE

Conference on Computer Vision and Pattern Recognition, pp. 7132–7141, 2018.

[48] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and
A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, pp. 1–9, 2015.

[49] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted
residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pp. 4510–4520, 2018.

[50] Y. Liu, B. Stratman, and S. Mahadevan, “Fatigue crack initiation life prediction of railroad

wheels,” International Journal of Fatigue, vol. 28, no. 7, pp. 747–756, 2006.

[51] N. Dutta, “Geopressure prediction using seismic data: Current status and the road ahead,”

Geophysics, vol. 67, no. 6, pp. 2012–2041, 2002.

87

[52] I. Maqsood, M. R. Khan, and A. Abraham, “An ensemble of neural networks for weather

forecasting,” Neural Computing & Applications, vol. 13, no. 2, pp. 112–122, 2004.

[53] A. Karpatne, G. Atluri, J. H. Faghmous, M. Steinbach, A. Banerjee, A. Ganguly, S. Shekhar,
N. Samatova, and V. Kumar, “Theory-guided data science: A new paradigm for scientific
discovery from data,” IEEE Transactions on Knowledge and Data Engineering, vol. 29,
no. 10, pp. 2318–2331, 2017.

[54] E. Carrera, M. Cinefra, M. Petrolo, and E. Zappino, Finite Element Analysis of Structures

Through Unified Formulation. Wiley Online Library, 2014.

[55] H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial
networks,” in International Conference on Machine Learning, pp. 7354–7363, PMLR, 2019.

[56] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and
I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems,
vol. 30, 2017.

[57] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, “Physics-
informed machine learning,” Nature Reviews Physics, vol. 3, no. 6, pp. 422–440, 2021.

[58] W. D. Pilkey and W. D. Pilkey, Formulas for stress, strain, and structural matrices, vol. 107.

John Wiley & Sons New Jersey, 2005.

[59] I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” ArXiv Preprint

ArXiv:1711.05101, 2017.

[60] S. Huang, Z. Lu, R. Cheng, and C. He, “Fapn: Feature-aligned pyramid network for dense
image prediction,” in Proceedings of the IEEE/CVF International Conference on Computer
Vision, pp. 864–873, 2021.

[61] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin,
N. Gimelshein, L. Antiga, et al., “Pytorch: An imperative style, high-performance deep
learning library,” Advances in Neural Information Processing Systems, vol. 32, 2019.

[62] A. Presas, D. Valentin, W. Zhao, M. Egusquiza, C. Valero, and E. Egusquiza, “On the use
of neural networks for dynamic stress prediction in francis turbines by means of stationary
sensors,” Renewable Energy, vol. 170, pp. 652–660, 2021.

[63] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Inferring solutions of differential equations
using noisy multi-fidelity data,” Journal of Computational Physics, vol. 335, pp. 736–746,
2017.

[64] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Machine learning of linear differential
equations using gaussian processes,” Journal of Computational Physics, vol. 348, pp. 683–
693, 2017.

[65] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Numerical gaussian processes for time-
dependent and nonlinear partial differential equations,” SIAM Journal on Scientific Comput-
ing, vol. 40, no. 1, pp. A172–A198, 2018.

88

[66] M. Raissi, A. Yazdani, and G. E. Karniadakis, “Hidden fluid mechanics: Learning velocity
and pressure fields from flow visualizations,” Science, vol. 367, no. 6481, pp. 1026–1030,
2020.

[67] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics informed deep learning (part i): Data-
driven solutions of nonlinear partial differential equations,” ArXiv Preprint ArXiv:1711.10561,
2017.

[68] M. Raissi, P. Perdikaris, and G. E. Karniadakis, “Physics-informed neural networks: A deep
learning framework for solving forward and inverse problems involving nonlinear partial
differential equations,” Journal of Computational physics, vol. 378, pp. 686–707, 2019.

[69] M. Vahab, E. Haghighat, M. Khaleghi, and N. Khalili, “A physics-informed neural network
approach to solution and identification of biharmonic equations of elasticity,” Journal of
Engineering Mechanics, vol. 148, no. 2, p. 04021154, 2022.

[70] C. Yan, R. Vescovini, and L. Dozio, “A framework based on physics-informed neural networks
and extreme learning for the analysis of composite structures,” Computers & Structures,
vol. 265, p. 106761, 2022.

[71] D. Chen, Y. Li, K. Liu, and Y. Li, “A physics-informed neural network approach to fatigue
life prediction using small quantity of samples,” International Journal of Fatigue, vol. 166,
p. 107270, 2023.

[72] J. Bai, T. Rabczuk, A. Gupta, L. Alzubaidi, and Y. Gu, “A physics-informed neural network
technique based on a modified loss function for computational 2d and 3d solid mechanics,”
Computational Mechanics, vol. 71, no. 3, pp. 543–562, 2023.

[73] H. Jeong, J. Bai, C. Batuwatta-Gamage, C. Rathnayaka, Y. Zhou, and Y. Gu, “A physics-
informed neural network-based topology optimization (pinnto) framework for structural opti-
mization,” Engineering Structures, vol. 278, p. 115484, 2023.

[74] E. Zhang, M. Dao, G. E. Karniadakis, and S. Suresh, “Analyses of internal structures and
defects in materials using physics-informed neural networks,” Science Advances, vol. 8, no. 7,
p. eabk0644, 2022.

[75] A. Fallah and M. M. Aghdam, “Physics-informed neural network for bending and free vibration
analysis of three-dimensional functionally graded porous beam resting on elastic foundation,”
Engineering with Computers, pp. 1–18, 2023.

[76] M. Bazmara, M. Silani, M. Mianroodi, et al., “Physics-informed neural networks for nonlinear
bending of 3d functionally graded beam,” in Structures, vol. 49, pp. 152–162, Elsevier, 2023.

[77] X. Zhao, Z. Gong, Y. Zhang, W. Yao, and X. Chen, “Physics-informed convolutional neu-
ral networks for temperature field prediction of heat source layout without labeled data,”
Engineering Applications of Artificial Intelligence, vol. 117, p. 105516, 2023.

89

[78] C. Xu, B. T. Cao, Y. Yuan, and G. Meschke, “Transfer learning based physics-informed
neural networks for solving inverse problems in engineering structures under different loading
scenarios,” Computer Methods in Applied Mechanics and Engineering, vol. 405, p. 115852,
2023.

[79] B. Zheng, T. Li, H. Qi, L. Gao, X. Liu, and L. Yuan, “Physics-informed machine learning
model for computational fracture of quasi-brittle materials without labelled data,” Interna-
tional Journal of Mechanical Sciences, vol. 223, p. 107282, 2022.

[80] H. Yao, Y. Gao, and Y. Liu, “Fea-net: A physics-guided data-driven model for efficient
mechanical response prediction,” Computer Methods in Applied Mechanics and Engineering,
vol. 363, p. 112892, 2020.

[81] S. Das, S. Dutta, C. Putcha, S. Majumdar, and D. Adak, “A data-driven physics-informed
method for prognosis of infrastructure systems: Theory and application to crack predic-
tion,” ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil
Engineering, vol. 6, no. 2, p. 04020013, 2020.

[82] R. Wang, K. Kashinath, M. Mustafa, A. Albert, and R. Yu, “Towards physics-informed deep
learning for turbulent flow prediction,” in Proceedings of the 26th ACM SIGKDD International
Conference on Knowledge Discovery & Data Mining, pp. 1457–1466, 2020.

[83] S. Goswami, M. Yin, Y. Yu, and G. E. Karniadakis, “A physics-informed variational deeponet
for predicting crack path in quasi-brittle materials,” Computer Methods in Applied Mechanics
and Engineering, vol. 391, p. 114587, 2022.

[84] E. Haghighat, M. Raissi, A. Moure, H. Gomez, and R. Juanes, “A physics-informed deep
learning framework for inversion and surrogate modeling in solid mechanics,” Computer
Methods in Applied Mechanics and Engineering, vol. 379, p. 113741, 2021.

[85] X. Jin, S. Cai, H. Li, and G. E. Karniadakis, “Nsfnets (navier-stokes flow nets): Physics-
informed neural networks for the incompressible navier-stokes equations,” Journal of Com-
putational Physics, vol. 426, p. 109951, 2021.

[86] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anand-
kumar, “Fourier neural operator for parametric partial differential equations,” ArXiv Preprint
ArXiv:2010.08895, 2020.

[87] H. Bolandi, X. Li, T. Salem, V. Boddeti, and N. Lajnef, “Bridging finite element and deep
learning: High-resolution stress distribution prediction in structural components,” Frontiers
of Structural and Civil Engineering, 2022.

[88] H. Bolandi, X. Li, T. Salem, V. N. Boddeti, and N. Lajnef, “Deep learning paradigm for
prediction of stress distribution in damaged structural components with stress concentrations,”
Advances in Engineering Software, vol. 173, p. 103240, 2022.

[89] M. Raissi, Z. Wang, M. S. Triantafyllou, and G. E. Karniadakis, “Deep learning of vortex-

induced vibrations,” Journal of Fluid Mechanics, vol. 861, pp. 119–137, 2019.

90

[90] A. Astaneh-Asl, “Gusset plates in steel bridges–design and evaluation,” Steel TIPS Report,
Structural Steel Educational Council Technical Information & Product Services. Moraga, CA,
2010.

[91] N. R. Ke, S. Chiappa, J. Wang, J. Bornschein, T. Weber, A. Goyal, M. Botvinic, M. Mozer,
and D. J. Rezende, “Learning to induce causal structure,” ArXiv Preprint ArXiv:2204.04875,
2022.

91