MACHINE LEARNED DATA AUGMENTATION TECHNIQUES FOR IMPROVING PATHOLOGY OBJECT DETECTION By Ethan Tu A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Biomedical Engineering – Doctor of Philosophy 2023 ABSTRACT Artificial intelligence (AI) has evolved immensely in recent years, with AI achieving human levels of performance on a wide variety of tasks. However, AI has had limited adoption in clinical settings despite its promising prediction, classification and pathology detection applications. For a machine learned (ML) model to train effectively, the observed data must be a diverse, accurate representation of the true distribution. Therefore, to properly estimate the true distribution, extremely large datasets become necessary. In healthcare scenarios, datasets of sufficient size may be rare or absent, thus hindering the training of ML models. One of the ways to mitigate this problem is through data augmentation, where we supplement our datasets with slightly modified copies of already existing data or newly created synthetic data. Recently, sophisticated data augmentation methods are based on a class of neural networks (NNs) called Generative Adversarial Networks (GANs), which can generate new images of high perceptual quality. This dissertation describes the design and development of a new type of GAN, named near-pair patch cycleGAN (NPP-cycleGAN), which generates realistic pathology-present images. Here, we train and test this network using pediatric chest radiographs. We demonstrate that the proposed GAN can generate high quality fracture-present pediatric chest radiographs. With the addition of these synthetic images to an object detector’s training dataset, we are able to improve the fracture detection performance. These results suggest that our proposed method can be applied to other pathology detection tasks and could potentially enable improved object detector performance in multiple clinical scenarios. Copyright by ETHAN TU 2023 ACKNOWLEDGEMENTS I would like to thank my friends, family, and peers for their endless support. Many thanks to my advisor, Dr. Adam Alessio, who without his guidance and wisdom this dissertation would not have been possible. Finally, thank you to all the Spartans who have made this community feel fun and welcoming. iv TABLE OF CONTENTS LIST OF ABBREVIATIONS ....................................................................................................... vii CHAPTER 1: PREVIOUS AND RELATED WORK ................................................................... 1 1.1 Machine Learned Approach for Estimating Myocardial Blood Flow from Dynamic CT and Coronary Artery Disease Risk Factors ........................................................................................ 1 1.1.1 Introduction ................................................................................................................ 2 1.1.2 Materials and Methods ............................................................................................... 3 1.1.3 Results and Discussion .............................................................................................. 4 1.2 Diagnostic Accuracy of Combined Dynamic Myocardial Perfusion CT and Coronary CT Angiography Compared with PET .............................................................................................. 8 1.2.1 Introduction ................................................................................................................ 
9 1.2.2 Materials and Methods ............................................................................................. 11 1.2.3 Results ...................................................................................................................... 15 1.3 Segmentation of Porous Implantable Polymeric Scaffolds for µCT Monitoring ............... 21 1.3.1 Introduction .............................................................................................................. 21 1.3.2 Materials and Methods ............................................................................................. 23 1.3.3 Results and Discussion ............................................................................................ 26 CHAPTER 2: INTRODUCTION ................................................................................................. 29 2.1 Basics of Neural Networks .................................................................................................. 30 2.1.1 Activation Functions ................................................................................................ 31 2.1.2 Backpropagation and Loss Functions ...................................................................... 33 2.2 Machine Learning and Medical Imaging Applications ....................................................... 35 2.2.1 Convolutional Neural Networks .............................................................................. 36 2.2.2 U-Net........................................................................................................................ 40 2.2.3 Residual Blocks ....................................................................................................... 41 2.2.4 Deep Learning for Rib Fracture Detection .............................................................. 42 2.3 Data Augmentation Techniques .......................................................................................... 46 2.3.1 Generative Adversarial Nets .................................................................................... 48 2.3.2 GAN Variants .......................................................................................................... 50 2.3.3 Image-to-Image Translation GANs ......................................................................... 52 2.3.4 Common quantitative evaluation metrics ................................................................ 56 2.4 Review of GANs applied to Medical Imaging Applications .............................................. 57 2.4.1 Image Reconstruction .............................................................................................. 57 2.4.2 Image Synthesis ....................................................................................................... 61 2.4.3 Cross-Modality Translation ..................................................................................... 69 CHAPTER 3: NEAR-PAIR PATCH GENERATIVE ADVERSARIAL NETWORK ............... 73 3.1 Introduction ......................................................................................................................... 74 3.2 Methods ............................................................................................................................... 75 3.2.1 Near-Pair Patch cycleGAN ...................................................................................... 76 3.2.2 Unpaired cycleGAN................................................................................................. 
78 3.2.3 Fréchet Inception Distance Near-Pair Patch cycleGAN .......................................... 79 3.2.4 Blinded Observer Study ........................................................................................... 80 3.2.5 Full Radiograph Generation ..................................................................................... 80 v 3.3 Results and Discussion ........................................................................................................ 83 CHAPTER 4: DATA AUGMENTED NEURAL NETWORK PEDIATRIC RIB FRACTURE DETECTION ................................................................................................................................ 87 4.1 Introduction ......................................................................................................................... 87 4.2 Materials and Methods ........................................................................................................ 88 4.3 Results and Discussion ........................................................................................................ 89 4.4 Perspective on the Future of Image Generation .................................................................. 91 4.5 Supplemental Information ................................................................................................... 94 REFERENCES ............................................................................................................................. 95 APPENDIX: DATA, CODE, AND SUPPLEMENTAL INFORMATION ............................... 119 vi LIST OF ABBREVIATIONS AHA - American Heart Association AUC – Area Under the Curve CAD – Coronary Artery Disease CNN – Convolutional Neural Network CT – Computed Tomography CTA or CCTA – Coronary Computed Tomography Angiography dCTP or DCE-CT – Dynamic Contrast-Enhanced Computed Tomography Perfusion DL – Deep Learning DNN – Deep Neural Network FFR – Fractional Flow Reserve FFR-CTPA – Fractional Flow Reserve from CT Perfusion and Anatomy FID – Fréchet Inception Distance GAN – Generative Adversarial Network IRB – Institutional Review Board IQR – Interquartile Range JI – Jaccard Similarity Index LAD – Left Anterior Descending Coronary Artery LCX – Left Circumflex Coronary Artery MBF – Myocardial Blood Flow MFR – Myocardial Flow Reserve ML – Machine Learning MRI – Magnetic Resonance Imaging vii MSE – Mean Squared Error NN – Neural Network NPP – Near Pair Patch PCL – Polycaprolactone PET – Positron Emission Tomography PLGA – Poly(lactic-co-glycolic acid) PSNR – Peak Signal to Noise Ratio TAC - Time Attenuation Curve RMSE - Root Mean Squared Error PCA – Principal Component Analysis RCA – Right Coronary Artery ROC – Receiver Operating Characteristics SSIM – Structural Similarity Index Measure TaOX – Tantalum Oxide viii CHAPTER 1: PREVIOUS AND RELATED WORK In this chapter I will describe three of my major research projects. I will review the key findings and methodologies of the relevant studies, highlighting their significance. For source code and data, please refer to Appendix A. These efforts were presented in the following outlets: Section 2.1 [1], Section 2.2 (to be submitted to Journal of Cardiovascular Computed Tomography), Section 2.3 [2]. 1.1 Machine Learned Approach for Estimating Myocardial Blood Flow from Dynamic CT and Coronary Artery Disease Risk Factors Heart disease is one of the leading causes of death in the US, with about 1 in 20 adults over the age of 20 diagnosed with some form of coronary artery disease. 
The estimation of myocardial blood flow (MBF) is crucial for diagnosing and risk stratifying myocardial ischemia. Currently, the gold standard for non-invasive, quantitative MBF measurements is to use positron emission tomography (PET). However, we seek to use low radiation dosage dynamic contrast-enhanced computed tomography perfusion (dCTP) as an alternative approach due to its wide availability and lower cost. This work uses machine learning techniques to estimate MBF from a combination of dCTP derived time attenuation curves (TACs) and 9 risk factors for coronary artery disease (CAD). We compare our machine learned MBF estimates to PET derived estimates, and for a control, we used a 2-compartmental model that has been previously presented and verified with simulation studies. Four machine learning regression techniques were explored: 1) Binary regression tree, 2) Ensemble of Learners regression, 3) Support vector machine, and 4) Kernel regression. Our best performing model (ensemble of trees) had a root mean squared error (RMSE) of 0.47 ml/min/g. Comparatively, the compartmental model achieved an RMSE of 0.80 ml/min/g. In general, the inclusion of risk factors neither improved nor worsened estimates. Overall, our machine learning approach produces comparable MBF estimations to verified DCE-CT and PET estimates and can provide rapid assessments for myocardial ischemia.
1.1.1 Introduction
The non-invasive quantitative assessment of myocardial perfusion is essential for grading coronary artery disease. Quantitative MBF provides valuable prognostic and diagnostic information and can be measured through the use of positron emission tomography (PET) [3]–[5], magnetic resonance imaging (MRI) [6]–[9], and computed tomography (CT) [7], [10]–[12]. However, PET remains the gold standard for quantitative MBF measurements. Recently, cardiac perfusion PET research has focused on reducing costs and improving patient outcomes through the development of new radiotracers [13], determining optimal thresholds to stratify CAD [14], [15], or diagnostic accuracy studies [16]–[18]. Even with these studies, the use of PET for quantifying ischemia has remained limited due to the cost of PET perfusion tracers, methodologic complexity, and insurance reimbursement issues [4]. MRI, likewise, shares the same cost, availability, and complexity drawbacks [10]. CT, on the other hand, is low cost, rapid, widely available, and produces images of better spatial resolution than PET [12]. However, CT based estimates are generated with compartmental modeling that requires the entire time course of the contrast agent and does not implicitly model noise properties in the data (requiring relatively high radiation dose). This work compares four different machine learning algorithms to derive MBF from dCTP and patient risk factors. Using simple machine learning methods has many potential benefits: a) bypassing computationally expensive compartmental models, b) inherently learning noise properties of the data, and c) identifying future candidate approaches for simplifying CT acquisitions. Other studies have tried to use machine learning (ML) techniques for cardiology tasks due to their ability to handle large volumes of data and to model hidden patterns within data [19]. ML has been used to predict the likelihood of revascularization in patients with CAD [20], to assess coronary stenosis through FFR prediction [21], to predict major adverse cardiac events [22], and for calcium scoring [23].
We are the first, to our knowledge, to directly estimate MBF using a ML model trained on a fusion of DCE-CT derived time attenuation curves and patient risk factors. Here, we compare our ML derived estimates to CT-derived 2-compartmental model and quantitative PET estimates.
1.1.2 Materials and Methods
Twenty-nine patients underwent clinical rest and stress regadenoson rubidium-82 PET scans. QPET software (Cedars-Sinai, Los Angeles, CA) was used to produce quantitative PET MBF estimations. Each patient then underwent a DCE-CT exam, using a Revolution CT scanner (GE Healthcare, Waukesha, WI) with a 16 cm z-axis coverage, within 30 days of the initial PET scan. Using a custom, previously verified 2-compartmental model written in MATLAB (ver 2017b; MathWorks, Natick, MA) we estimated MBF from the CT images [24]. For each PET and CT scan, we segmented the heart into the recommended 17-region myocardial AHA model, yielding a total of 493 time attenuation curves and segments for flow estimation [25]. In Figure 1, representative time attenuation curves (TAC) for one segment are shown. Quantitative MBF was expressed as mL/min/gram of myocardial tissue. We explored 4 ways of summarizing the TAC data for our ML models: (1) trained on only the myocardial output response TAC, (2) trained on both the arterial input and myocardial output TAC, (3) trained on semantic feature selection of our myocardial output TAC, and (4) trained on principal component analysis (PCA) of the myocardial output TAC. For our semantic feature selection, we chose predictors known to be important for flow estimation. These predictors include the rising slope, area under the curve normalized by time, and time to maximum concentration of our myocardial output TAC. For dimensionality reduction through PCA, we chose to keep the first 5 coefficients. For each variation we also concatenated 9 different patient risk factors and trained a separate model. These patient risk factors include age, gender, BMI, hypertension, and presence of diabetes. We applied four machine learning regression techniques: 1) Binary regression tree, 2) Ensemble of learners regression, 3) Support vector machine, and 4) Kernel regression to the four sets of summarized data in MATLAB (ver. 2020b). For all methods the data was divided into 70% training and 30% testing cases. Results report RMSE of the test cases for each estimate using quantitative PET derived flow as our true values.
Figure 1. Representative time attenuation curve of 1 region of a 17-segment model (Left). In red is the injected bolus and in blue is the myocardial response. Visualization of the semantic features for data summarization (Right).
1.1.3 Results and Discussion
On average, the MBF estimates for PET were 1.76±1.05 mL/min/g and for the conventional compartmental modeling estimates from DCE-CT were 1.58±0.84 mL/min/g. Table 1 summarizes the model performance for each regression model trained on various TAC data summaries. As a measurement for performance, root mean squared error was calculated for the predicted flows in our test data set. On average, ensemble of learners had the lowest RMSE across all types of predictors, including the best performing model (RMSE = 0.47), which was trained on semantic features + risk factors. For comparison, verified dCTP flow estimations using a 2-compartmental model yielded an RMSE of 0.80 ml/g/min. The addition of risk factors as predictors had mixed effects.
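To make the semantic TAC summarization concrete, the sketch below illustrates one way the three features described above (rising slope, area under the curve normalized by time, and time to maximum concentration) could be computed from a sampled time attenuation curve. The study itself used MATLAB; this Python version, its function names, and the synthetic curve are illustrative assumptions only, not the code used to produce the reported results.

```python
import numpy as np

def semantic_tac_features(t, tac):
    """Illustrative summary of a time attenuation curve (TAC).

    t   : 1-D array of sample times (s)
    tac : 1-D array of attenuation values (HU) at those times
    Returns (rising_slope, auc_per_time, time_to_max).
    """
    i_max = int(np.argmax(tac))                  # index of peak enhancement
    time_to_max = t[i_max] - t[0]                # time to maximum concentration

    # Rising slope: least-squares slope over the upslope portion of the curve
    rising_slope = np.polyfit(t[: i_max + 1], tac[: i_max + 1], 1)[0] if i_max > 0 else 0.0

    # Area under the curve (trapezoid rule) normalized by the sampling duration
    auc = np.sum(0.5 * (tac[1:] + tac[:-1]) * np.diff(t))
    auc_per_time = auc / (t[-1] - t[0])

    return rising_slope, auc_per_time, time_to_max

# Example: features for one segment's TAC (synthetic, bolus-shaped curve)
t = np.linspace(0, 30, 31)
tac = 50 * np.exp(-0.5 * ((t - 12) / 4) ** 2)
print(semantic_tac_features(t, tac))
```

In the study, three such features from each of the input and myocardial TACs (six predictors in total), optionally concatenated with the nine risk factors, served as inputs to the four MATLAB regression models.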
Of the 16 different data summary + model combinations, 8 were improved by the addition of risk factors and 8 worsened. It is interesting to note that semantic features + risk factors gave both the best and worst performing models. Overall, any variation of semantic feature estimates outperformed PCA estimates. When only using risk factors and not accounting for any TAC data, all four ML methods yielded poor estimates. SVM and kernel models trained on complete TAC curve data yielded lower RMSE than those trained on summarized data. The opposite is true for binary tree and ensemble models. In general, training on either the full myocardial TAC or with the addition of the input TAC performed better than summarized data. Additionally, the semantic features of rising slope, time to max, and area under the curve normalized by time were found to be a better data summary technique than PCA, having a generally lower RMSE across all models.
Table 1. Root mean squared error (RMSE) of MBF estimates for each regression model and TAC data summary technique. Units in ml/min/g. The best and worst performing approaches (highlighted green and red in the original table) are marked below. For reference, our compartmental model has an RMSE of 0.80.
Data summary                        Binary Regression Tree   Ensemble       Support Vector Machine   Kernel Regression
Myo TAC                             0.86                     0.72           0.64                     0.67
Myo TAC + Risk Factors              0.89                     0.72           0.79                     0.76
Input + Myo TAC                     1.23                     0.71           0.75                     0.72
Input + Myo TAC + Risk Factors      1.19                     0.80           0.71                     0.69
Semantic Features                   0.94                     0.54           0.78                     1.48
Semantic Features + Risk Factors    1.00                     0.47 (best)    1.59 (worst)             0.71
PCA                                 0.96                     0.66           0.78                     0.85
PCA + Risk Factors                  1.00                     0.58           0.63                     1.10
Only Risk Factors                   1.12                     0.77           1.10                     0.81
Across all models, the higher MBF values tended to be underestimated compared to the PET derived estimates. In Figure 2A-B, a Bland-Altman plot of the 2-compartmental model derived estimates vs PET estimates is shown. The 2-compartmental model performs poorly at flows above 2 mL/min/g, overestimating between 2-3 mL/min/g and underestimating above 3 mL/min/g. Figure 2C-D presents correlation and Bland-Altman plots of the "Semantic Features" data technique + Ensemble Tree, comparing our best fitting model and PET. The semantic feature technique reduced each TAC to 3 features, for a total of 6 predictors as inputs to the regression tree (features from both input function and myocardial response function). Likewise, Figure 2E-F presents results from the "Semantic Features + Risk Factors" approach that included 3 TAC features and 9 risk factors as inputs to the regression tree. With multi-observation correction, the "Semantic Features" estimates have a mean estimate of 1.55±0.56 mL/min/g. We can see a general linear correlation (Fig. 2C) between PET and our ensemble regression tree model flow estimates, which indicates a moderate agreement between both methods. The Bland-Altman plots also show where the slight overestimation of our model occurs. In Figure 2D, the negative bias seems to increase with respect to higher flows, indicating our model performs poorly, and underestimates more significantly, in this area. Both the general linear correlation (Fig. 2E) and underestimation in high flow areas (Fig. 2F) are also seen with the "Semantic Features + Risk Factors" estimates. The addition of risk factors as predictors increased our tree depth and consequently improved the regression estimates significantly. The linear correlation is much stronger, and the negative bias is much smaller than the "Semantic Features" only estimates.
Figure 2.
Performance on test set. Correlation and Bland-Altman plots of 2-Compartmental model (row 1), Ensemble of Learners Tree regression with Semantic features (row 2) and Ensemble of Learners Tree regression with Semantic features and risk factors (row 3). A) Basic correlation plot between 2-compartment model and PET flow estimates. Best fit line, r2, sum of squared error, and number of samples are reported. B) Bland-Altman plot. A mean difference of -0.08 indicates no inherent biases. C) Basic correlation plot between ensemble regression tree estimates and PET flow estimates using semantic TAC features. D) Bland-Altman plot. A mean difference of 0.1 indicates no inherent biases. E) Basic correlation plot of ensemble regression tree estimates using TAC features and nine risk factors. F) Bland-Altman plot. A mean difference of 0.05 indicates no inherent biases. 7 EFDCBA We assessed the performance of four different machine learning regression models using summarized TAC curve and coronary artery disease risk factors as predictors. In general, machine learning provided better MBF estimates compared to conventional compartmental modeling. The use of semantic features in an ensemble regression tree led to estimates with an RMSE of 0.54 ml/min/g, compared to conventional compartmental modeling estimates with an RMSE of 0.80 ml/min/g. The addition of patient risk factors to the TAC data further improved machine learned estimates to 0.47 ml/min/g. Overall, an ensemble regression tree model trained on semantic features such as rising slope, area under the curve normalized by time, and time to maximum concentration had the lowest RMSE. The large variance in both the compartmental model and machine learned estimates can partially be attributed to significant noise in PET MBF estimates. We plan to continue to fine tune our model to allow for better identification of ischemic areas, as well as look into using reduced temporal sampling techniques to reduce radiation dosage. 1.2 Diagnostic Accuracy of Combined Dynamic Myocardial Perfusion CT and Coronary CT Angiography Compared with PET Estimating myocardial blood flow (MBF) is valuable for diagnosing and risk stratifying myocardial ischemia. Positron emission tomography (PET) is the standard for non-invasive, quantitative MBF measurements. However, its high cost and limited availability have limited its use. Dynamic contrast-enhanced computed tomography perfusion (dCTP) offers a widely accessible approach for MBF measurements offering the potential for similar diagnostic information as PET. In this work, we compare the ischemia detection performance of dCTP and cardiac CT angiography (CTA) to cardiac PET. We propose a new metric (FFR-CTPA) for combining dCTP derived myocardial blood flow and coronary CTA stenosis information. CT derived myocardial flow reserve (MFR) and stress MBF detected regional PET-confirmed ischemia with area under the curve (AUC) of 0.84±0.04 and 0.85±0.04. Combining CTA 8 information with the MFR and stress MBF in the proposed FFR-CTPA improved the detection of ischemia (p<0.001), with AUC of 0.85±0.04 and 0.89±0.03 respectively. The combination of CTA anatomical information with stress MBF yielded the highest detection performance. This work demonstrates that dCTP + CTA can generate better ischemia detection performance than stenosis information or CT-derived flow alone. 1.2.1 Introduction Assessment of the coronary arteries is often performed with invasive coronary angiography. 
However, approximately 40-50% of all invasive angiographies do not find evidence of stenosis [26]. Consequently, in low to moderate stenosis risk cases, CT angiography (CTA) is preferred since it is noninvasive [27]. While it offers high sensitivity for CAD, the quality of a CT angiogram can be affected by the motion of heart, presence of arterial calcification, and presence of a coronary stent [28], [29]. Another limitation of CTA is noisy assessment of stenosis in distal coronary arteries [30]. Most importantly, it lacks functional information for a given stenosis, leading to poor evaluation of ischemia [27]. Therefore, many efforts have sought to add functional information, such as myocardial blood flow (MBF) information, to non-invasive CTA exams [31]. Cardiac positron emission tomography (PET) is considered to be the gold standard in quantitative MBF measurements [32]–[34]. Numerous efforts have advanced the use of different myocardial perfusion tracers, such as 82Rb or 13N-ammonia for PET [35], [36]. Despite the validation of these tracers and the clinical tools for generating estimates [37], the cost of the tracers and imaging technology of PET has limited the wide-spread use of cardiac PET perfusion imaging. Measuring MBF using dCTP offers a cheaper, faster, and more accessible alternative over PET [10], [38]. Numerous studies have validated the quantitative accuracy of DCE-CT for MBF measurements and its use in grading ischemia [10], [24], [39]–[41]. 9 In this work, we are among the first to a) evaluate CT myocardial perfusion for the detection of regional ischemia with PET as the reference test and b) use a quantitative scoring metric for the combination of perfusion and anatomical information. Recent efforts have reported the diagnostic accuracy of CT myocardial perfusion imaging compared to different reference standards. Pontone et al. presented a meta-analysis of 77 studies of leading non-invasive tests, including 7 studies of stress CT perfusion and CTA, for the detection of abnormal invasive fractional flow reserve (FFR) [42]. Similarly, Lu et al. conducted a meta-analysis of just dynamic myocardial perfusion CT compared to either another myocardial perfusion imaging modality (SPECT/PET/MRI) or invasive FFR [43]. At the time, their search revealed thirteen prior studies for this purpose, although none of them used PET as a reference perfusion test. Recent work by Nous et al. reported dynamic CT perfusion compared to invasive FFR in a study of 132 patients from 9 centers [44]. Expert readers combined the CT perfusion and CTA information in a non- quantitative, although rigorous, fashion. This leads to perfusion+anatomical information that is not reported on a continuous scale to allow for area under the receiver operating curve assessment and adjustment of sensitivity/specificity performance. In this work, we combine the functional information of dCTP MBF measurements and anatomical information from CTA to grade ischemia and compare it to quantitative PET. We propose a new scoring method for combining the MBF and CTA information into a continuous variable representative of flow reduction from a stenosis. The dCTP MBF measurements were presented in our previous work that evaluated the quantitative (not diagnostic) accuracy of global MBF estimation of CT compared to PET [24]. This work uses regional MBF estimates and seeks to determine the diagnostic accuracy of CT assessment compared to PET for the detection of regional ischemia. 
Here, we determine the diagnostic accuracy of the dCTP values and the combined CT + CTA information.
1.2.2 Materials and Methods
Study Design
Anonymized data, CT-MBF estimation tools, and MBF estimates that we generated in our previous work are publicly available at the Dataverse and can be accessed at https://doi.org/10.7910/DVN/VUP5TC. Details on the study protocol, image acquisition, and patient demographics were previously presented [24]. Briefly, thirty-four patients received a rest and regadenoson stress rubidium-82 PET scan and then, within 30 days, a dCTP with CTA exam. All CT exams were performed on a Revolution CT scanner (GE Healthcare, Waukesha, WI) with a 16 cm z-axis coverage. Of the 34 total DCE-CT scans, 5 were excluded due to injection errors or mismatched hemodynamics. All dCTP, PET, and CTA images were aligned along the short axis view and segmented according to the standard American Heart Association 17-region model (Figure 3) [25]. The myocardial blood flow was estimated for each region and modality to provide regional absolute quantitative MBF estimates in units of mL/min/gram. This led to a total of 493 time attenuation curves and segments for comparison between PET and CT.
Figure 3. Example frame from a DCE-CT image (A) and example CT-derived MBF (B) Rest, (C) Stress, and (D) MFR estimates for a single patient performed at the 17-segment level.
PET MBF estimation
PET estimations were generated using the QPET software (Cedars-Sinai, Los Angeles, CA). Following ischemic definitions proposed by Johnson and Gould, these regional absolute MBF estimates were used to assign each region as either A) normal (stress flow > 1.12 mL/g/min or myocardial flow reserve (MFR) > 2.03) or B) moderately to definitely ischemic (stress flow < 1.12 mL/g/min and MFR < 2.03) [45]. This definition for each region served as the reference test for evaluating the diagnostic accuracy of the CT-derived information. Their study suggests that the threshold for a binary definition of ischemia vs non-ischemia is a stress flow less than 0.91 mL/g/min and a myocardial flow reserve (MFR) of less than 1.74.
CT MBF estimation
The CT MBF estimates were generated with custom processing using MATLAB (ver 2017b; MathWorks, Natick, MA) and JSim. The left ventricular myocardium was isolated using semiautomated edge detection with manual interaction to account for any interframe motion, when necessary. The median CT number within the myocardium was extracted from each frame over time to generate the myocardial TAC. The median CT number in the descending aorta was extracted for the input function TAC. We used a 2-compartment model that has been previously presented and verified with simulation studies [10], [24]. Driven by the input function TAC, the model was optimized across 4 free parameters (MBF, volume of interstitial fluid, baseline correction, and delay between input and myocardial TAC) to fit the myocardial TAC and generate MBF estimates in units of milliliters per minute per gram. Myocardial flow reserve (MFR) was calculated as the ratio of MBF at stress to MBF at rest.
CCTA stenosis evaluation
Assessment of anatomic CTA vessel information was performed through joint interpretation by a cardiology fellow and a cardiologist. One-beat, whole heart axial scans were acquired for all CTA exams with padding from 60-80% of the cardiac cycle, gantry rotation of 280 ms, tube voltage of 120 kVp, and an effective tube current of approximately 500 mA.
Images were reconstructed at every 5% phase with 0.625 mm slice thickness and a standard reconstruction kernel. The CTA interpretation involved the visual match of the coronary arteries to downstream myocardial segments. Each myocardial segment was assigned a percent stenosis ranging from 0 for coronary trees with no apparent stenosis to 100% for an upstream branch with one or more fully occluded stenoses. For analysis purposes, any CTA segments left unevaluated due to heavy artifacts were imputed by conservatively assuming the max percent stenosis of the patient. The cardiologists were blinded to the myocardial perfusion information during the CTA interpretation.
Combined dCTP and CCTA Diagnostic Score, FFR-CTPA
dCTP and CTA information were combined to yield a new diagnostic score, Fractional Flow Reserve from CT perfusion and anatomy (FFR-CTPA), which is calculated by summing their individual quantitative estimates in a weighted fashion:
(1)
where SFFR-CTPA indicates our proposed diagnostic score, which combines the patient-specific myocardial perfusion estimate, Sp, and percent stenosis, Ss. This score includes constants: τp, which is the perfusion threshold for ischemic vs. non-ischemic regions, and τs, which is the percent stenosis threshold for ischemic vs. non-ischemic regions. This score can be calculated for the three different measures of perfusion: resting state, stress state, or myocardial flow reserve. The new score incorporates three concepts: 1) the contribution of the perfusion and stenosis information is normalized by (divided by) the threshold for ischemia detection for that information; 2) perfusion information is slightly more predictive of ischemia than stenosis information and therefore receives more weight; 3) the score is clipped to a range of 0-2 to enable easy interpretation. A lower SFFR-CTPA suggests a higher severity of ischemia; as MBF or MFR decreases or percent coronary stenosis increases, SFFR-CTPA will decrease. The FFR-CTPA score was calculated for rest, stress, and MFR perfusion states, requiring different threshold, τp, values in the calculation. The relative weighting of each contribution was adjusted to achieve reasonable performance on the data and provide round numbers for ease of implementation. Specifically, the weighting was changed in intervals of 10% until the highest AUC was achieved. This led to the dCTP information receiving 60% weight in the new score and CTA stenosis a 40% weight. Values for each variable are given in Table 2.
Table 2. Symbols, parameter definitions, and values used in the FFR-CTPA calculation for the rest MBF, stress MBF, and MFR perfusion states. Sp, Ss, and the combined score are summarized across all segments (mean ± SD).
Symbol      Parameter                                                    Rest MBF               Stress MBF             MFR
τp          Threshold for ischemic vs non-ischemic, perfusion            0.50 mL/g/min          0.91 mL/g/min          1.74
τs          Threshold for ischemic vs non-ischemic, percent stenosis     70%                    70%                    70%
Sp          Myocardial perfusion estimate (summary of segments)          0.96 ± 0.36 mL/g/min   2.04 ± 0.89 mL/g/min   2.30 ± 1.60
Ss          Percent stenosis (summary of segments)                       46 ± 36%               46 ± 36%               46 ± 36%
SFFR-CTPA   Combined diagnostic score (summary of segments)              0.87 ± 0.41            1.05 ± 0.51            0.52 ± 0.39
Comparative and Statistical Analysis
We used an unpaired parametric t-test to determine group differences in MBF between ischemic and non-ischemic regions. ROC analysis was performed using dCTP diagnoses as classifier predictions and PET diagnoses as true labels. Area under the ROC (AUC), accuracy at 90% sensitivity, and specificity at 90% sensitivity were all reported. An unpaired, two-sample t-test was used to determine the group differences between AUC, accuracy, and specificity of rest vs stress vs MFR. Similar analysis was performed using a 3-region model, where the original 17 regions were regrouped into three regions based on the supply beds of the three main coronary arteries: left anterior descending, right coronary artery, and left circumflex.
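Because the body of Equation (1) is not reproduced above, the following Python sketch should be read as an assumption rather than the authors' exact equation: it combines a threshold-normalized perfusion term and a threshold-normalized stenosis term with the 60%/40% weighting and the 0-2 clipping described in the text, so that the score falls as perfusion falls or stenosis rises. The function and argument names are illustrative only.

```python
import numpy as np

def ffr_ctpa_score(perfusion, stenosis_pct, tau_p, tau_s=70.0, w_p=0.6, w_s=0.4):
    """Hedged sketch of a combined perfusion + stenosis diagnostic score.

    perfusion    : regional MBF (mL/g/min) or MFR (unitless)
    stenosis_pct : upstream percent stenosis (0-100)
    tau_p, tau_s : ischemia thresholds for the perfusion measure and stenosis
    The perfusion term raises the score, the stenosis term lowers it,
    and the result is clipped to the 0-2 range described in the text.
    """
    raw = w_p * (perfusion / tau_p) - w_s * (stenosis_pct / tau_s)
    return float(np.clip(raw, 0.0, 2.0))

# Example: a segment with stress MBF of 1.2 mL/g/min (tau_p = 0.91) and 80% stenosis
print(ffr_ctpa_score(1.2, 80.0, tau_p=0.91))
```

A form like this reproduces the qualitative behavior described above (lower scores indicate more severe ischemia), but the precise functional form used in the study should be taken from the original Equation (1).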
Bootstrapping methods were employed, with replacement, to generate error bars on all performance measures. A total of 1000 resamples were used for each error bar.
1.2.3 Results
Figure 4 presents boxplots of dCTP derived MBF estimates grouped according to PET diagnosed non-ischemic vs ischemic. There is a significant separation between non-ischemic and ischemic dCTP derived MBF values for resting, stressed, and MFR (top row). This separation increases using our combined score, SFFR-CTPA, suggesting better stratification of ischemia (bottom row). The bottom left graph shows the values from CTA stenosis information alone.
Figure 4. Comparison of the CT derived measures grouped according to PET diagnosed non-ischemic vs ischemic regions. The first row summarizes measures from CT flow information (rest, stress, MFR) and the bottom row summarizes measures from CT anatomy (first column) and combined FFR-CTPA using rest, stress, and MFR respectively. The percent difference between groups and p-value are presented on these box plots.
In Figure 5A, a ROC curve was constructed for the prediction of ischemia using dCTP derived MFR, rest MBF and stress MBF measurements. Here, we see that stress MBF produces the highest AUC (0.85), suggesting decent diagnostic accuracy. In Figure 5B, a similar ROC curve was constructed using SFFR-CTPA scores to predict ischemia. For all perfusion estimates, the combined score increased AUC. In particular, SFFR-CTPA calculated using stress MBF is the best at diagnosing ischemic regions. For reference, a random predictor of ischemia would yield an AUC of 0.5.
Figure 5. Receiver Operating Characteristic (ROC) curves for CT-derived regional myocardial blood flow estimates (A) and for CT-derived flow estimates combined with stenosis information (B) for the diagnosis of ischemia. ROC was performed on 493 regions, 18 of which are labeled as ischemic via PET.
Table 3 displays summary statistics for dCTP and SFFR-CTPA results. Rest MBF performed poorly as a detector, with an AUC of 0.64. The stress MBF threshold to achieve 90% sensitivity was 1.93 mL/g/min, much higher than the 0.91 mL/g/min PET threshold, indicating general overestimation of CT MBF. Both MFR and stress MBF were better classifiers of ischemia, having an AUC of 0.84 and 0.85, respectively. Stress MBF achieved an accuracy at 90% sensitivity of 0.54 and a specificity at 90% sensitivity of 0.53, whereas MFR achieved slightly higher performance (0.59 and 0.58, respectively). Stenosis information alone performed relatively poorly with an AUC of 0.69. Combining CTA information with the CT classifiers significantly improved the classification performance over the CT only classifier. The MFR, rest MBF, and stress MBF AUC all increased to 0.85, 0.72, and 0.89, respectively. The accuracy and specificity at 90% sensitivity also improved (Table 3) with the addition of CTA information for both rest and stress MBF. Interestingly, MFR did not benefit from the addition of CTA information as much as rest or stress MBF, despite being a combination of the two metrics.
Using bootstrapping with replacement, there was sufficient evidence that all reported mean AUC, accuracy, and specificity estimates are different from each other (p<0.0001) except for the CT Stress MBF compared to FFR-CTPA MFR measures.
Table 3. Summary statistics of detection analysis for CT derived estimates for 17-segment information.
Method                AUC            Accuracy at 90% Sensitivity   Specificity at 90% Sensitivity   Threshold to achieve 90% Sensitivity
CT Rest MBF           0.64 ± 0.07    0.24 ± 0.02                   0.22 ± 0.02                      1.19 ± 0.14 mL/g/min
CT Stress MBF         0.85 ± 0.05    0.54 ± 0.02                   0.53 ± 0.02                      1.93 ± 0.29 mL/g/min
CT MFR                0.84 ± 0.04    0.59 ± 0.02                   0.58 ± 0.02                      1.89 ± 0.19 r.u.
CCTA Stenosis Only    0.69 ± 0.06    0.37 ± 0.02                   0.35 ± 0.02                      30.00 ± 5.59 %
FFR-CTPA Rest         0.72 ± 0.06    0.26 ± 0.02                   0.24 ± 0.02                      1.13 ± 0.06 n.u.
FFR-CTPA Stress       0.89 ± 0.03    0.66 ± 0.02                   0.65 ± 0.02                      0.85 ± 0.11 n.u.
FFR-CTPA MFR          0.85 ± 0.04    0.58 ± 0.02                   0.57 ± 0.02                      0.42 ± 0.06 n.u.
The final column of Table 3 indicates the threshold for that measure to operate at a high sensitivity. For example, we would classify anything below 1.93 mL/g/min as ischemic for dCTP derived stress MBF. Likewise, anything below 0.85 would classify as ischemic for our stress SFFR-CTPA. With this threshold, we are expected to detect 90% of all disease and have a specificity of 0.65. To see if diagnostic performance is a function of each coronary artery region, we summarized the stress MBF and stress SFFR-CTPA accuracies for all 17 regions and grouped them together according to the coronary arterial distribution proposed by the American Heart Association (Figure 6). Table 4 shows the relevant metrics, including AUC, accuracy at 90% sensitivity, and specificity at 90% sensitivity. We see in Figure 6B that our stress SFFR-CTPA information produced accuracies at 90% sensitivity of 0.69, 0.66, and 0.62 for LAD, RCA, and LCX, respectively. This is an improvement over stress MBF alone, which had accuracies of 0.59, 0.50, and 0.53, respectively. AUC is slightly higher in RCA and LCX regions compared to LAD.
Table 4. Summary of detection analysis for Stenosis only, Stress MBF, and FFR-CTPA with Stress by major coronary bed.
Method            Region   Prevalence of ischemia   AUC           Accuracy at 90% Sensitivity   Specificity at 90% Sensitivity
CT Stenosis       LAD      8/203                    0.58 ± 0.10   0.51 ± 0.03                   0.52 ± 0.04
CT Stenosis       RCA      6/145                    0.77 ± 0.08   0.58 ± 0.04                   0.57 ± 0.04
CT Stenosis       LCX      4/145                    0.78 ± 0.09   0.57 ± 0.04                   0.55 ± 0.04
CT Stress MBF     LAD      8/203                    0.78 ± 0.07   0.59 ± 0.03                   0.58 ± 0.04
CT Stress MBF     RCA      6/145                    0.96 ± 0.03   0.50 ± 0.04                   0.48 ± 0.04
CT Stress MBF     LCX      4/145                    0.82 ± 0.11   0.53 ± 0.04                   0.52 ± 0.04
FFR-CTPA Stress   LAD      8/203                    0.82 ± 0.06   0.69 ± 0.03                   0.68 ± 0.03
FFR-CTPA Stress   RCA      6/145                    0.99 ± 0.01   0.66 ± 0.04                   0.64 ± 0.04
FFR-CTPA Stress   LCX      4/145                    0.89 ± 0.05   0.62 ± 0.04                   0.61 ± 0.04
Figure 6. Polar maps of area under the ROC (AUC) for stress MBF (A) and FFR-CTPA (B) using stress MBF information at the 3-region level. Individual segments (1-17) were grouped together according to their common coronary artery region. In this display, upper left segments are supplied by the left anterior descending artery, right by the left circumflex, and lower left by the right coronary artery.
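The error bars on the AUC, accuracy, and specificity values in Tables 3 and 4 come from the bootstrap procedure described in the Methods (1000 resamples with replacement). A minimal Python sketch of that kind of procedure, using scikit-learn's roc_auc_score on hypothetical score and label arrays, is shown below; it is illustrative only and is not the analysis code used in the study.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_auc(labels, scores, n_resamples=1000, seed=0):
    """Mean and standard deviation of AUC over bootstrap resamples.

    labels : binary array, 1 = PET-confirmed ischemic segment
    scores : continuous classifier output, oriented so higher = more ischemic
    """
    rng = np.random.default_rng(seed)
    n = len(labels)
    aucs = []
    for _ in range(n_resamples):
        idx = rng.integers(0, n, n)               # sample segments with replacement
        if labels[idx].min() == labels[idx].max():
            continue                              # skip resamples containing only one class
        aucs.append(roc_auc_score(labels[idx], scores[idx]))
    return float(np.mean(aucs)), float(np.std(aucs))

# Example with synthetic data: 493 segments at roughly 4% prevalence
rng = np.random.default_rng(1)
labels = (rng.random(493) < 0.04).astype(int)
scores = labels * 1.0 + rng.normal(0, 0.7, 493)   # imperfect separation
print(bootstrap_auc(labels, scores))
```

With such low prevalence (18 of 493 segments), the guard against single-class resamples matters, and the resulting standard deviations are naturally wide, consistent with the limitations discussed below.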
This study demonstrated that MBF estimates derived from stress dCTP combined with CTA information can detect regional myocardial ischemia as identified by PET with an AUC of 0.89 ± 0.03. We demonstrate that combining anatomical information about upstream stenoses with myocardial perfusion information improves detection performance. The combined score specificity and accuracy at 90% sensitivity suggest that dCTP-derived measures of ischemia can reliably detect ischemia as confirmed by cardiac PET. The proposed new score, FFR-CTPA, offers the best performance when calculated with stress MBF. In high-sensitivity mode (@ 90% sensitivity), SFFR-CTPA using stress flow achieves a specificity of 0.65 ± 0.02, which is superior to specificity estimates of 0.29-0.61 presented by Meijboom et al., who evaluated the diagnostic accuracy of anatomical assessment with modern CCTA compared against a functional reference test, invasive FFR [28]. We present preliminary evidence that the detection performance may vary slightly with different coronary beds. The LAD supplied myocardial bed had lower AUC, accuracy, and specificity compared to the RCA and LCX regions for all methods evaluated (Table 4). For example, for FFR-CTPA using stress flow, the AUC of the LAD was 0.82 compared to 0.99 and 0.89 for the RCA and LCX, respectively. This discrepancy may be caused by the increased difficulty of taking LAD CTA measurements as well as higher levels of average motion in the area, both contributing to higher levels of noise. This also may highlight potential errors in our reference test, the PET estimates of flow; previous studies have demonstrated that patient motion and, to a lesser extent, attenuation correction mis-alignment can lead to large regional errors in PET estimated flow [46].
Study limitations
This proof-of-concept study of a new metric for combining perfusion and stenosis information included only 29 patients. This small sample size, along with low disease prevalence, suggests that our reported performance measures have high error bars. Additional research with a larger set of patients is needed. Among the 493 total segments analyzed, PET only identified 18 as definitely ischemic (3.7% prevalence), distributed across 7 patients. Our best performing stress SFFR-CTPA method only missed one ischemic region, but overcalled many regions, leading to a high sensitivity (94%) but low specificity and accuracy (64% and 66%, respectively). Additionally, our DCE-CT derived MBF estimates were generally overestimated, partially contributing to the high false positive rate. While quantitative PET is the gold standard in MBF measurements and ischemia detection, it remains a noisy modality, which greatly affects our ground truth values. Additionally, we assumed that each of the 17 segments was independent, but there are likely intra-patient correlations of these measures that were not accounted for in the detectability analysis.
1.3 Segmentation of Porous Implantable Polymeric Scaffolds for µCT Monitoring
To assess the safety and efficacy of implantable biomedical devices, longitudinal radiological monitoring is necessary for risk evaluation. However, polymeric devices are poorly visualized with clinical imaging, hampering efforts to use diagnostic imaging to predict failure and enable intervention.
Combining contrast agents with these biomedical devices, either through coating methods or direct mixing with the polymer, offers a potential solution to poor image quality. Direct mixing is more favorable for degradation studies, but the effect of the contrast agents may alter the device’s mechanical properties. Here, we describe nanoparticle-doped biomedical devices (phantoms), created from 0–40 wt% tantalum oxide (TaOx) nanoparticles in polycaprolactone and poly(lactide-co-glycolide) 85:15 and 50:50, representing non, slow, and fast degrading systems, respectively. We run a degradation study of 20 weeks in length in multiple simulated physiological environments: healthy tissue (pH 7.4), inflammation (pH 6.5), and lysosomal conditions (pH 5.5), while mass and gross volume loss are monitored. We show that an optimal range of 5–20 wt% TaOx nanoparticles balances radiopacity requirements with implant properties, facilitating next-generation biomedical devices. 1.3.1 Introduction Polymers are commonly used for biomedical devices due to several advantageous properties. Namely, they offer high biocompatibility, tunable mechanical properties, and are 21 generally easy to manufacture [47]. This has led to proliferation of implantable biomedical devices in research and clinical scenarios. Despite their frequent use in the clinic, implants made from polymers fail for a number of reasons such as wear, tearing, migration, and infection [48]. With a growing concern for the complications due to device failure, there exists an increased need for a clinical methodology for in situ monitoring of device status following implantation [48]. However, most polymeric devices offer no radiological contrast mechanism for clinical diagnostic imaging, and therefore radiologists cannot monitor the integrity of the device prior to catastrophic failure. Incorporating contrast agents for radiological monitoring of biomedical devices would be a significant step in prevention of emergency device failures. The widespread use and availability of computed tomography (CT) makes it an excellent modality for device monitoring. While CT has difficulty distinguishing soft tissues compared to magnetic resonance imaging (MRI) and exposes patients to small amounts of radiation, it remains favorable due to its low cost and high signal-to-noise ratio [49], [50]. To tailor polymeric devices for CT monitoring, we must modify or incorporate polymers with contrast agents specific for CT [51]. In other words, we must make them radiopaque while keeping in mind the possibility of releasing cytotoxic elements during degradation [52], [53]. Tantalum oxide (TaOx) nanoparticles, in particular, are biocompatible in vivo with superior CT contrast over traditional iodinated compounds, and can further be incorporated into polymeric matrices for use as biomaterials [54]– [57]. In previous studies, it was shown that TaOx integrated polymer phantoms were easier to identify than phantoms without TaOx [50]. However, more research needs to be done to evaluate the impact of TaOx on the mechanical properties of the polymer. Namely, the nanoparticle should not affect the mechanical stability or material properties of the device while making it radiopaque. 22 1.3.2 Materials and Methods The study utilized three types of biocompatible polymers: PCL, PLGA 50:50, and PLGA 85:15. PCL (Sigma Aldrich) had a molecular weight average of 80 kDa. 
PLGA 50:50 (Lactel/Evonik B6010-4) and PLGA 85:15 (Expansorb DLG 85–7E, Merck) were both ester terminated and had a weight average molecular weight between 80 and 90 kDa, to minimize the effects of polymer chain length on the degradation rate [58]. The polymers were solubilized in suspensions of TaOx (spherical, 3–9 nm in diameter) in dichloromethane (DCM, Sigma-Aldrich). A degradation study was conducted in vitro for 20 weeks and in vivo for 5 weeks. Phantom Manufacture The detailed protocol for polymer preparation and phantom manufacture can be found here: https://doi.org/10.1002/adhm.202203167. Briefly, PCL or PLGA were solubilized in TaOx nanoparticles in DCM at 8 and 12 wt%, respectively. Proportions were calculated so that the final scaffold will be tunable 0-40 wt% TaOx. Sucrose (Meijer) was added to the suspension, calculated to be 70 vol% of the polymer + nanoparticle mass in solution, followed by NaCl (Jade Scientific) at 60 vol% of the total polymer + nanoparticle volume. The suspension was vortexed for 10 min and pressed into a silicon mold that was 4.7 mm diameter, and 2 mm high. After air drying, phantoms were removed, trimmed of excess polymer, and then washed for 2 h in distilled water, changing the water every 30 min to remove sucrose and NaCl. Washed phantoms were air-dried overnight and stored in a desiccator prior to use. This process yielded micro-porous (<100 µm) scaffolds that mimics tissue properties, allowing for nutrient diffusion and cell and tissue infiltration. Micro-Computed Tomography All tomography images were obtained using a Perkin-Elmer Quantum GX. At every time point, groups of three phantoms were imaged at 90 keV, 88 µA, with a 25 mm field of view at a 23 50 µm resolution. After the acquisition, individual phantoms were sub-reconstructed using the Quantum GX software to 12–18 µm resolution. Phantoms used for serial monitoring were imaged on day 0 prior to hydration for pore size analysis (Supporting Information) and imaged again 24 h after hydration with buffer. Throughout the remainder of the study, all groups were imaged every week after changing the buffer media. In vivo µCT on mice was performed at 90 keV, 88 µA. At each time point, two scans were taken of the phantoms, 1) 72 mm field of view (14 min total scan time) at 90 µm resolution and 2) 36 mm field of view (4 min total scan time) at 20–50 µm resolution. In the subcutaneous implantation, both phantoms could not be captured in a single higher-resolution scan, so two scans were taken, one centered on each implant. During acquisition, mice were anesthetized using an inhalant anesthetic of 1–3% Isoflurane in 1 L min−1 oxygen. Mice were scanned immediately post- implantation, on day 1 post-implantation, and at day 7 and week 5 post-implantation. Total cumulative radiation dosage was 14–19 Gy over 5 weeks. Tomography Image Analysis From the tomography scans of phantoms, several parameters were quantified. Analysis of the polymer matrix component of phantoms with 20 and 40 wt% TaOx was performed using custom software developed with MATLAB (vR2021b, Mathworks, Natick, MA) on µCT sub- reconstructions. Properties such as scaffold thickness, diameter, porosity, average pore diameter, average pore volume, and mean attenuation were analyzed. From this, the percent porosity of the phantoms was calculated as the percentage of the gross volume not occupied by the matrix. 
From the diameter and thickness, a "gross volume" was defined as the volume occupied by a solid cylinder with the corresponding thickness and diameter. Subsequently, "scaffold volume" is the gross volume minus the total pore volume. Phantoms with 5 wt% TaOx could not be radiographically distinguished from the background.
Before segmenting the polymer from the background, the image was preprocessed using an adaptive histogram equalization technique to enhance contrast [59]. Afterwards, Otsu's binary segmentation method was used to create a rough mask of the volume [60]. An adaptive thresholding method was then employed to segment the polymer within the rough mask from the background [61]. The resulting volume was cleaned up using erosion and dilation operations.
Adaptive histogram equalization works by applying the normal histogram equalization algorithm on local, non-overlapping regions of an image. Histogram equalization can enhance the contrast of images by redistributing the pixel intensities more evenly. Let us first assume an i × j image f with pixel values ranging from 0 to L − 1. Let us denote p as the normalized histogram of f with a bin for each pixel value (often 2^8 or 2^16 bins). So,
p_n = (number of pixels with intensity n) / (total number of pixels), where n = 0, 1, ..., L − 1    (2)
Then, the histogram equalized image g can be defined by
g_{i,j} = floor( (L − 1) Σ_{n=0}^{f_{i,j}} p_n )    (3)
That is, each pixel is mapped to the cumulative histogram up to its intensity value, scaled by L − 1 and rounded down. This intuitively makes sense; the mapping redistributes intensities so that the histogram of g is approximately uniform, with heavily populated bins spread over a wider output range. To apply this in an adaptive fashion, we first split the image into non-overlapping regions and simply apply the algorithm to each region individually before reconstructing the image.
Similarly, adaptive thresholding works by applying a thresholding algorithm (often Otsu) on a local level. Otsu binary thresholding works by minimizing the intra-class variance, defined as a weighted sum of variances of the two classes, foreground and background. The algorithm goes through every possible threshold value t (0-255 for an 8-bit image) and calculates the mean intensity of each class using
μ_background(t) = ( Σ_{n=0}^{t−1} n p_n ) / ( Σ_{n=0}^{t−1} p_n )    (4)
μ_foreground(t) = ( Σ_{n=t}^{L−1} n p_n ) / ( Σ_{n=t}^{L−1} p_n )    (5)
Intuitively, the threshold t that maximizes the difference of mean pixel intensity of the foreground and background is chosen. To make this algorithm adaptive, we apply it on local regions just like the adaptive histogram method.
1.3.3 Results and Discussion
Shown in Figure 7 is the result of the segmentation algorithm. The top row shows y and z slices of the µCT volume and the bottom row shows the scaffold mask. Results of analyzing the masks are as follows: All phantoms had an interconnected porosity, with a mean pore size between 350 and 400 µm. Pore walls show generally even dispersion of the TaOx (Figure 7, top row), and the homogeneous dispersion ensured that no regions of the polymer matrices had significantly different material properties or X-ray attenuation. As expected, the mean attenuation of the scaffold is dependent on the TaOx wt%. We use this mask to compute the gross volume of the scaffold over time.
Figure 7. Example of generated scaffold masks from µCT images. An increase in TaOx incorporation increased mean attenuation of the phantom. Scaffolds of TaOx 10-40 wt% for PLGA and PCL could be segmented easily. The attenuation at 5 wt% was too low to be differentiated from the background.
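The segmentation sequence described in the Methods (adaptive histogram equalization, a global Otsu rough mask, adaptive thresholding inside that mask, and erosion/dilation clean-up) was implemented in MATLAB; the short Python/scikit-image sketch below mirrors that sequence on a single 2-D slice purely for illustration. The parameter choices (e.g., the local block size) are assumptions, not the values used in the study.

```python
import numpy as np
from skimage import exposure, filters, morphology

def segment_scaffold_slice(slice_img, local_block=51):
    """Illustrative scaffold segmentation for one normalized µCT slice (values in [0, 1])."""
    # 1) Adaptive histogram equalization to boost local contrast
    eq = exposure.equalize_adapthist(slice_img)

    # 2) Global Otsu threshold for a rough mask of the phantom volume
    rough_mask = eq > filters.threshold_otsu(eq)

    # 3) Adaptive (local) threshold to separate polymer from background inside the rough mask
    local_thresh = filters.threshold_local(eq, block_size=local_block)
    scaffold = (eq > local_thresh) & rough_mask

    # 4) Morphological clean-up: erosion followed by dilation
    scaffold = morphology.binary_dilation(morphology.binary_erosion(scaffold))
    return scaffold

# Example on a synthetic slice
img = np.clip(np.random.default_rng(0).random((256, 256)), 0, 1)
mask = segment_scaffold_slice(img)
print(mask.sum(), "scaffold pixels")
```

From a mask like this, the gross volume, pore volume, and percent porosity defined above can be obtained by voxel counting across the reconstructed stack.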
Plotting the gross volume alone shows a very clear trend in phantom volume changes (Figure 8e1-f1). We also see from the mean attenuation that radiopacity lasts for at least 20 weeks (Figure 8e3-f3). Due to mechanical properties of PLGA, it degrades rapidly compared to PCL, and even more so in acidic environments [58], [62], [63]. Designing implantable biomedical devices to be radiopaque is an important property to consider. The radiopacity allows physicians and researchers to evaluate and monitor in real time the structural integrity of implantable devices and therefore predict device failure. Here, a novel radiopaque TaOx nanoparticles combined PCL or PLGA scaffold is proposed. We demonstrate that the addition of TaOx nanoparticles enables in situ monitoring of gross phantom features (overall volume, location) using µCT. Importantly, within the range of 5–20 wt% TaOx, the 27 radiopacity of phantoms was maintained over 20 weeks. Monitoring size and attenuation properties enabled in vivo assessment of the environmental impact on the scaffold. We show that lower pH environments and high nanoparticle content (>20 wt% TaOx) increased degradation rate and decreased structural integrity and mechanical stability. This study represents a significant step toward incorporating in situ monitoring into the next generation of implantable devices. Figure 8. Phantoms with TaOx nanoparticles could be monitored for 20 weeks without loss of radiopacity due to particle leaching. a–c) This allowed for visual monitoring of changes to phantom shape and porosity, as illustrated by CT images from 1) day 1 and 2) 6 weeks: a) PCL+20 wt% TaOx, b) PLGA 85:15 + 20 wt% TaOx, and c) PLGA 50:50 + 20 wt% TaOx. During degradation, significant changes occurred within the phantoms: 1) gross phantom volume, 2) percentage porosity, and 3) X-ray attenuation. TaOx incorporation ranged from d) 5 wt% TaOx, e) 20 wt% TaOx, and f) 40 wt% TaOx. At 5 wt% TaOx, only the gross volume of the phantoms could be quantified, as the matrix could not be segmented from the background. Scale (a–c): 1 mm; HU window is consistent for all images. Data reported as mean ± SEM. Figure courtesy of [2] under CC BY-NC-ND 4.0. 28 CHAPTER 2: INTRODUCTION The concept of machine learning and artificial intelligence may sound like a recent development, but it has its roots in the early days of computing. In 1943, neurophysiologists Warren McCulloch and Walter Pitts first described how neurons communicate with each other, laying down the foundation of neural network architecture [64]. In 1949, Donald Hebb developed a model based on the idea of neural plasticity, where connections between neurons can change depending on the feedback it receives [65]. Rosenblatt is credited with building the first perceptron in 1958, a machine designed for image recognition and capable of distinguishing basic patterns [66]. After decades of research, we now have machine learned models nearing human levels of performance in certain tasks. In the medical field, AI has already demonstrated capability in diagnosis, pathology detection, and risk assessment, and is already impacting clinical decision making. However, machine learned models are far from perfect. The limited use of AI in the medical field is a testament to how difficult certain tasks can be. One major limitation of AI is the need for large quantities of data. 
The volume of data has a major impact on the performance of the model; in general, higher-volume training datasets include greater diversity, enabling better and more generalizable performance. For medical imaging tasks, it is difficult to curate large volumes of data. To partially solve this issue, data augmentation techniques have been proposed. By supplementing the training set with augmented data, we can synthetically add diversity and therefore improve performance. This chapter has five main aims: (1) a high-level overview of the basics of neural networks, (2) how neural networks are tailored for object detection, classification, and segmentation tasks, (3) how we apply these models to rib fracture detection tasks, (4) data augmentation techniques, and (5) data augmentation applications in the medical imaging field.

2.1 Basics of Neural Networks

A neural network (NN) is a machine learning method designed to mimic the human brain. Individual neurons in the brain collect electrochemical signals through dendrites and then pass the signal on through the axon. The axon splits into thousands of branches, with a structure called a synapse at each end. Depending on the input signal, a synapse may release neurotransmitters that inhibit or excite the next neuron. To learn which neurons should be inhibited or excited, synapses receive feedback and adjust their behavior accordingly. Based on this understanding, NNs are constructed around a basic unit called a node (Figure 9). A node behaves similarly to a neuron: it can activate, propagate a signal, and receive feedback to learn. We structure multiple nodes in groups called layers. An individual node in one layer is connected to every node in the next layer. The connections have weights attached to them that dictate the importance of the input information passed along that connection. The weighted inputs are summed with a bias and processed through an activation function to determine which downstream nodes should be activated. The weights and biases can be modified depending on the feedback the network receives, and this feedback is evaluated using a loss function. The final layer in the network is called the output layer and may contain a single node or multiple nodes, depending on the type of output we expect [67]–[69]. Here, we describe in more depth how activation functions and loss functions operate, and how weights are updated using a process called backpropagation.

Figure 9. Example of a simple neural network. (Top) A single input node goes through two layers and then an output node. Each connection between nodes has a weight assigned to it. (Bottom) The input, weight, and bias of a node are then used by an activation function to calculate whether the next node will be activated.

2.1.1 Activation Functions

At the most basic level, an activation function calculates the output of a node given a set of inputs; it decides whether a downstream node is activated or not. The most commonly used activation functions are non-linear. If an NN uses a linear activation function, it is equivalent to a regular linear regression model [70]. While a simple linear regression model is easy to solve, it often lacks the complexity necessary to model real-world data. The non-linearity of activation functions allows NNs to model complex relationships between nodes. Table 5 highlights several types of non-linear activation functions, each with its own pros and cons.
Table 5. Common activation functions used in neural networks.

Sigmoid (value range: 0 to 1). Pros: gives a smooth gradient while converging; gives a clear prediction (classification) with 1 and 0. Cons: prone to the vanishing gradient problem; not a zero-centric function; computationally expensive (exponential in nature).

Tanh (value range: -1 to 1). Pros: zero-centric; a smooth, converging gradient function. Cons: prone to the vanishing gradient problem; computationally expensive (exponential in nature).

ReLU (value range: 0 to ∞). Pros: computationally inexpensive (linear in nature); mitigates the vanishing gradient problem. Cons: not a zero-centric function; outputs zero (inactive) on the negative axis.

Leaky ReLU (value range: -∞ to ∞). Same as ReLU, except that it outputs a small value instead of 0 on the negative axis.

Binary Step (value range: 0 or 1). Pros: gives a clear prediction (classification) with 1 and 0; zero-centric. Cons: only supports binary classification.

Choosing the correct activation function is vital to the success of the model. Currently, the ReLU (Rectified Linear Unit) [71] and leaky ReLU [72] activation functions are most commonly used for the hidden layers of deep learning models, as they avoid the vanishing gradient problem of the sigmoid and tanh functions and converge approximately six times faster [73]. Choosing the right function for the output layer is slightly more complicated. Generally, in a regression problem we use the linear (identity) activation function with one node. In a binary classifier, we use the sigmoid activation function with one node. In a multiclass classification problem, we use the softmax activation function with one node per class. In a multilabel classification problem, we use the sigmoid activation function with one node per class [74]. Generative adversarial networks (GANs) and other image-generating networks generally use Tanh in the output layer.
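To make the entries in Table 5 concrete, below is a minimal NumPy sketch of these activation functions; the leaky-ReLU slope of 0.01 is an illustrative default rather than a value prescribed in the text.

```python
import numpy as np

def sigmoid(x):
    # Maps any real input to (0, 1); saturates for large |x| (vanishing gradients).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered counterpart of the sigmoid, range (-1, 1).
    return np.tanh(x)

def relu(x):
    # Identity for positive inputs, zero otherwise; cheap and non-saturating for x > 0.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Like ReLU but keeps a small slope on the negative axis so units never go fully "dead".
    return np.where(x > 0, x, alpha * x)

def binary_step(x):
    # Hard threshold at zero; only useful for simple binary decisions.
    return (x >= 0).astype(float)

def softmax(z):
    # Converts a vector of scores into class probabilities (multiclass output layers).
    e = np.exp(z - np.max(z))
    return e / e.sum()

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x), leaky_relu(x), sigmoid(x), sep="\n")
```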
After choosing an appropriate activation function for a neural network, the next important consideration is selecting an appropriate loss function.

2.1.2 Backpropagation and Loss Functions

Neural networks "learn" by adjusting the weights and biases of the connections between nodes. To know how much to adjust the weights and biases, we need some way to determine how well the model is doing; for this we use a loss function. Also called a cost function, a loss function compares the model's current prediction with the actual value. Our aim is to minimize the loss function, that is, to minimize the difference between our predictions and the ground truth values. We then use a method called backpropagation to adjust the weights and biases based on the result of the loss function. First, let us define some common loss functions. There are several types of loss functions used in neural networks, each with its own strengths and weaknesses. If we are performing a regression task where the predicted output is a continuous scalar, we would use a mean squared error (MSE) loss. The MSE measures the average squared difference between the predicted value and the actual value (Equation 6). The main advantage of MSE is that it is smooth and easy to optimize using gradient descent. However, it can be sensitive to outliers, since the squared term will magnify the error of a very poor prediction [69].

$MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$    (6)

where $n$ is the number of data points, $Y_i$ are the true values, and $\hat{Y}_i$ are the predicted values.

Another commonly used loss function is the binary cross-entropy loss, which is used for binary classification problems. For these problems, there are only two possible outputs: yes or no (i.e., 0 or 1). The binary cross-entropy can be expressed as (Equation 7) [75]:

$H_p(q) = -\frac{1}{n} \sum_{i=1}^{n} \big[ Y_i \cdot \log(p(Y_i)) + (1 - Y_i) \cdot \log(1 - p(Y_i)) \big]$    (7)

where $Y_i$ is the true label of the $i$-th sample and $p(Y_i)$ represents the predicted probability that the sample belongs to the positive class. The main advantage of binary cross-entropy is that it is a simple and efficient measure of the uncertainty of the model's predictions and can be easily optimized using gradient descent [76]. The main idea behind this loss function is to penalize the model more heavily when it makes a wrong prediction with high confidence (i.e., when the predicted probability is close to 0 or 1), and less heavily when it makes a wrong prediction with low confidence (i.e., when the predicted probability is close to 0.5) [77].

Categorical cross-entropy: when a classification task has more than two possible classes, we use a categorical cross-entropy loss. It uses the same equation as the binary cross-entropy and simply calculates the cross-entropy for each class label per observation.

Huber loss: a loss function used for regression problems, similar to the MSE. The main difference is that it is less sensitive to outliers and can handle them better (Equation 8) [78]. It achieves this by combining the MSE with the mean absolute error (MAE); that is, it uses a different function for small errors (squared value) and large errors (absolute value).

$L_{\delta}(Y_i - \hat{Y}_i) = \begin{cases} \frac{1}{2}(Y_i - \hat{Y}_i)^2 & \text{for } |Y_i - \hat{Y}_i| \le \delta \\ \delta\,|Y_i - \hat{Y}_i| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases}$    (8)

Simply put, we use the squared (MSE-like) term when the error is less than the parameter $\delta$ and the absolute (MAE-like) term when it is greater than $\delta$.

Now that we understand a few ways to measure model predictions, we can adjust the weights and biases of the network using a process called backpropagation. Backpropagation was first described by Werbos in 1974 [79]; it has since been improved for more complex systems [80]–[82]. At its core, backpropagation is an iterative process for calculating the derivatives of the loss function with respect to the weights and biases [83]. The backpropagation algorithm consists of two phases: the forward pass and the backward pass. During the forward pass, the input is fed into the network, and the output is computed using the current weights. We then use the loss function to calculate the error between the predicted output and the true output. During the backward pass, the error is propagated backwards through the network, starting from the output layer and moving towards the input layer. The weight of each connection is adjusted in the opposite direction of the gradient of the error with respect to that weight. In other words, if a weight is causing the error to increase, the weight is decreased, and if the weight is causing the error to decrease, the weight is increased. This adjustment is performed using an optimization algorithm such as gradient descent, and the process terminates when the error is minimized. Now, with a basic understanding of how neural networks are constructed and how they learn, we can begin to talk about how they can be tailored for different tasks such as classification, detection, segmentation, and prediction.
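As a concrete illustration of these ideas, the following is a minimal PyTorch sketch of one forward pass, loss evaluation (Equations 6 and 8), and backpropagation/gradient-descent step; the network size, learning rate, and random data are placeholders.

```python
import torch
import torch.nn as nn

# A tiny regression network: 4 inputs -> 8 hidden units (ReLU) -> 1 output.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(16, 4)       # a batch of 16 samples
y_true = torch.randn(16, 1)  # ground-truth targets

# Forward pass: compute predictions with the current weights.
y_pred = model(x)

# Equation 6 (MSE) and Equation 8 (Huber); BCE (Equation 7) would be used instead
# with sigmoid outputs and 0/1 labels.
mse = nn.MSELoss()(y_pred, y_true)
huber = nn.HuberLoss(delta=1.0)(y_pred, y_true)

# Backward pass: backpropagation computes d(loss)/d(weight) for every parameter,
# and the optimizer moves each weight opposite to its gradient.
optimizer.zero_grad()
mse.backward()
optimizer.step()
print(f"MSE: {mse.item():.4f}  Huber: {huber.item():.4f}")
```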
2.2 Machine Learning and Medical Imaging Applications

Classification, detection, and segmentation are tasks commonly performed in machine learning and computer vision. Classification is the task of assigning an input to one of several predefined categories; for example, given an image of an animal, a classification algorithm would determine which animal it represents (Figure 10a). Detection is the task of identifying the presence and location of specific objects within an image. Object detection algorithms typically produce a bounding box around each detected object, along with a confidence score indicating the likelihood that the object is present (Figure 10b). Segmentation is the task of dividing an image into multiple segments, where each segment corresponds to a distinct object or region within the image; for example, in an image of multiple animals, segmentation might be used to assign pixel-level labels to each animal (Figure 10c).

Figure 10. Computer vision tasks. Classification generates a label that best describes the image. Object detection produces labels for any objects found within the image and their locations. Instance segmentation gives a pixel-level label for objects found within the image. Image courtesy of [84] under CC-BY 4.0.

Automated computer analysis for medical imaging applications has been around since the 1970s [85]. Early on, hardware limitations prevented complex tasks such as pathology detection and patient outcome prediction. However, researchers at the time were still able to successfully perform low-level pixel segmentation [86]–[89], image enhancement [90]–[94], and basic classification [30]–[33]. Breakthroughs in both hardware and neural network architecture have since led to NNs promising human levels of performance. Here, we highlight some of the modern types of NNs and recent medical imaging applications.

2.2.1 Convolutional Neural Networks

One of the biggest breakthroughs in NNs is the use of convolution filters. Convolutional neural networks (CNNs) were first described by Fukushima and are designed to capture spatial patterns within images using small convolutional filters [99]. The convolution layer is the first layer used to extract the various features from the input images. After this convolution layer, the data is passed on to an activation and pooling layer, and then to fully connected layers (Figure 11). The first two, the convolution and pooling layers, perform feature extraction, whereas the third, a fully connected layer, maps the extracted features into the final output, such as a classification [100]. The goal of the convolution operation is to obtain the high-level features of the image while at the same time reducing dimensionality. By reducing the dimensionality, we decrease the computational power required for processing the data and increase the rate of training [101].

Figure 11. A simple convolutional neural network. Feature extraction of the input is done through convolution layers and pooling layers. The extracted features are then fully connected to the desired final output. Image courtesy of [102] under CC-BY-NC-ND 4.0.
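A minimal PyTorch sketch of the convolution–pooling–fully-connected pipeline in Figure 11 is shown below; the layer sizes, 28x28 input, and 10-class output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Convolution + pooling for feature extraction, then a fully connected classifier."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # zero padding keeps 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # 32 * 7 * 7 features
            nn.Linear(32 * 7 * 7, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = SimpleCNN()(torch.randn(4, 1, 28, 28))  # batch of four 28x28 grayscale images
print(logits.shape)  # torch.Size([4, 10])
```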
To understand the convolution operation, first consider a 7x7 matrix and a 3x3 convolution kernel (Figure 12). An element-wise product between the kernel and the matrix is computed, starting from the top left-hand corner of the matrix, and the sum of the products for each position is added to a new matrix called the feature map. The kernel is then shifted one cell to the right and the process is repeated. The operation stops when the kernel reaches the bottom right of the matrix.

Figure 12. Visualization of the convolution operation. The 3x3 kernel (red box) marches along the 7x7 matrix (top row), computing an element-wise product along the way. For each operation, the sum of the products is added to a new 5x5 matrix (bottom row). This newly generated matrix is called the feature map.

The convolution operation described above does not allow the center of the kernel to overlap the outermost elements of the input tensor, and it reduces the height and width of the output feature map compared to the input tensor. Padding, typically zero padding, is a technique to address this issue: rows and columns of zeros are added on each side of the input tensor so that the center of the kernel can sit on the outermost elements and the in-plane dimensions are kept the same through the convolution operation (Figure 13). Modern CNN architectures usually employ zero padding to retain in-plane dimensions so that more layers can be applied; without zero padding, each successive feature map would get smaller after every convolution operation.

Figure 13. Visualization of zero padding. To retain the same 7x7 dimensions of the original matrix, we add a border of zeros around it. The zero-padded 9x9 matrix, when convolved with a 3x3 kernel, yields a 7x7 matrix.

As mentioned previously, once the convolution process is complete, the result is fed into an activation function and then a pooling layer. Generally, ReLU activation functions are used. A pooling layer is a downsampling operation in which we reduce the data and subsequently decrease the number of subsequent learnable parameters [100]. This operation reduces computational cost and therefore speeds up training. Pooling also allows for the extraction of features at different spatial scales. Two commonly used pooling operations are max pooling and average pooling (Figure 14). The operation creates "pools" of non-overlapping regions in the data and then represents each pool with a single number. For average pooling, we simply average all the data in a pool; for max pooling, we take the maximum value of the pool. Max pooling is more commonly used, as it captures the strongest activation in the feature map, which can help retain information about the edges and other key features of an object [103].

Figure 14. Visualization of max pooling. Here we run a 2x2 filter over an input 4x4 matrix with a stride of 2 (skipping every other index) so that regions do not overlap. For each region highlighted by the filter, we take the maximum value of the region and map it to a new matrix.

Finally, the pooling layer is flattened (transformed into a one-dimensional vector) and input into a fully connected layer. A fully connected layer simply means that every element in the input vector is connected to every output node. The fully connected layer is eventually mapped to the final output layer. For classification tasks, it is normal to use one-hot encoding, where each class is encoded by one output node. An output node returns a probability between 0 and 1 that the input data belongs to a particular class; therefore, the final layer returns a vector of length equal to the number of possible classes.
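The convolution, zero-padding, and max-pooling operations described above can be written directly in NumPy; the sketch below mirrors Figures 12–14 (in deep learning frameworks the "convolution" is implemented as this sliding cross-correlation), with an averaging kernel chosen purely for illustration.

```python
import numpy as np

def conv2d(image, kernel, pad=0):
    """Valid 2D convolution (as in Figure 12); pad > 0 adds a zero border (Figure 13)."""
    if pad:
        image = np.pad(image, pad)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise product with the current window, summed into the feature map.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max pooling with stride equal to the pool size (as in Figure 14)."""
    h, w = feature_map.shape
    fm = feature_map[:h - h % size, :w - w % size]
    return fm.reshape(h // size, size, w // size, size).max(axis=(1, 3))

img = np.random.rand(7, 7)
k = np.ones((3, 3)) / 9.0                     # simple averaging kernel
print(conv2d(img, k).shape)                   # (5, 5) -- valid convolution shrinks the map
print(conv2d(img, k, pad=1).shape)            # (7, 7) -- zero padding preserves the size
print(max_pool(np.random.rand(4, 4)).shape)   # (2, 2)
```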
2.2.2 U-Net

The U-net architecture was first proposed by Ronneberger et al. in 2015 and is a type of CNN [104]. The architecture consists of two phases, a downsampling phase and an upsampling phase, and was originally designed for image segmentation tasks (Figure 15). The downsampling phase is identical to a normal CNN as described in the previous section. The upsampling phase consists of up-convolutions with a large number of feature channels, which allow the network to propagate context information to higher-resolution layers [105]. These layers also have concatenations with features from the contracting path; these concatenations are necessary to recover spatial information lost during downsampling [106]. The final layer uses a 1x1 convolution to map the component feature vector to the desired number of classes [104]. Many object detection networks, such as EfficientNet, DenseNet, YOLO, and RetinaNet, use elements common to the U-net architecture.

Figure 15. Original U-net architecture design from [104]. The input matrix is reduced through convolution and max pooling operations. Then, the extracted features are upsampled using transpose convolution operations. Image license CC-BY-NC-ND 4.0.

2.2.3 Residual Blocks

Residual blocks are a building block used in deep neural networks, particularly in the architecture known as ResNet (short for Residual Network). Residual blocks address the problem of vanishing gradients in very deep neural networks. As the network gets deeper, it becomes harder for the gradients to propagate backwards through the layers during training, leading to increasingly large errors. Residual blocks aim to address this problem by introducing a skip connection, which maps to activations earlier in the network (Figure 16). They are designed in such a way that the output of a layer is added to the output of another layer deeper in the block [107]. By learning these residual mappings rather than the underlying mappings, ResNet was able to significantly reduce the difficulty of training, which resulted in large performance gains in terms of both training and generalization error [108].

Figure 16. The original design of a single residual block from [107]. Here, the output from one layer feeds directly into the next layer and into a layer two or three connections away. We can concatenate many of these blocks together to form deep neural networks. Image license under CC-BY-NC-ND 4.0.

2.2.4 Deep Learning for Rib Fracture Detection

Rib fractures are a common type of injury, present in approximately 4–12% of trauma admissions, and are a major source of chronic pain [109], [110], making early detection and diagnosis critical. Therefore, with the rapid development of deep learning techniques in recent years, there has been growing interest in the use of deep neural networks, or deep learning (DNN and DL, respectively), for rib fracture detection. Several recent studies have shown that deep neural networks can outperform human radiologists in detecting rib fractures. Table 6 highlights articles published between 2020 and the present using state-of-the-art neural networks.

Table 6. Recently published studies localizing, detecting, segmenting, and classifying rib fractures (columns: Data and Model Description; Model Used (Type); Reference; Results). 3644 chest CT images were used to train a single-shot deep neural network based on DenseNet. The performance of rib fracture detection by the network, two medical interns, and two radiologists was compared. 865 fractures on 713 ribs from 198 CT images were used to train a CNN object detector.
Total fractures detected and total reading time by two radiologists (R1, R2) either assisted or unassisted by NN was recorded. 8529 CT images containing 861 rib fractures were used to train a CNN. Precision, recall, F1-score, and diagnostic time of two junior radiologists with and without the deep learning model were computed. 1697 CT scans divided into 65:20:15 training, validation, and testing sets (594 fracture present cases) were used to train a 3D DNN. Their model was compared to ResNet, DenseNet, R(2+1)D, and CSN classifiers. 1707 chest CTs split into 1507:100:100 training, validation, and testing were used to train a custom 3-step segmentation and detection model. First and second stage consisted of a U-net segmenting bone and then detecting ribs. The final stage used a 3D DenseNet to propose fracture location and classification. Radiologists were evaluated on precision, recall, F1-score, negative predictive value with and without the aid of the model. 511 whole body CT scans (fracture absent, n = 159, fracture present, n = 352) were used train a 2-stage deep neural network. The first stage was a 3D ResNet model used to propose a region(s) that the Fast-Region CNN second stage would then filter out poor predictions. to Model Used (Type) DNN (Modified DenseNet) CNN (Faster R- CNN) The model outperformed the interns, achieving a sensitivity, positive predictive value, and F1-score of 0.645, 0.793, and 0.711, respectively, while the interns achieved mean scores of 0.285, 0.797, and 0.649. However, the model did not perform as well as the radiologists, who achieved a sensitivity of 0.860. DL detected 687 (79.4%) of the 865 true fractures with 0.43 FPS. Sensitivity of radiologists assisted by DL significantly increased; 82.8% to 88.9% for R1, and 83.9% to 88.7% for R2. [111] [112] CNN (VRB-Net) CNN informed radiologists’ precision, recall, and F1-score increased to 0.943, 0.978, and 0.960, respectively, from 0.812, 0.885, and 0.845. [113] DNN (SGANet) DNN (Modified DenseNet) DNN (Modified ResNet) [114] [115] [116] in all improved outperformed networks SGANet other precision, established sensitivity, and F1-score (68.97, 90.91, and 0.7843, respectively), and had the second-best specificity (78.05 vs CSN’s 78.66) Radiologists F1-score, precision, recall, and NPV with the use of the model (0.842 to 0.948, 0.773 to 0.946, 0.932 to 0.949, and 0.979 to 0.989, respectively). On average, the diagnosis time of radiologist assisted with this detection system was reduced by 65.3s. The model alone achieved F1-score, precision, recall, and NPV of 0.890, 0.869, 0.913 and 0.969, respectively. value, predictive sensitivity, specificity, The model’s negative positive predictive value, accuracy, and F1-score was 87.4%, 91.5%, 82.3%, 94.1%, 90.2%, 0.85, respectively. Their model’s sensitivity is approximately the same as others in literature, however their PPV is higher than average. 43 [117] [118] [119] [120] [121] In general, results showed that 6 attending radiologists tended to miss the deep more rib fractures than models; however, they generally reported fewer false positives on their external dataset, the model beat radiologists in patient-level diagnoses, with a sensitivity of 86.2% compared to 70.5%. Generally, the reduction of a single block reduced accuracy 1-1.5% but decreased the inference time by 10- 25%. The best performance-to-speed model was the InceptionV3 network with 7 blocks, with an accuracy and sensitivity of 96.00% and 94.0%, respectively. 
The model achieved a sensitivity of 92.9% with 5.27 false positives per scan where radiologists achieved a much lower false positives per scan (1.13), while underperforming the deep neural networks in terms of detection sensitivities (77.5%). a (89.2%) but a The model achieved a precision of 82.2% and sensitivity of 84.9%; compared to three radiologists with a precision of 90.6% and sensitivity of 79.7%. With the help of the model, higher radiologists achieved lower sensitivity precision (88.4%). A higher score is better for both Dice and IOU, and a lower score is better for ASSD. The nnU-net model achieved a Dice, IOU, and ASSD score of 62.80, 48.81, and 11.40, respectively. The model outperformed its 2D and 3D cascaded versions in all three metrics. Table 6 (cont’d) (ER) room 12208 emergency trauma patients and an external dataset of 1613 ER trauma patients taking chest CT scans were used to train a cascaded deep neural network. The model consisted of two cascading U-net models that first segmented ribs and then detected fractures. DNN (RB-Net) 20646 annotated axial CT scans were used to transfer learn a variety of deep neural networks. The paper assessed the speed- accuracy trade-offs by using only the first -n blocks of each pretrained network. DNN (InceptionV3, ResNet50, MobileNetV2, and VGG16) the RCNN was used 7473 annotated CT images from 900 patients were used to train a 3-step fracture segmentation model. The model is based on a 3D U-Net structure and consists of a preprocessing step, a sliding window prediction step, and a post-processing step. Sensitivity and false positives of the detection performance were compared between the model and expert radiologists. 10943 CT scans were used to train an ensemble 3D U-Net + 2D RCNN network. The U-Net was used to segment the ribcage while to detect fractures. Precision, sensitivity, and F1 score were used as metrics to assess model vs detection radiologist performance. The MICCAI 2020 RibFrac challenge consists of 660 CT scans with ~5000 fractures split training, 80 into 420 validation, and 160 testing set. This dataset was used to train a two-stage detector with a nnU-Net segmentation network and a DenseNet classification network. This nnU- net model was compared to two other nnU- net versions and assessed based on Dice coefficient, intersection over union (IOU), and average symmetric surface distance (ASSD). fracture rib 3D (FracNet) U-Net Ensemble 3D U- Net and 2D Fast RCNN 3D nnU-Net and DenseNet 44 Table 6 (cont’d) CNN (YOLOv3) 1080 radiographs were randomly divided into the training set (918 radiographs) and the testing set (162 radiographs) and used to train an off-the-shelf object detector (YOLOv3). Receiver operating characteristic (ROC) and free-response ROC (FROC) were used to evaluate the model’s diagnostic performance against radiologists. operator characteristic 4366 chest X-rays (3411 fracture absent and 955 fracture present) were used to train a two- stage object detector. A U-net model first took the image and segmented the left and right lungs. Then an EfficientNet model was used to classify the ROIs into fracture or no fracture. The model was evaluated using area under the receiver curve (AUROC) and accuracy. 1020 CT images and patient clinical information was used to train two models: Faster RCNN and a fusion ResNet101+RCNN. The ResNet101 network was used as a feature extraction step, where then clinical information was concatenated to the output, and then used as input to the RCNN classifier. 
The diagnostic performance of both models and of radiologists was assessed based on precision, recall (sensitivity), and F1-score. Results for the last three studies ([122]–[124]): [122] (CNN, YOLOv3): the sensitivity and precision of detection by the CNN model, senior radiologist, and junior radiologist were 87.3%, 80.3%, and 80.3%, respectively, and 82.4%, 73.4%, and 81.7%, respectively. The sensitivity of detection was significantly higher for the CNN model than for the junior radiologist (P = 0.01); however, no significant difference existed between the CNN and the senior radiologist (P > 0.05). [123] (U-net and EfficientNet): the model achieved an AUROC of 0.965 and an accuracy of 0.916; this article did not compare the model to other networks or radiologists. [124] (ResNet101 and Faster RCNN): the fusion model outperformed the regular Faster RCNN model in all metrics (precision 0.799 vs. 0.629, recall 0.973 vs. 0.945, and F1-score 0.877 vs. 0.755). The fusion model also had significantly higher sensitivity (0.95 vs. 0.77) but significantly lower precision (0.80 vs. 0.87) compared to radiologists.

Overall, these papers show promising results for machine-learned rib fracture detection and suggest that it could be used to inform medical decision making. However, several challenges need to be addressed before deep neural networks can be widely used in clinical practice. One major challenge is bias in the data. Most of the above studies are heavily biased toward white adult males, which means the performance of the object detectors could be poor when presented with an image of a child, a woman, or a person of color. Additionally, only 4 of the above papers are multicenter, which means most of these object detectors may have limited generalizability. Another challenge is the interpretability of deep neural networks. We often describe NNs as "black boxes" because we do not know why the network makes a given decision. This is a problem because physicians need to be able to provide an explanation for their diagnosis and treatment plan. The black-box issue is also why the first step in integrating NNs into clinical decision making is to have them inform or aid doctors, not replace them. The first issue, biased data and poor generalizability, could potentially be partially addressed using data augmentation. In general, when the diversity of the training set is higher, the performance of the model also improves. With limited availability of expertly labeled medical images, researchers use data augmentation to generate synthetic medical images, either through simple morphological operations or through more complex techniques. We discuss data augmentation in more detail in the following section.

2.3 Data Augmentation Techniques

For an ML model to learn effectively, the observed data must be a diverse, accurate representation of the true distribution. Therefore, to properly estimate the true distribution, extremely large datasets become necessary [125]. However, in healthcare, datasets of sufficient size may be rare or absent, thus hindering direct training of ML models. Large amounts of medical imaging data are hard to acquire: lack of standardization, lengthy curation processes, the difficulty of releasing HIPAA-compliant images, and the need for expert labeling all hinder the availability of training data [126]. Additionally, medical imaging data acquisition can be affected by the prevalence of the disease in question as well as the cost of the imaging modality.
One of the ways of dealing with this problem is data augmentation, where we supplement our datasets with slightly modified copies of already existing data or with newly created synthetic data based on existing data. Early methods of data augmentation included simple morphological operations such as shrinking, rotations, blurs, flips, and noise addition [127]. More recently, sophisticated data augmentation methods have been based on a class of neural networks called Generative Adversarial Networks (GANs), which generate new images of high perceptual quality that combine the content of a base image with the appearance of another one [128]. GANs have also been used widely for image-to-image translation, where we transform an image from one domain to another (e.g., an image of a horse to an image of a zebra).

GANs were a huge breakthrough for data augmentation. They use two neural networks trained simultaneously to output realistic fake images by learning the probability distribution of a set of images (Figure 17) [129]. More specifically, a generator network creates fake images, while a discriminator is fed real and fake images and determines which is real. As both networks learn and improve, we reach a stopping condition where, ideally, the generator outputs fake images that the discriminator cannot distinguish from the real ones. GANs have been adapted for image-to-image translation tasks, where a neural network learns a mapping from images of one domain to another domain. This technique has been used to transform PET to CT images [130], create lesions on non-lesioned dermatological samples [131], create tumors on healthy brain MR images [132], and transform T1- to T2-weighted MR images [133]. Here, we review GANs and their many variants, common evaluation metrics, and their medical imaging applications.

Figure 17. Generative Adversarial Network architecture. Two neural networks "compete" against each other: the generator's goal is to fool the discriminator by outputting convincing fake samples, while the discriminator's goal is to tell whether it is being given a real or a fake sample.

2.3.1 Generative Adversarial Nets

The original GAN as described by Goodfellow et al. is non-conditional, meaning that it takes a random latent vector and maps it to the sample distribution [129]. The GAN simply needs to generate images like those in the dataset given a random vector. The generator architecture is designed similarly to the upsampling phase of the U-net architecture: it goes through a project-and-reshape operation and then consecutive up-convolution layers until the desired resolution is reached (Figure 18). The discriminator is a normal CNN classifier and outputs a probability between 0 and 1.

Figure 18. Generator architecture of a non-conditional GAN. A random latent vector is upsampled through multiple transpose convolution layers until the desired resolution is reached. Image courtesy of [134] under CC-BY-NC-ND 4.0.
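A minimal PyTorch sketch of such a non-conditional GAN is shown below: a project-and-reshape generator built from transposed convolutions with a Tanh output, and an ordinary CNN discriminator. The latent size, channel counts, and 64x64 single-channel output are illustrative assumptions, not the settings of any particular paper.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Project and reshape a latent vector, then upsample with transposed convolutions."""
    def __init__(self, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),  # 4x4
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),         # 8x8
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),           # 16x16
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),            # 32x32
            nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Tanh(),                                     # 64x64, in [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

class Discriminator(nn.Module):
    """Ordinary CNN classifier that outputs the probability an image is real."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),    # 32x32
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),   # 16x16
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),  # 8x8
            nn.Conv2d(128, 1, 8), nn.Sigmoid(),              # single probability per image
        )

    def forward(self, x):
        return self.net(x).view(-1)

z = torch.randn(4, 100)
fake = Generator()(z)               # (4, 1, 64, 64)
print(Discriminator()(fake).shape)  # torch.Size([4])
```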
Let us define a generator G that is trying to learn a distribution $p_g$ over data $x$ and that produces a mapping $G(z; \theta_g)$ from a noise vector drawn from $p_z(z)$, where $\theta_g$ are learnable parameters. Now, let us define a discriminator $D(x; \theta_d)$ that is trained to maximize the probability of assigning the correct label to both training examples and samples from the generator. Here, $D(x)$ is the probability that $x$ came from the training data rather than from $G(z)$. Simultaneously, the generator is trained to minimize

$\log(1 - D(G(z)))$    (9)

and the two networks play the following two-player minimax game with the value function:

$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$    (10)

Once trained sufficiently, and assuming Equation 10 converges, the networks reach a point where neither D nor G can improve, because $p_g = p_{data}$. The discriminator is then unable to differentiate between the two distributions, i.e., $D(x) = 1/2$.

There are many challenges when training the original GAN, such as mode collapse, vanishing gradients, and non-convergence. Mode collapse occurs when the generator can only create a small set of convincing outputs. These outputs, while realistic, represent only a portion of the sample distribution; they easily fool the discriminator and therefore hinder learning. For example, Figure 19 (bottom row) shows mode collapse when training on the MNIST digits dataset. This dataset contains handwritten digits from 0 to 9, and ideally the GAN should learn to produce each class. However, over time it learns to produce only a 6, which fools the discriminator every time.

Figure 19. Example of successful GAN training (top row) and mode collapse (bottom row). Image courtesy of [135] under CC-BY-NC-ND 4.0.

The issue of non-convergence arises from the non-convex nature of the loss function. When the loss function is non-convex, it is more difficult to minimize using gradient descent. Intuitively, D or G always counters the actions of the other in the next iteration, producing large swings in the learning curve (Figure 20). Arjovsky and Bottou dive deeper into the nature of this phenomenon, showing how the norm of the gradient grows drastically as the discriminator trains longer [136]. In all cases, using this gradient to update the generator leads to a marked decrease in sample quality. Additionally, the large swings in the learning curve show that the variance of the gradients is increasing, which is known to lead to slower convergence and more unstable behavior during optimization [137].

Figure 20. Example of unstable generator training. In this study, the generator network is fixed while only the discriminator trains. The gradient norms quickly decay, with wild swings from iteration to iteration. This demonstrates that as the discriminator improves, the generator's gradient vanishes. Image courtesy of [136] under CC-BY-NC-ND 4.0.
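To illustrate the alternating optimization behind Equations 9 and 10, below is a self-contained toy sketch in which a one-dimensional Gaussian stands in for the image distribution; the "non-saturating" generator loss used here is a common practical substitute for directly minimizing Equation 9.

```python
import torch
import torch.nn as nn

# Toy 1-D data distribution (N(4, 1)) and tiny MLP generator/discriminator,
# just to illustrate the alternating updates behind Equation 10.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
D = nn.Sequential(nn.Linear(1, 32), nn.LeakyReLU(0.2), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) + 4.0   # samples from the "true" distribution
    z = torch.randn(64, 8)            # latent vectors
    fake = G(z)

    # Discriminator step: push D(x) toward 1 for real data and D(G(z)) toward 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: instead of minimizing log(1 - D(G(z))) (Equation 9), maximize
    # log D(G(z)), which provides stronger gradients early in training.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())  # should drift toward ~4.0
```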
2.3.2 GAN Variants

To address these issues, multiple papers have suggested alternative loss functions [138]–[145]. Among these, the most popular and robust are the Wasserstein GAN (WGAN) and WGAN with gradient penalty (WGAN-GP). The Wasserstein distance is informally defined as the minimum cost of transporting mass in order to transform the distribution q into the distribution p (where the cost is mass times transport distance) [145], and the WGAN objective is represented by the following equation:

$\min_G \max_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim p_r}[D(x)] - \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x})]$    (11)

Intuitively, WGAN simply minimizes the distance between the distribution of the generator's output and the true distribution (modeled by the training data). WGAN uses a technique called weight clipping, which enforces a 1-Lipschitz constraint on the discriminator. A Lipschitz constraint limits how fast a function can change by placing bounds on the function's first derivative; this means that the weights of the discriminator are forced to lie within a compact space defined by a Lipschitz function. For example, the absolute value of the derivative of the sine function is always bounded by 1, so it is 1-Lipschitz. Intuitively, Lipschitz continuity bounds the gradients and is beneficial in mitigating gradient explosions in deep learning. Instead of weight clipping, WGAN-GP enforces the 1-Lipschitz constraint by adding a gradient penalty term to the Wasserstein critic loss (Equation 12):

$L = \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x})] - \mathbb{E}_{x \sim p_r}[D(x)] + \lambda\, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\big[ (\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1)^2 \big]$    (12)

The penalty term is based on the norm of the gradient of the discriminator's output with respect to the input data. Specifically, the penalty term is the squared difference between the norm of the gradient and a constant value of 1, multiplied by a hyperparameter λ. Intuitively, the gradient penalty satisfies the Lipschitz constraint by encouraging the discriminator to have a gradient with a norm close to 1 and penalizing it when the norm deviates from 1. While demonstrably better than the alternatives, WGAN-GP does not guarantee convergence [146].
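A minimal PyTorch sketch of the gradient-penalty term in Equation 12 is given below; following WGAN-GP, the penalty is evaluated at random interpolations between real and generated samples, and the critic, tensor shapes, and λ = 10 are placeholder choices.

```python
import torch
import torch.nn as nn

def gradient_penalty(critic: nn.Module, real: torch.Tensor, fake: torch.Tensor,
                     lam: float = 10.0) -> torch.Tensor:
    """lambda * E[(||grad_xhat D(xhat)||_2 - 1)^2] for interpolated samples xhat."""
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)))   # one mixing weight per sample
    x_hat = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_hat = critic(x_hat)
    # Gradient of the critic's output with respect to the interpolated inputs.
    grads, = torch.autograd.grad(outputs=d_hat.sum(), inputs=x_hat, create_graph=True)
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)
    return lam * ((grad_norm - 1.0) ** 2).mean()

# Usage with a toy critic on 1x32x32 "images":
critic = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
real, fake = torch.randn(8, 1, 32, 32), torch.randn(8, 1, 32, 32)
gp = gradient_penalty(critic, real, fake)
# Critic loss as in Equation 12: E[D(fake)] - E[D(real)] + gradient penalty.
d_loss = critic(fake).mean() - critic(real).mean() + gp
print(d_loss.item())
```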
In addition to these changes to the loss function, numerous architectural improvements have been made. We highlight the most popular variants in Table 7.

Table 7. Common GAN variants (Variant – Novelty/Description).

cGAN: Adds conditional information, specifically class-label information, to the basic GAN architecture. This allows the generator to selectively generate an imbalanced class through the label input.

DCGAN: Applies specific architectural constraints to allow stable training in a deep network. These constraints include using a batch-norm layer in both the generator and the discriminator, using ReLU activation in the generator for all layers except the output (which uses Tanh), and using LeakyReLU activation in the discriminator for all layers.

DA-GAN: Introduces a "Deep Attention" mechanism with a compound loss over instance-level and set-level translation tasks. Essentially, for each pair of images, the goal is not only to translate from one domain to the other but also to translate local regions of each image to each other.

PGGAN: The key idea is to grow both the generator and discriminator progressively: starting from a low resolution, new layers are added so that the model produces increasingly fine details as training progresses. This both speeds up training and greatly stabilizes it, allowing images of unprecedented quality.

StyleGAN: Instead of starting from a random vector input, the generator starts from a learned input. StyleGAN's generator therefore consists of two separate networks, one for mapping the latent-space input to an intermediate space and one for synthesis. The mapped input is used in the synthesis network and is only added to specific layers that correspond to a "style", e.g., glasses, male, facing left.

AC-GAN: Adds an auxiliary class-decoder network to the discriminator to reconstruct class labels; forcing a model to perform additional tasks is known to improve performance on the original task. This was the first model to measure discriminability using the Inception network, which is now standard for assessing the quality of GAN-generated images.

Pix2pix: Introduces the embedding of whole images as the input to the generator instead of random noise, allowing paired, image-to-image translation.

StarGAN: A novel and scalable approach for image-to-image translation among multiple domains using a single model. Its unified architecture allows simultaneous training on multiple datasets and different domains within a single network.

cycleGAN: Introduces cycle-consistency and identity losses, allowing unpaired training of the pix2pix architecture.

References for the variants above: [147], [148], [134], [149], [150], [151], [152], [153], [154].

2.3.3 Image-to-Image Translation GANs

Image-to-image translation is the transformation of one image domain to another while preserving content representations [155]. For example, we can convert a natural black-and-white image of Marilyn Monroe to a green one (Figure 21). Notice how the intrinsic source content (background, location of features, etc.) is preserved while the extrinsic target style is transferred (grayscale to green). We can use a type of conditional GAN, namely pix2pix, UNIT, StarGAN, or cycleGAN, for this task. Unlike the original GAN, which is unsupervised learning (trying to discover the underlying distribution), conditional GANs are a type of supervised learning (we define the expected outcome).

Figure 21. Unpaired and paired images. Unpaired images do not require the structural constraints that paired images have; in other words, the positions of the objects in one image do not have to match their positions in the other. With paired images, we can speed up training by applying this constraint.

The chief innovations of pix2pix are the use of paired images, a U-net architecture for the generator, and a patchGAN architecture for the discriminator [152]. Paired images (Figure 21) consist of two sets of spatially identical but texturally different images. They are paired in the sense that an image from set A maps only to the corresponding image in set B, and the two are used together as inputs to the pix2pix discriminator. The use of paired images has been shown to make training more efficient and more likely to converge, as it makes the translation task more constrained [156]. The use of a U-net-shaped generator with skip connections allows for better preservation of underlying structures. GANs are known to produce blurry images, since the discriminator tends to model low-frequency content better [157]. To model high frequencies, pix2pix uses a discriminator that classifies localized image patches; it has been shown that models that evaluate random patches of an image are better able to preserve local structure [158]. The average probability over all patches is then used as the final output of the discriminator. While the results are sharp, high-resolution, and generally realistic images, the need for a paired dataset limits pix2pix to very specific image-translation applications.

CycleGAN was developed specifically to use unpaired training data and utilizes two generators and two discriminators with a specialized cyclic loss [154]. One generator–discriminator pair aims to translate images from domain A to domain B, while the other performs the opposite operation, translating images from B to A. The generator and discriminator networks are nearly identical to those of pix2pix, other than there being two sets (Figure 22).

Figure 22. CycleGAN architecture. Two pairs of generators and discriminators are trained simultaneously. One pair translates from domain A to domain B, and the other from domain B to domain A. A cyclic loss is computed to ensure that an image, when translated through both generators, remains the same. Image courtesy of [159] under the BSD license.
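Below is a minimal sketch in the spirit of the patch-based discriminator described above: a fully convolutional network outputs a grid of per-patch real/fake probabilities that are then averaged. The channel counts and depth are illustrative, and unlike the actual pix2pix discriminator this simplified version scores a single image rather than an input–output pair.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Fully convolutional discriminator that scores overlapping local patches."""
    def __init__(self, in_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 1, 4, stride=1, padding=1), nn.Sigmoid(),  # one score per patch
        )

    def forward(self, x):
        patch_scores = self.net(x)               # grid of per-patch probabilities
        return patch_scores.mean(dim=(1, 2, 3))  # average patch probability per image

d = PatchDiscriminator()
imgs = torch.randn(2, 1, 128, 128)
print(d(imgs))  # two values in (0, 1), one per image
```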
Images from each domain are fed into a generator that transforms them into the opposing domain ($G_A: X \rightarrow Y$ and $G_B: Y \rightarrow X$). Then, a real image and a transformed image are given to the respective discriminator, which judges which is real and which is fake. $D_A$ is therefore trained given samples $\{x_i\}_{i=1}^{n}$, $x \in X$, with distribution $x \sim p_X(x)$, and $D_B$ is trained given samples $\{y_j\}_{j=1}^{n}$, $y \in Y$, with distribution $y \sim p_Y(y)$, where X and Y are the domains of the unpaired images. Discriminator B's loss function is therefore defined as

$L_{GAN}(G_A, D_B, X, Y) = \mathbb{E}_{y \sim p_Y(y)}[\log D_B(y)] + \mathbb{E}_{x \sim p_X(x)}[\log(1 - D_B(G_A(x)))]$    (13)

and discriminator A's loss function is

$L_{GAN}(G_B, D_A, Y, X) = \mathbb{E}_{x \sim p_X(x)}[\log D_A(x)] + \mathbb{E}_{y \sim p_Y(y)}[\log(1 - D_A(G_B(y)))]$    (14)

The use of the cyclic loss function is inspired by an intuitive idea from natural language. When translating from one language to another (English → Spanish: "hello" to "hola"), we should expect to recover the original input when translating back (Spanish → English: "hola" to "hello"). Therefore, we can add an additional loss term, originally called the cycle-consistency loss, to the overall objective. To ensure cycle consistency, the transformed image is fed to the opposing generator and then compared to the original input image; in other words, $G_B(G_A(x)) \approx x$ and $G_A(G_B(y)) \approx y$. The cycle-consistency loss function is therefore given by

$L_{cyc}(G_A, G_B) = \mathbb{E}_{x \sim p_X(x)}\big[\lVert G_B(G_A(x)) - x \rVert_1\big] + \mathbb{E}_{y \sim p_Y(y)}\big[\lVert G_A(G_B(y)) - y \rVert_1\big]$    (15)

Combined, the full loss function is

$L_{tot}(G_A, G_B, D_A, D_B) = L_{GAN}(G_A, D_B, X, Y) + L_{GAN}(G_B, D_A, Y, X) + \lambda L_{cyc}(G_A, G_B)$    (16)

where λ controls the relative importance of the cycle-consistency loss. CycleGAN shows performance similar to pix2pix but uses unpaired images. This allows for easier data curation and therefore application to more image-translation tasks, which is particularly useful for medical image translation, where data is notoriously difficult to acquire in large quantities. However, the use of unpaired images makes the translation more unconstrained, so the model is less likely to converge and is more susceptible to exploding gradients [156].
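To make Equations 13–16 concrete from the generators' side, the following is a toy PyTorch sketch of one loss evaluation; the 1x1-convolution "generators", linear discriminators, and λ = 10 are placeholders chosen only to keep the example runnable.

```python
import torch
import torch.nn as nn

# Placeholder 1x1-conv "generators" and tiny discriminators, just to show the losses.
G_A = nn.Conv2d(1, 1, 1)   # X -> Y
G_B = nn.Conv2d(1, 1, 1)   # Y -> X
D_A = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1), nn.Sigmoid())
D_B = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1), nn.Sigmoid())
l1, bce = nn.L1Loss(), nn.BCELoss()
lam = 10.0

x = torch.randn(4, 1, 64, 64)   # batch from domain X
y = torch.randn(4, 1, 64, 64)   # batch from domain Y

fake_y, fake_x = G_A(x), G_B(y)
ones = torch.ones(4, 1)

# Adversarial terms (Equations 13-14), written from the generators' point of view.
adv = bce(D_B(fake_y), ones) + bce(D_A(fake_x), ones)

# Cycle-consistency term (Equation 15): translate there and back, compare with L1.
cyc = l1(G_B(fake_y), x) + l1(G_A(fake_x), y)

# Full generator objective (Equation 16).
g_loss = adv + lam * cyc
g_loss.backward()
```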
2.3.4 Common quantitative evaluation metrics

There are four common ways to evaluate GANs: (1) MAE/MSE, (2) human observer studies, (3) the Inception Score / Frechet Inception Distance, and (4) downstream task performance [160], [161]. If we have ground truth images, as with pix2pix or a cGAN, we can measure the mean squared error between the generator's prediction and those images. We can also use the peak signal-to-noise ratio (PSNR), the Dice similarity coefficient, the Jaccard similarity index (JI), and the structural similarity index measure (SSIM) to evaluate image similarity; these metrics are often used for segmentation GANs. Ideally, we would have expert readers assess the quality, realism, and diversity of the synthetic images, but this is generally the costliest option. The inception score (IS) and Frechet inception distance (FID) both use the pretrained InceptionV3 network [162] to measure the average conditional probability of the generated samples [163]. The main difference is that IS measures the difference between the predicted class probabilities of the generated images and those of the training set, whereas FID measures the distance between the distributions of the last-layer InceptionV3 features for synthetic and real images [164], [165]. The FID is sensitive to class mode dropping, with distances increasing as class discrepancies between the two distributions grow and the average sets of features begin to differ [166]. Both IS and FID have been empirically shown to correlate highly with the quality and diversity of generated images and to be consistent with human judgments [167]–[169]. Finally, we can indirectly measure the quality of the generated images by using them for a downstream task such as object detection. If the generator produces realistic and diverse results, then adding synthetic images to the training set should improve the performance of the object detector. Using downstream metrics is the most practical evaluation of the quality of synthetic images, since improving the downstream task is the goal of data augmentation.

2.4 Review of GANs applied to Medical Imaging Applications

This section is organized by GAN task and then by image modality. Due to the sheer number of articles published in this field, we have limited the scope of this review to papers published after 2019.

2.4.1 Image Reconstruction

Over the years, reconstruction methods for medical images have evolved from analytical to iterative to machine learned. Many factors affect the quality of the reconstructed image across modalities, including the concentration of contrast administered, the amount of radiation administered, the level of noise and artifacts, the resolution, and the sampling [170]. Many articles use the GAN variants described in Table 7, or close variations. For example, Waheed et al. present a method that is architecturally identical to an ACGAN, the only change being the size of the input layer of the generator [171]; it is therefore classified in the ACGAN family. In contrast, although Kamran et al. propose a model whose architecture sounds similar to cycleGAN (two discriminators and two generators), it is not considered part of the cycleGAN family because the two network pairs do not perform opposite translation tasks [172]. If a GAN is not closely related to any of the variants mentioned in Table 7, the name used by the authors is simply given. Table 8 highlights GANs that mitigate the factors above and are effective for compromised reconstruction; we focus on the way the GANs are tailored to the reconstruction task and the novelty of the application.

Table 8. Image reconstruction applications using GANs. Generally, authors choose to reconstruct an image either from raw data or from a low-sample-rate/noisy image. For reconstruction tasks, PSNR, SSIM, and MSE are commonly used to evaluate image quality but are left out of this table for simplicity. Columns: Modality; Reference; GAN Variant or Family; Data Description and Network Architecture; Novelty. MRI entries and their GAN families: [173] cGAN; [174] cycleGAN; [175] DCGAN; [176] DA-GAN; [177] cycleGAN. [173] (cGAN): A conditional GAN was trained to reconstruct fully sampled, multi-contrast MRI from undersampled acquisitions. This deviates from a traditional cGAN, where the input is a latent vector with a class label. The proposed approach can successfully recover pathologies that are either missing in the source contrast or not clearly visible in the undersampled acquisitions of the target contrast. Additionally, this reconstruction method was 50x faster than traditional reconstruction techniques. The network was trained and tested using publicly available datasets (MIDAS, IXI, and BRATS) and found to have reasonable peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and root mean squared error (RMSE) values. [174] (cycleGAN): Due to the lack of fully sampled, high spatio-temporal resolution ground truth images for time-resolved MR angiography (tMRA), a cycleGAN was trained on sparsely sampled, aliased images and unpaired high-resolution MRI images.
Unlike the traditional cycleGAN, which uses two pairs of generator-discriminators, the author’s “Optimal Transport-cycleGAN” only uses one pair and replaces one generator with a deterministic Fourier transform. Twelve sets of in vivo 3D DCE MRI data was acquired, which corresponds to 78,740 slices of data, 33,890 which were used for training. They demonstrated that undersampled images with reduced view sharing can be properly reconstructed using a cycleGAN. Knee MR scans were obtained from 19 patients. Each volume consists of 320 2D slices that were divided into training, validation, and test examples with a 70/15/15 split. This data was used to train multiple variable autoencoders (generator) of varying sizes (1-4 residual blocks). They demonstrated that multiple recurrent blocks decrease uncertainty, which suggests an effective way of promoting robustness. The original DA-GAN was improved by replacing a generator loss term with the Wasserstein distance and adding a perceptual loss term. The authors used the MICCAI 2013 grand challenge diencephalon dataset using 3 different under sampling masks. They demonstrated that their version of the DA-GAN was superior at reconstruction when compared to other GANs such as Pixel-GAN and regular DA-GAN, and a non-GAN neural network called ADMM-Net. The general architecture presented in [174] remains the same with minor adjustments. Namely, the authors used a Wasserstein GAN loss in the generator. They validated their network with two publicly available datasets: Human Connectome Project and the FastMRI dataset. The authors show with both datasets that the addition of the Wasserstein loss function improves the PSNR and SSIM of the reconstruction. 58 Table 8 (cont’d) [178] DCGAN [179] RED-GAN [180] [181] RED-GAN [182] SA-GAN T C [183] WGAN [184] cycleGAN, IdentityGAN, and GAN- CIRCLE image regions; A novel two-generator GAN was proposed for sparsely sampled reconstruction. The first generator maps the sparse data to a fully sampled k-space. The second generator maps the IFT of the fully sampled k-space to a denoised and anti-aliased image. The proposed method was applied to three essentially different brain MRI datasets: IXI, the 2015 Longitudinal MS Lesion Segmentation Challenge, and a private DCE- MRI dataset of stroke and brain-tumor patients acquired at the Soroka University Medical Center. This method of reconstruction was compared to conventional Compressed Sensing MRI (CS-MRI) and a different deep-learning approach called ADMM-Net and was shown to be superior to both. A GAN to denoise low dose CTs (LDCT) was developed. The novelty comes with the proposed “Noise Aware Loss” in the discriminator network. A patch-wise mean squared error (MSE) loss was used to mitigate gradient vanishing. Since CT images contains air in a significant region; after a few epochs of training, the MSE will be very small for all those subsequently, gradient update during backpropagation will be insignificant, as the total loss is small. So, after a few epochs, effective training will stop implicitly. The generator network was borrowed from [180].They tested this network on the publicly available 2016 NIH-AAPM-Mayo Clinic LDCT dataset. It was shown to outperform other state of the art NN reconstruction methods in terms of PSNR and SSIM. Unlike standard GAN discriminators, the proposed DU-GAN (dual U-net GAN) utilizes a U-Net-based discriminator for LDCT denoising, therefore providing per-pixel feedback and the global structural difference two the denoising model. 
Additionally, discriminators, one for both the image and gradient domain. The RED- GAN generator was also used. The network was trained and tested using 2016 NIHAAPM-Mayo Clinic LDCT dataset. The proposed self-attention GAN (SA-GAN) generator is composed of multiple layers of self-attention blocks sandwiched in between 3D conv layers. Each block computes the correlation matrix that represents spatial dependencies between any two positions within the input feature maps. Each position is calculated and updated by the weighted sum of all other positions. This is done both depth wise (between slices) and plane wise (within slices). The network was trained and tested using the 2016 NIHAAPM-Mayo Clinic LDCT dataset. The generator is U-net structure and has 9 layers total, with 2 downsampling blocks, 5 residual blocks, and 2 upsampling blocks. The generator loss has three parts: the WGAN loss, an SSIM loss, and a L2 loss (MSE). The discriminator consists of six convolutional layers and three fully-connected layers. The network was trained and tested using the 2016 NIHAAPM-Mayo Clinic LDCT dataset. The authors compared three previously described GANs for the use of to FDCT. Among CycleGAN, unpaired IdentityGAN, and GAN-CIRCLE, the latter achieves the best denoising performance (Lowest PSNR and SSIM) with the shortest computation time. Subsequently, GAN-CIRCLE is used to demonstrate that the increasing number of training patches and of training patients can improve denoising performance. The network was trained and tested using the 2016 NIHAAPM-Mayo Clinic LDCT dataset. translation of LDCT there are to 59 Table 8 (cont’d) [185] WGAN [187] cGAN [189] LSGAN [139] [190] Transformer- GAN T E P [191] DCGAN [193] SA-GAN The generator and discriminator networks are similar to previously described WGANs. The generator is U-net shaped and the discriminator is borrowed from [186]. The authors proposed an additional perceptual loss in addition to the WGAN and L2 losses and is simply the feature vector given by VGG-19. The performance of the proposed DR-WGAN is compared to other previously described GANs and was better in terms of PSNR and SSIM. All GANs were trained and tested using the publicly available LUNA16 dataset. Unlike other deep residual generators with a perception loss, the proposed network DRL is a conditional GAN. The input is a LDCT image and uses a Sobel filtered LDCT image as the class label. The authors show that the edge detection layer improves the performance of the network, and that the overall network outperforms other reconstruction techniques. The GAN was trained and tested on a simulated dataset from The Cancer Imaging Archive [188], as well as curated deceased piglet and thoracic CT datasets. The proposed model is an improvement of the LSGAN, which uses a least squares loss function for the discriminator. The novelty comes with the addition of a self-attention layer in the generator, as well as more residual blocks to better preserve structural details and edges. Whole body scans were used, where the ground truth images were produced after 150s scanning as noise-free HC data, and the input images were produced after scanning for 75 s as low-count input data with noise. In terms of PSNR, the proposed model is only slightly better than other neural networks (CNN3D) and traditional noise reduction reconstruction algorithms such as non-local means (NLM). However, it significantly outperforms all other techniques when measured by SSIM. 
The proposed generator network comprises three components: (1) a CNN- based encoder, (2) a transformer network used to model the long-range dependencies between the input sequences learned, and (3) a CNN-based decoder. Residual blocks that are normally found in a DCGAN are replaced by a transformer, which is useful in capturing slice-to-slice information. This was tested on a clinical dataset which includes eight normal control (NC) subjects and eight mild cognitive impairment (MCI) subjects, from which 729 large patches of size 64 × 64 × 64 are sampled. The proposed network is described as a 2.5D encoder-decoder since it is normally designed to take in a 3-channel image, but instead takes 3 single channel slices as inputs. A feature matching layer was also applied to reduce noise and to correct pathological features. This model outperformed [192] in terms of PSNR and SSIM, and overall image quality is rated decent by radiologists. The model was trained on forty PET datasets from 39 participants where ground truth samples were reconstructed as the standard-dose and 1% low-dose PET scans were reconstructed using randomly undersampled data. The network is a modified self-attention GAN where the input consists of 5 consecutive slices of 128x128 PET images. Similar to the non-local means filter, models trained with a self-attention module implicitly learn to suppress irrelevant regions in an input image while highlighting salient features. Both simulated and real patient data was used to train and test this model. Compared to other SA-GAN skews the proposed method leads to higher contrast in tumors and have sharper boundaries. 60 Table 8 (cont’d) [194] cycleGAN [195] Task-GAN Full count PET images and low count PET sinograms were the two target domains for image-to-image translation. The cycleGAN generator architecture remains largely the same from the original paper except for the input and output layer size (now larger) and the number of residual blocks (5 instead of 3). A total of 30 clinical PET volumes (310 slices per volume) were used to train and test the model. Evaluation of PSNR and MSE revealed that the proposed method was better than other reconstruction methods (expectation-maximation, NLM denoising, and a vanilla GAN). Here, a novel task-specific network is used in addition to the generator and discriminator. It aims to help regularize the training of the generator and complement the adversarial loss to ensure the output images better approximate the ground truth images. The task-network learns to refine the output of the generator to match the label of the image. For example, if the reconstructed image is supposed to have pathology present, the task- network would refine the generator output to make pathologies “clearer”. 40 ultra-low-dose 1% PET images were reconstructed after random undersampling. Each PET volume consists of 89 2.78 mm-thick slices with 256 × 256 pixels. 2.4.2 Image Synthesis As previously mentioned, GAN’s most promising application is in data augmentation, where we can create diverse synthetic images to solve the problem of low volume labeled imaging datasets. Presented in Table 9 are GANs used for medical image synthesis for improving downstream segmentation, classification, and detection tasks. If the articles use privately curated datasets, a brief description is provided; otherwise if the articles use publicly available datasets, only the name is provided. 
For simplicity, novelty of network architecture is not described, and the results are focused on the promising performance of GANs. Table 9. Image synthesis applications using GANs. Modality Ref Data Generation and Downstream Task [196] Cardiac MR images for segmentation I R M GAN Variant or Family DT-GAN Results Proposed method achieves a mean Hausdorff distance (HD) of 2.98 mm ± 0.43 mm and a Dice score of 0.79 mm ± 0.10 mm for myocardium segmentation, which to a previously described 23-layer U-net (HD = 3.04 mm ± 0.27 mm, Dice = 0.74 ± 0.04. is superior 33 short-axis cardiac MR image sequences, each 20 frames with 8 to 15 slices, for a total of 10,022 images. 61 Table 9 (cont’d) [197] Knee MRI for detection cGAN OAI dataset + 25 heterogeneous MRIs locally collected through clinical routines Training a cGAN on the OAI dataset led to poor performance on the clinical dataset (Dice = 0.519, HD = 6.23). However, using this pretrained model and transfer learning to the clinical dataset improved the mean Dice score to 0.819 and the mean HD to 1.46. [198] Brain MRI with tumors for detection BRATS 2013 [132] Brain MRI with tumors for detection BRATS 2016 [199] Brain MRI with tumors for classification BRATS 2016 PGGAN, SIMGAN, and UNIT FixedGAN The downstream ROC response of an object detector (ResNet 50) is evaluated given images synthesized by Fixed- Point GAN, Star-GAN, and CAM. Fixed-Point GAN achieved an AUC of 0.98 and a sensitivity of 84.5% at 1 false positive per image, outperforming StarGAN who had an AUC of 0.46 and a sensitivity levels of 13.6%. CAM, however, outperformed both with an AUC of 0.99 and a sensitivity of 60% at 0.037 false positives per image. The Visual Turing test of the all three GAN generated images revealed that images contained realistic 75% of texture and tumor appearance. The downstream ROC response of an object detector (ResNet 50) was also evaluated given augmented data. Without synthetic images, ResNet 50 an accuracy, sensitivity, and specificity of 93.14, 90.91, and 95.85, respectively. The addition of augmented data, whether it be through classical transformations or with GANs, all improved object detector performance. The addition of classical DA and UNIT images resulted in the highest accuracy and sensitivity (96.7 and 97.48, respectively). This model was an improvement on the [132]. All PGGAN described evaluation methods remained the same. Without synthetic images, ResNet 50 achieved an accuracy, sensitivity, and specificity of 90.06, 85.27, and 97.04, respectively. This improved to 91.08, 86.60, and 97.60 with the addition of PGGAN synthetic data. achieved PGGAN in 62 Table 9 (cont’d) [200] 3D Brain MRI for detection ADNI and BRATS 2018 [201] Brain MRI with Parkinson’s for classification PPMI Dataset [202] Diffusion MRI for classification Human Connectome Project [172] Vessel segmentation map of retinal fundus images DRIVE, CHASE- DB1, and STARE s u d n u F l a n i t e R [203] Vessel DRIVE, STARE segmentation map of retinal fundus images [204] Vessel DRIVE, STARE segmentation map of retinal fundus images images, 3D α-GAN To quantitatively evaluate synthetic 3D MRI the maximum mean discrepancy (MMD) and multi-scale SSIM metrics were used. Compared to other 3D GANs such as 3D-VAE-GAN and 3D-WGAN-GP, the proposed model achieved the lowest MMD of 0.072 and the second closest MS-SSIM to real images (0.829). The pre-trained Le-Net-5 network was used as a classifier to detect Parkinson’s from brain MRIs. 
Synthetic GAN data were added to the training set which yielded an accuracy, specificity, and sensitivity of 88, 87.14, and 87.92, respectively, compared to a baseline of 84.67, 83.76, and 84.13 without synthetic data. Modified DC GAN for synthetic RV-GAN outperforms cycleGAN Using a cycleGAN to translate between structural and diffusion MRI images, the authors were able to achieve reasonable MSSIM values (0.839 ± 0.014 and 0.937 ± 0.008 fractional anisotropy and mean diffusivity images, respectively). The authors did not provide a baseline to compare to. RV-GAN other segmentation neural networks and GANs for all 3 datasets. RV-GAN beats traditional U-net, DenseNet, IterNet, and SUD-GAN in all metrics, with F1, specificity, accuracy, AUC, mean IOU, and SSIM of 0.869, 0.996, 0.979, 0.988, 0.976, respectively. 0.9237, Surprisingly, RV-GAN only loses to M- GAN in terms of sensitivity (0.793 vs 0.835). DRPAN showed to match or barely exceed other leading segmentation algorithms. All CNN performance in terms of accuracy, sensitivity, and specificity were within 0.01, and were not statistically significant. DRPAN and RetinaGAN RetinaGAN statistically achieves significant improvement in AUROC, precision, and recall, surpassing the current state-of-the-art method by 0.2 − 1.0% in ROC and 0.8 − 1.2% in precision and 0.5 − 0.7% in recall. 63 Table 9 (cont’d) 4590 scanning laser ophthalmoscopy (SLO) images of size 2600 x 2048 were captured from teenage (< 18yo) patients. DRIVE and DRISHTI-GS [205] SLO images with fundus disease for classification [206] Vessel segmentation map of retinal fundus images for segmentation AMD- GAN Compared to ResNet 50, the proposed AMD-GAN classifier resulted in higher recall and F1 accuracy, precision, (ResNet 50 = 77.13, 68.90, 68.42, 68.65, 87.10, AMD-GAN = 84.75, 79.15, 82.15, 80.41, 97.25, respectively). Pix2pix, cycleGAN for skews retinal Two types of image-translation models were compared in generating vessel segmentations fundus from the images. Multiple generators were also explored (U-net, ResNet6, and ResNet9). The best performing method for PSNR was the pix2pix with ResNet9 generator (25.36) with the worst performing model cycleGAN with the ResNet9 generator (21.9). The best performing method for SSIM was pix2pix with U-Net generator (0.911) and the worst performing was the cycleGAN with ResNet9 generator (0.877). Here, a model to generate fundus images with controllable diabetic retinopathy severity levels, which can be used to augment the performance of the DR grading models by mitigating class imbalance. VGG16, ResNet 50, and InceptionV3 were used to assess the classification performance with/without the addition of synthetic images. Overall, DR-GAN seems to solve the class imbalance problem and the addition of images significantly improves the accuracy of all three classifiers. SkrGAN achieves a MS-SSIM of 0.614, and FID of 27.59, which are all better than DCGAN, ACGAN, WGAN, and PGGAN. Additionally, a U-net segmentation model performed better with images (sensitivity = 0.846, accuracy = 0.951) than without (sensitivity = 0.778, accuracy = 0.948). images and generated synthetic SkrGAN improve [207] Retinal fundus EyePACS DR-GAN images with different grades of diabetic retinopathy for classification [208] Vessel segmentation map of retinal fundus images SkrGAN Retinal color fundus dataset with 6,432 retinal images was collected from local hospitals. DRIVE was also used for segmentation tasks. 
64 Table 9 (cont’d) [209] Optical photos of skin cancer lesions for segmentation ISIC Skin Lesion Challenge Dataset DAGAN y p o c s o m r e D [210] Optical photos of skin cancer lesions for classification ISIC Skin Lesion Challenge Dataset cGAN [131] Optical photos of skin cancer lesions for segmentation SMARTSKINS, Dermofit image library, ISIC Challenge dataset cycleGAN The authors compared the proposed DAGAN segmentation network with U- net and its many variations. Overall, highest Dice DAGAN the had coefficient (0.859) where the next highest was a U-net with skip and dense (0.832). DAGAN also connections highest accuracy and specificity, but only the second highest sensitivity. In this paper, a CNN was trained with/without synthetic data to classify skin cancer into benign or malignant. Without GAN images, the CNN had an accuracy, sensitivity, specificity, and F1- score of 53%, 0.51, 0.57, and 0.5, respectively. With GAN images, the CNN had an accuracy, sensitivity, specificity, and F1-score of 71%, 0.68, 0.74, and 0.7, respectively. The cycleGAN generated segmentation maps was compared to simple adaptive thresholding, and other neural networks (Gossip and Reduce Mobile Deeplab). Both cycleGAN Dice coefficient and Jaccard Index (JI) were superior to other segmentation techniques (92.74 and 86.7, respectively). [211] X-ray images with bone lesions for detection [166] X-ray images with various diseases for classification y a r - X 514 adult X-ray images of tibia, humerus, and femur bone lesions ChestX-ray8 PGGAN cycleGAN A trained CNN classifier had a baseline lesion detection sensitivity, specificity, and AUC of 0.9, 0.776, and 0.876, respectively. The CNN trained with real and cycleGAN generated data yielded values of 0.84, 0.842, and 0.924, respectively. FID score indicated generally high realism and quality for most disease classes. Overall synthetic images had an FID of 8.02, where closer to 0 is ideal. Images that are supposed to contain edema or pneumonia were found to be of less quality (FID 59.4 and 32.05, images were respectively). Real identified as such by radiologists 73% (95% CI 63, 82) of the time, while generates were identified as real 61% (95% CI 51, 70) of the time, with both groups more likely than chance to be identified as real. 65 Table 9 (cont’d) [171] Chest X-ray with Covid for detection IEEE Covid Chest X-ray dataset, COVID- 19 Radiography Database, and COVID-19 Chest X-ray Dataset Initiative [212] Mammography INbreast dataset images of varying levels of breast density for classification and segmentation [213] Mammogram mass images for segmentation INbreast dataset, and a private dataset of 549 mammograms containing 376 mass regions. [214] Rib suppressed chest X-ray for disease detection LIDC-IDRI, TianChi AI competition 2017, 2019 [215] Chest x-ray for disease classification CheXpert in cGAN cGAN learned CovidGAN A variant of the ACGAN developed to generate synthetic, Covid-19 positive chest x-rays. A VGG16 detector was with/without transfer CovidGAN Accuracy, images. sensitivity, and specificity of the VGG16 classifier improved from 0.85, 0.69, and 0.95 to 0.95, 0.90, 0.97, respectively, when adding synthetic chest the radiographs. The best performing model segment the dense regions well with an accuracy, Dice coefficient, JI of 98%, 88%, and 78%, respectively. Compared to other NN segmentation techniques, the cGAN the highest precision, resulted sensitivity, and specificity of 97.85%, 97.85%, and 99.28%, respectively, for breast density classification. 
FCN-8 and VGG16 yielded 0.748, 0.997, 0.69 and 0.832, 0.996, 0.66, respectively. segmentation model The U-net the best when using a performed combination of INbreast, privately curated data, and cGAN generated images with a JI, Dice score, and accuracy of 79.35, 88.2, and 88.8, the additional respectively. Without the segmentation synthetic performance fell to 77.23, 86.77,and 87.29, respectively. The important metric to note is Weber Contrast (WB), which provides an estimation rib-suppression performance on the boundaries for chest x-ray images by calculating the contrast gap between the rib-suppressed region and the background, where a lower contrast gap is more valuable. RSGAN yielded in a WB of 1.96 while other NNs yielded a WB of 3.49 (U-net) and 2.36 (ResNet). The DenseNet-121 pretrained network is used disease classification. A cGAN was developed to augment chest x-rays with either lung lesions, pleural effusion, or fractures. Overall, the addition of synthetic images only improves performance in low- volume scenarios (<10% of real dataset). The authors found that training with only their full dataset is better than training with augmented data. cGAN, DenseNet RSGAN images, transfer learn of to 66 Table 9 (cont’d) [216] Chest X-ray for lung segmentation JSRT and Montgomery County Datasets [217] Rib suppressed chest X-ray for disease detection JSRT dataset [218] High resolution CT image reconstruction 29 post-registered ankle CT scans of low- and high resolution which resulted in 14,000 matching pairs of low- and high- resolution patches of size 64×64 [219] CT scans for organ segmentation NIH Pancreas-CT dataset [220] CT scans for nodule segmentation LIDC-IDRI [221] CT scans for nodule segmentation LIDC-IDRI [222] Low dose to Standard dose PET for detection (LPET to SPET) Phantom Brain Dataset and Real Human Brain Dataset T C T E P cGAN SFRM- GAN GAN- CIRCLE The proposed auxiliary U-net GAN with multiple residual blocksresulted in a Dice score of 0.979 when training on the JSRT dataset, where a normal U-net yielded 0.946. The proposed model is a mix between WGAN-GP and pix2pix. The proposed model resulted in a mean PSNR of 43.588, mean RMSE of 0.00025, and mean SSIM of 0.989. Compared to pix2pix which resulted in 41.37, 0.0004, and 0.982, respectively. The goal of this study was to reconstruct high resolution CT images to assess trabecular bone microstructures. To evaluate the GAN, they chose to use concordance coefficient correlation (CCC) to measure the agreement in the microstructure. Evaluating real low- resolution images, the Tb thickness and Tb volume CCC scores were 0.66 and 0.83, respectively. This increased to 0.95 and 0.88 when evaluating synthetic high-resolution images. cycleGAN When the kidney model was trained with CycleGAN augmentation techniques, performance increased dramatically (from a Dice score of 0.09 to 0.66, p < 0.001). Improvements for the liver and spleen were smaller, from 0.86 to 0.89 and 0.65 to 0.69, respectively. This study compared its AUGAN to other segmentation networks (FCN, U- Net, and U-net GAN). For Dice coefficient, AUGAN yielded the highest score of 0.849 while the U-net GAN yielded a the second highest score of 0.835. Similar performance is seen when evaluating JI, with AUGAN at 0.750 and U-net at 0.733. Only qualitative approaches were used to assess image quality and nodule realism. Generally, the cGAN generated images do appear realistic with a diverse set of nodule presentations. The nodules also appear to be tunable in size. 
The proposed AR-GAN outperforms other state-of-the-art GANs in PSNR and SSIM 0.891, (28.106 respectively). Stack-GAN, GDL-GAN, and Ea-GAN resulted in PSNR, 26.77, 27.07, and 26.39, and SSIM values of 0.884, 0.886, and 0.882, respectively. 3D cGAN AR-GAN AUGAN and 67 Table 9 (cont’d) [223] PET images for Alzheimer’s classification [224] High resolution US images for segmentation ADNI1 DCGAN PGGAN (named spGAN) Privately curated set of B-mode US images contains 6054 of chest, 1231 of hip joints, and 3261 of ovaries [225] Breast US for cancer segmentation DBUI, SPDBUI, ADBUI ASS-GAN to standard The authors provided no other neural network as a point of comparison. The mean PSNR of the DCGAN generated images was 32.83 and the SSIM was 77.48. Standard interpolation was used as the baseline method for generating high- resolution US. For the chest and ovary datasets, SpGAN had better FID SSIM, and LPIPS scores (chest = 36.36, 0.751, 0.168, ovary = 47.11, 0.497, 0.795) interpolation compared (chest = 65.41, 0.7428, 0.210, ovary = 63.15, 0.4332, 0.8901). For the hip joint dataset, the interpolation technique only surpassed spGAN in SSIM. The ASS-GAN was evaluated on IoU, accuracy, and Dice and compared to other NN segmentation methods (U-net, The DeepLabV3, proposed method outperformed all other NNs with an IoU, acc, and Dice scores of 0.7683, 0.9760, 0.8690, respectively, for the DBUI dataset. The performance was approximately the same for other dataset, and ASS-GAN was still the best for each. AttenU-Net). d n u o s a r t l U [226] Breast US for radiomics and cancer classification Privately curated dataset includes 1447 tumor- present images from 357 female patients. [227] Thyroid US for nodule classification TDID [228] US images for bone segmentation Privately curated dataset containing 1235 in vivo B- mode US images containing either radius, femur, spine, or tibia. Res-GAN TripleGAN Breast mass classification accuracy reached 90.41%, sensitivity 87.94%, specificity 85.86% with TripleGAN synthetic data + real data. Compared to other state-of-the-art methods such as the GAN, DCGAN, and InfoGAN, proposed method had significantly higher metrics. Thyroid nodules were classified by a either ResNet18 or Res-GAN into malignant or benign. Res-GAN had a classification accuracy, specificity of 92.2, 86.5, and 95, respectively, which is significantly higher than ResNet18 (82.2, 66.2, 89.8, respectively). and WGAN Pix2pix, DCGAN, segmentation were compared to the proposed patchGAN IoU of patchGAN network. The generated segmentation maps was 0.9357 with a Dice score of 0.9640. This was significantly better than other models. WGAN performed the worst with an IoU of 0.8726 and a Dice of 0.9158. performance patchGAN 68 It is important to note that while most studies have shown that the addition of augmented data will improve the performance of NNs, to date, most gains are only moderate. Most commonly, synthetic data results in 1-4% increase in sensitivity, specificity, accuracy, or respective metric. Additionally, while more complex GANs correlate to better downstream performance, this also means that the networks become less generalizable as they are heavily tailored to a singular application. 2.4.3 Cross-Modality Translation Cross-modality translation refers to the generation of an image of one medical imaging modality from that of another, for example, CT to MR. 
This is beneficial for the patient as it may decrease their number of scans, decrease the risks from imaging (contrast reactions, radiation dose, etc), and decrease healthcare costs. Healthcare providers may also benefit from cross-modality translation as it will increase patient throughput and decrease scanning turnaround time, along with allowing for less radiotracer production runs [229]. Additionally, registration mismatch between modalities will be eliminated if translation models are used. Highlighted in Table 10 are recent developments in cross-modality translation. Again, for simplicity, the network architecture of the GANs is not included. Table 10. Cross-modality image translation applications using GANs. Interestingly, while MR and PET are the more expensive and less available modality, most studies are trying to derive CT images from either MR or PET. Modalities Ref Results Data GAN Variant or Family BiGAN [230] ADNI T E P d n a R M Given MR brain images, BiGAN outputs the corresponding PET image. BiGAN outperforms (cycleGAN) with the highest PSNR (27.36) and SSIM (0.88), indicating that the quality of the synthetic images derived from the proposed method is closest to the real PET images. CycleGAN generated PET images resulted in a PSNR of 24.68 and a SSIM of 0.78. 69 Table 10 (cont’d) [231] ADNI GLA-GAN [232] ADNI E-GAN [233] ADNI TPA-GAN [234] ADNI GANBERT [235] [236] Privately curated 50 contrast- 3D enhanced thoracic- abdominal CT and abdominal MRI images. SpineWeb library cycleGAN cycleGAN R M d n a T C 70 for images The GLA-GAN produces FDG PET images given the downstream structural MR disease. A of Alzheimer’s classification comparison between U-net, cycleGAN, and GLA- GAN is made. GLA-GAN is significantly better than other methods in terms of SSIM, PSNR, and MAE (96.88, 29.32, and 0.014, respectively). U- net performed better than CycleGAN in terms of SSIM and PSNR but had a higher MAE. E-GAN transforms FDG-PET images to T1 weighted MRI images. DCGAN, WGAN, and pix2pix were also trained to perform this task. PSNR, SSIM, and MAE are used to evaluate all 4 GANs. E-GAN performs the best with scores of 28.16, 0.75, and 105, respectively. Pix2pix performed the second best in all metrics, with PSNR, SSIM, and MAE scores of 24.76, 0.61, 295, respectively. The goal of TPA-GAN is to use 3D MRI to generate corresponding FDG PET volume and then use both volumes to classify brain diseases. TPA-GAN PET generation was compared to vanilla GAN, AttentionGAN, cycleGAN, and PA- GAN. TPA-GAN images results in a mean SSIM, PSNR, and MSE of 0.915, 29.0, and 184, respectively. The next best performing model (PAGAN) resulted in scores of 0.913, 28.5, and 204, respectively. BERT was incorporated into a standard U-net GAN as a secondary discriminator. Two variations of GANBERT were trained; one with a CNN and BERT discriminator, and one with only BERT as the discriminator. Both models were compared to a standard pix2pix network. GANBERT + CNN yielded a PSNR, SSI, and RMSE of 57.58. 0.27. and 0.80, respectively. GANBERT-only yielded 56.53, 0.31, and 0.91, and pix2pix resulted in scores of 50.41, 0.0, and 1.85, respectively. This two-stage organ detection method uses cycleGAN to translate images from MR to CT, which were then used to train a separate NN (Yolov5) to detect organs. Mean average precision of Yolov5 decreased from 8.66mm to 7.95mm when adding synthetic data. Another cycleGAN was used to translate 3D MR to 3D CT volumes. 
The cycleGAN generated CT images resulted in an average Dice coefficient of 0.83 and a mean landmark error of 2.2mm. The authors provided baseline comparison for these scores. However, they did qualitatively compare another GAN [237] single-slice model provide a comparison Table 10 (cont’d) [238] Privately curated, same-day MR and images were CT acquired. cycleGAN [239] Atlas project cGAN cGAN [240] curated Privately from 169 images patients with established coronary artery disease undergoing hybrid coronary 18F-NaF PET and contrast CT angiography. [241] A privately curated dataset contained 60 CT and PET was constrained to slices in the region of the liver. curated Privately dataset containing 1935 brain PET and CT scans were taken. [242] T C d n a T E P The proposed modified cycleGAN generated CT images were compared to a normal cycleGAN, a cGAN, and a DCNN. For MAE and PSNR, the modified cycleGAN was superior with scores of 0.0416 and 36.11, respectively. The next best model was the normal cycleGAN with an MAE of 0.0465 and a PSNR of 37.10. The proposed cGAN borrows its generator architecture from pix2pix architecture but has 6 or 9 residual blocks. SSIM, MAE, PSNR, and MSE are compared between the cGAN, pix2pix, and a U-net. The pix2pix model is worse than both the 6, and 9 block cGAN. The 9 residual block cGAN produces the highest quality images, with PSNR, SSIM, MAE, and MSE of 21.4, 0.823, .0.3, and 0.01, respectively. A cGAN was trained to synthesize CT images from PET. Target-to-background (TBR) and standardized update values (SUV) were used to assess the quality of the CT image alignment given the ground truth CT. The correlation of TBR for the cGAN was 0.31 and the SUV was 0.26 to human excellent correlation of observer and the proposed method. Additionally, the generation time of the GAN was only 27.5 seconds on average, which is 33 times faster than humans. registration, indicating cycleMEDGA N FCN + cGAN MAE and PSNR was used to evaluate the performance of FCN, cGAN, and a fusion of both in generating PET from CT images. On average, the combined network achieved the lowest MAE (0.72) and highest PSNR (30.22), with FCN at a close second (MAE = 0.74 and PSNR = 30.05). Here, authors expanded on their previous MEDGAN [130] by adding a cycle consistency loss as well as a cycleGAN network structure. Here, cycleMEDGAN aims to translate from PET to CT. It also outperformed normal cycleGAN and UNIT models in terms of SSIM and PSNR (0.911 and 24.08, respectively). The normal cycleGAN achieved the second best performance with a SSIM of 0.896 and a PSNR of 23.35. In conclusion, despite being a recent innovation, GANs have been widely applied to numerous medical imaging applications. GANs and their variants have had great success in augmenting data in multiple modalities and have been used for downstream reconstruction, segmentation, classification, and detection tasks, among others. Consistently, studies have shown 71 that the addition of synthetically generated images to the training set of other NNs improves performance. However, there are still challenges that need to be addressed, such as the limited generalizability of models. The above mentioned GANs are highly specialized and often do not translate well to other image generation tasks. Also, very few studies have verified the image quality and realism of the generated medical images using a) human observers or b) a task-based assessment against human performance due to the challenge of performing these comparisons. 
Finally, only a few studies reported whether a certain ratio of synthetic to real images yielded optimal performance. Nonetheless, the progress made in the field of GANs in medical imaging is promising and holds significant potential for improving the accuracy and efficiency of medical diagnosis and treatment. 72 CHAPTER 3: NEAR-PAIR PATCH GENERATIVE ADVERSARIAL NETWORK These efforts were partially presented at the IEEE Nuclear Science Symposium, Medical Imaging Conference 2022, and has been submitted to the SPIE Journal of Medical Imaging. Data scarcity in machine learning medical imaging applications is a major issue. Deep learning-based methods require a large volume of training data, which may be difficult to acquire due to several reasons, such as lack of standardization, lengthy curation process, accessing HIPAA compliant images, and the need for expert labeling. Data augmentation is a common solution to mitigate this issue. Recently, sophisticated data augmentation methods are based on a class of NNs called Generative Adversarial Networks (GANs), which generate new images of high perceptual quality. Here, we present a method to support distant supervision of object detectors using generated synthetic pathology-present labeled images. Our proposed method, named near-pair patch cycleGAN (NPP-cycleGAN), employs the previously proposed cyclic generative adversarial network (cycleGAN) with two primary innovations: 1) use of “near-pair” pathology-present regions and similar pathology-absent regions for training and 2) the addition of a realism metric (Fréchet Inception Distance) to the generator loss term. The NPP-cycleGAN is then used to augment data by synthetically generating pathology on pathology-absent images, which can then be used to train object detectors. We train and test the method with 2800 fracture-present image patches from 1109 unique pediatric chest radiographs. In a blinded observer study, we presented four expert pediatric radiologists with either a real fracture absent image, a real fracture present image, or a synthetic fracture present image and asked them to score 1-5 the likelihood of a fracture (1 = Definitely not a fracture, 5 = Definitely a fracture). Results showed that real fracture absent images scored 1.71 ± 0.99, real fracture present images 4.14 ± 1.23, and synthetic fracture present a 2.51 ± 1.24. These results suggest that the proposed GAN can generate high quality fracture- 73 present pediatric chest radiographs. 3.1 Introduction Rib fractures in pediatric patients are a sentinel injury for non-accidental trauma, making accurate and timely detection of these injuries crucial in protecting the well-being of children. Unfortunately, between 80-100% of rib fracture cases in young children are a result of child abuse [243], [244]. This statistic is particularly concerning because over two-thirds of rib fractures can be missed during first reads by radiologists [245], and in our recently published study expert radiologists achieved a reader-to-reader F2 score of only 0.73 [246]. While the importance of detecting rib fractures is high, it is a particularly difficult task even for experienced radiologists. The difficulty of detection is partially due to the diverse presentation of fractures. 
While certain fractures are easy to detect, with obvious signs of bony displacement and/or healing (including subperiosteal new bone formation, callus bridging, and medullary sclerosis) (Figure 23), challenging fractures are much less conspicuous and more difficult to diagnose, showing little to no signs of bony displacement or healing. Figure 23. Examples of apparent and challenging fractures. Transverse fractures are readily apparent vs. challenging fractures that show no sign of displacement. Recent studies have shown machine learning models can match human performance in the detection of rib fractures in children. One study trained a deep convolutional neural network on a dataset of 845 CT scans from children achieved a sensitivity of 43% and a specificity of 88% in 74 Figure 1: Examples of apparent and challenging fractures Apparent fracturesChallenging fracturesAutomatic Rib Fracture Detection in Pediatric Radiography to Identify Non-Accidental Trauma , R21HD097609 (PI: Alessio)Apparent fracturesChallenging fracturesAutomatic Rib Fracture Detection in Pediatric Radiography to Identify Non-Accidental Trauma , R21HD097609 (PI: Alessio)Apparent fracturesChallenging fracturesAutomatic Rib Fracture Detection in Pediatric Radiography to Identify Non-Accidental Trauma , R21HD097609 (PI: Alessio)Challenging FracturesApparent Fractures their test set [247]. Another study used similar techniques but was trained on a dataset of 300 radiographs and achieved a sensitivity of 91.3% and a specificity of 90% [248]. Our previous study proposed a method entitled “avalanche decision” was motivated by the reality that pediatric patients commonly present with multiple clustered fractures [246]. We improved two leading single stage detectors, RetinaNet and YOLOv5, with this decision scheme. The networks were then trained on 1109 radiographs and yielded RetinaNet and RetinaNet+Avalanche F2 scores of 0.55 and 0.65, respectively. F2 scores of base YOLOv5 and YOLOv5+Avalanche were 0.58 and 0.65, respectively. One underlying issue among all prior studies is that they were trained with a relatively small volume dataset. Small training datasets are commonly augmented with a variety of methods. More sophisticated NN based augmentation methods have recently become popular. Both cycleGAN and pix2pix, belonging to a class of NN called Generative Adversarial Networks, have been used in a variety of medical imaging applications, such as image segmentation, lesion detection, and retinal image analysis [127], [131], [172], [211], [235] (see tables 4-6 in chapter 1). We propose a novel GAN approach, where near-pair image patches are used as inputs to a cycleGAN, to translate image patches of rib fracture absent radiographs to rib fracture present radiographs. We hypothesize that the “near-pair” aspect of our training data will allow for more constrained training and ultimately more successful translation without the use of true 1-to-1 paired data. While our method is specifically tested with rib fracture radiographs, this methodology is potentially generalizable for data augmentation of any image dataset that seeks to detect focal pathology. 3.2 Methods Rib Fracture Dataset Our dataset was collected through an IRB-approved study at Seattle Children’s Hospital. 75 The dataset contains 1109 unique patients, of which 624 are fracture present and 485 are fracture absent. There are 241(34.2%) female and 463(65.8%) male patients. The average age of patients is 268.76 ± 784.93 days (range 0 − 6935, median 84, IQR 196). 
After removing outliers (missing, age = 0, or age ≥ Q3 + 1.5IQR), the average age of patients is 128.11 ± 111.43 days (range 1 − 476, median 84, IQR 140). The images are chest radiographs in an anterior-posterior perspective, provided in DICOM file format. Ground truth annotations of rib fracture locations were provided by eight board-certified pediatric radiologists. When grading the fracture present section of the dataset the radiologists were given prior knowledge that at least one fracture was present in each image, thus there is a slight bias towards the labeling performance of the radiologists. 3.2.1 Near-Pair Patch cycleGAN Training a model on localized image patches instead of the whole image poses many benefits: 1) computationally cheaper, 2) faster training, and 3) multiple training pairs can be extracted from one radiograph. Finally, the generation of localized patches will greatly benefit the training of object detectors because we know the exact bounding box locations surrounding the pathology. Near-Pair Generation Our dataset contains 2800 labeled fractures from 515 patients. The average bounding box size was 78 x 70 pixels. To give our model contextual information surrounding the fracture, we standardized the size of our patches to 2.5 cm x 2.5 cm (128x128 pixels), where the center of our patches is the same as the center of the original bounding box. The near-pair (fracture-absent) patch was manually selected from the same radiograph. The general rules for selecting a near pair was to first select a patch on the contralateral rib and horizontally flip the image; If that patch happened to contain a labeled fracture, then select a fracture absent patch with similar orientation, most commonly a couple ribs above or below the target patch. Examples of near pairs can be seen 76 in Figure 24. These near-pair patches are then normalized at the patch level using a robust standard scalar method using a 98% interquartile range (IQR). 𝑆𝑐𝑎𝑙𝑒𝑑 𝑖𝑚𝑎𝑔𝑒 = 𝑂𝑟𝑖𝑔𝑖𝑛𝑎𝑙 𝐼𝑚𝑎𝑔𝑒 − 𝑀𝑒𝑑𝑖𝑎𝑛 𝑝𝑖𝑥𝑒𝑙 𝑣𝑎𝑙𝑢𝑒 𝑂𝑟𝑖𝑔𝑖𝑛𝑎𝑙 𝐼𝑚𝑎𝑔𝑒 𝐼𝑄𝑅 (17) Normalization of the patch allows our pixel values to be in the same range as our activation functions, usually between 0 and 1. This allows for less frequent non-zero gradients during training, and therefore the neurons in our network will learn faster. Figure 24. Examples of Near-Pair Patches. Each patch is manually selected from the same radiograph. If no closely resembling patch within the same radiograph is fracture absent, then we select from an age, sex, and chest volume matched image. Training Details The 2800 near-pair patches were then used to train a cycleGAN, with the overall and generator architecture presented in Figures 25 & 26. We trained the cycleGAN using all 2800 fracture-present and 2800 fracture-absent near-pair image patches in the training set. We used a batch size of 4, learn rate of 0.0002, a lambda of 10, and training was optimized by Adam. The network converged in approximately 90 epochs using an NVIDIA V100S GPU. 77 Near-PairsFracture Absent Patch, Domain AFracture Present Patch, Domain B Figure 25. NPP-GAN training flowchart. Two sets of generator/discriminator pairs are trained simultaneously using near pair patches. Generator 1 converts real fracture-absent patches to fracture-present patches. Discriminator 1 distinguishes between synthetic fracture-present and real fracture-present patches. Generator 2 removes pathology to create synthetic fracture-absent patches. Discriminator 2 distinguishes between synthetic and real fracture-absent patches. 
The novelty of this work is the use of near-pair real patches derived from similar regions of the image from the same base image. Figure 26. NPP-GAN Generator architecture. An input layer of size 128x128x1 goes through an initial 2D convolution layer with 64 filters and a filter size of 7x7. Then it goes through 2 downsampling blocks, 3 residual blocks, 2 upsampling blocks, and then a final convolution layer. An individual residual block’s architecture is presented on the left. Each block is followed by an instance normalization and a ReLU layer, but these have been omitted for simplicity. 3.2.2 Unpaired cycleGAN To assess the benefit of using near-pair patches, we trained another conventional cycleGAN with unpaired data to compare with our NPP cycleGAN. To create our unpaired fracture-absent patches, we randomly selected 128x128 regions of known fracture-absent 78 Real Fracture Present Patch, Domain BSynthetic Fracture Present Patch,Domain BReal Fracture Absent Patch, Domain ASynthetic Fracture Absent Patch, Domain AcycleGANGenerator 1cycleGANGenerator 2Discriminator 1Discriminator 2Real or Synthetic Fracture Present Patch, Domain BReal or Synthetic Fracture Absent Patch, Domain ACyclic loss (MSE)64 Filters, 7x7 Filter SizeS = 1128 Filters, 3x3 Filter Size,S = 2256, 3x3 S = 264 Filters, 7x7 Filter SizeS = 1128 Filters, 3x3 Filter Size, S = 264 Filters, 3x3 Filter SizeS = 2256, 3x3 S = 1256, 3x3 S = 1256, 3x3 S = 1Convolutional LayerResidual BlocksTranspose Convolutional LayerResidual BlockInput Layer 128x128x1 radiographs. Training parameters and architecture of the unpaired cycleGAN were identical to our NPP cycleGAN, although it converged in approximately 100 epochs. 3.2.3 Fréchet Inception Distance Near-Pair Patch cycleGAN As mentioned in Section 1.3.4, a common method to evaluate realism of the synthetically generated radiographs is the Fréchet Inception Distance (FID) [169]. The Fréchet Inception Distance functions by embedding a set of real and synthetic images in the final average pooling layer of an Inception Net [162] pre-trained on ImageNet [249]. The two sets are assumed to be multivariate Gaussian distributions with the average and covariance of each utilized to calculate the Fréchet distance, also known as the Wasserstein-2 distance. This distance reflects the difference in the average features extracted from each image set based on the learned kernels of the Inceptionv3 Net model. The distance has been demonstrated to be consistent with human judgement of visual quality and more resistant to noise than prior approaches for natural images [168]. We used the FID as a metric to evaluate the quality of the generated pathology and as an innovation in the proposed generator 1 by creating a new generator loss term using the sum of FID and the conventional cycleGAN loss (Equation 13). Therefore, our new loss function is now: Ltot(GA, GB, DA , DB) = LGAN(GA, DB, X, Y) + LGAN(GB, DA, Y, X) + λLCyc(G, F) + 𝛽L𝐹𝐼𝐷(G, F) (18) Where β controls the weight of the FID loss term. The intuition is that the FID loss term will enforce the transformed image from the first generator (fracture-present) to “look realistic” compared to a random real fracture-present patch. The training set images, training parameters, and generator architecture are identical to our NPP cycleGAN, but converged in approximately 120 epochs. 
79 3.2.4 Blinded Observer Study A total of 90 images (30 Real Fracture Present, 30 Real Fracture Absent, 30 Synthetic Fracture Present) were randomly presented to four pediatric radiologists. The synthetic fracture present radiographs were generated using the FID-NPP model. Each image contains a 200x200 pixel bounding box highlighting the portion of the radiograph we want the radiologists to focus on. The radiologists were asked on a 1-5 scale if there is a fracture within the box (1 = Definitely not a fracture, 2 = Unlikely a fracture, 3 = May be a fracture, 4 = Likely a fracture, 5 = Definitely a fracture). 3.2.5 Full Radiograph Generation Cumulative Informed Fractures To generate full radiographs, we first select a fracture absent radiograph and then randomly sample 2.5 cm x 2.5 cm patches from the rib cage. The patch sample selection was guided by a heatmap of common fracture locations from our dataset (Figure 27). Approximately two-thirds of fractures were along the oblique ribs, and our generated radiographs reflect this distribution. Each radiograph had between 1-6 synthetic fractures added to follow typical fracture frequency in this patient population. 80 Figure 27. Cumulative distribution of the common locations of fractures on a reference radiograph. Approximately 2/3rds of fractures appear on the oblique rib cage. Generated from fracture-present radiographs where each fracture was mapped to a common atlas space. The sum of all fractures in the common space are presented on a representative chest radiograph. Poisson Inpainting After patch selection, generator 1 was then used to convert the selected fracture absent patch to a synthetic fracture present patch. The patch was reinserted into its original radiograph using a method we developed based on a Poisson blending technique (Figure 28) [250]. The intuition behind Poisson blending, also known as inpainting, is that to color match two different domains, the gradient of the images is more important than the intensity. Therefore, the method tries to replace the gradients of the target image with the gradients of the source image, while overall intensity is matched to the target image [251]. 81 Figure 28. We use generator 1 to synthetically add a fracture to a selected fracture absent patch. The patch is then reinserted back into the original location and then blended using Poisson Inpainting techniques. In Poisson blending, we try to fill in missing regions of an image. Therefore, we first define a boundary using a mask. Pixels outside of the mask are known, pixels inside the mask are missing. Then, we apply a Laplacian operator to measure the local variations in an image (like edges and texture). The operator propagates image textures into missing regions by solving a boundary constrained optimization problem. The inpainting process often involves an iterative optimization procedure. At each iteration, the estimated pixel values are refined based on the computed Laplacian and the boundary conditions. The process continues until all pixels within the mask are updated. Once the missing regions are filled in, post-processing techniques can be applied to smooth out any remaining inconsistencies or artifacts. Here, we chose to use a gradient filter that preserves the edge of our original patch and the center of our synthetic patch (Figure 29). 82 Original ImageSynthetic Fracture Present ImageBlended Image Figure 29. Patch blending process. 
After a fracture absent patch is translated into a fracture present patch, there remains some contrast issues. The pixel values are adjusted using Poisson inpainting. Then, we use a gradient filter to blend the edges of our patch to match the original surroundings. 3.3 Results and Discussion Figure 30 shows example synthetic fractures for all three trained models given healthy patches as inputs. Qualitatively, each model can generate convincing apparent fractures with signs of subperiosteal new bone formation, callus bridging, and medullary sclerosis. Our two proposed GANs, NPP and FID-NPP, generally produces sharper images compared to the normal cycleGAN. However, visual inspection suggests that convincing fractures are only generated approximately one-third of the time. Examples of failed generation of fractures are shown in Figure 31, where no new structure seems to have been generated and with negligible changes to visual attenuation properties. 83 FractureAbsentPatchSynthetic Present PatchPoisson Adjusted PatchFiltered Fracture Absent PatchFilteredPoisson PatchFinal Synthetic Patch Figure 30. Examples of synthetically generated fractures given fracture absent image patch inputs (Top Row). Each variation of cycleGAN is capable of generating new structure, however, visually the NPP and FID-NPP versions are less blurry and appear more realistic. Figure 31. Examples of NPP-GAN synthetically generated fractures (Bottom Row) given fracture absent image patch inputs (Top Row) with little to no evidence of fracture formation. Table 11 shows the average FID scores for all synthetically generated and real fracture present patches. A lower FID is favorable and indicates that the InceptionV3 network feature vector of the synthetic fracture present patches are closer to the feature vectors from real fracture present patches. Both the proposed NPP and FID-NPP models produces images with lower FID scores than the normal cycleGAN, suggesting they produce more realistic fractures. Furthermore, 84 Fracture Absent Input: Domain AcycleGAN: Domain BNPP cycleGAN: Domain BFID-NPP cycleGAN:Domain BImage 1Image 3Image 4Image 5Image 2Synthetic Fracture Present Patches,Domain BOblique RibsFracture Absent Input, Domain AHorizontal Ribs the combined FID-NPP method yielded the lowest, best, score. Note that the FID score of a set of patches of real fractures is not zero considering it is being compared to a different set of reference patches of real fractures. Table 11. Fréchet Inception Distance (FID) score for each cycleGAN variant. A set of 100 randomly selected patches were used to calculate this score. The mean of all FID scores were significantly different from each other at a p-value <0.05. # of Fracture Absent patches normal cycleGAN NPP GAN FID-NPP Real Frac Present patches 100 26.3 ± 0.99 24.0 ± 1.08 23.5 ± 1.07 20.6±1.02 The blinded observer study results are displayed in Table 12, which suggests that the synthetic fractures are indeed realistic, and the blending process produces convincing full radiographs. When presented with real fracture absent, real fracture present, and synthetic fracture present images, 4 expert pediatric radiologists scored 2.03 ± 0.84, 4.13 ± 1.23, and 2.73 ± 1.18, respectively. Overall, 10 of 30 synthetic fracture present images were scored at least 3 or higher by 3 radiologists and in 15 images at least one radiologist scored a 4 or higher. Table 12. 
Scores from the blinded observer study grading the full radiograph with no fractures, real fractures, with synthetic patch inserted into image. *On average across all readers and images, visual appearance of fracture for synthetic fracture present is significantly higher than fracture absent (p=0.01). Real Fracture Absent Real Fracture Present Synthetic Fracture Present (FID-NPP) All Images 30 x 4 30 x 4 30 x 4 90 x 4 2.03 ± 0.84 4.13 ± 1.23 2.73 ± 1.18* 2.79 ± 1.55 0.604 0.781 0.731 0.881 Number of Images x Number of Readers Likelihood of Fracture Intraclass Correlation Coefficient Overall, the evaluation of the FID scores and blinded observer study suggests that our proposed FID-NPP cycleGAN model can generate realistic fractures for many input patches. It particularly excels at creating fractures with calluses of a variety of shapes and sizes. Visually, the generator appears to fulfill our goal of improving the diversity of our object detector training set. 85 However, we acknowledge that visual inspection suggests that fractures are only generated approximately 1/3rd of the time. This could indicate a failure of our generator, or it could indicate that our generator is creating challenging fractures like in Figure 23C and we are unable to visually discern them. The former statement is more likely and is supported by our blind observer study, where exactly 10/30 synthetic fracture present images were rated at least a 3 by three radiologists. 86 CHAPTER 4: DATA AUGMENTED NEURAL NETWORK PEDIATRIC RIB FRACTURE DETECTION The ultimate goal of data augmentation, in general, is to increase a dataset’s size and diversity so that a machine learned model that trains on the augmented dataset improves performance. To assess the downstream performance of a rib fracture detector we adapted the YOLOv5 network from Ultralytics [252]. Rather than training an object detector from scratch, we opted to utilize the YOLOv5|6 model for transfer learning with their pretrained weights. Compared to the popular YOLOv3 which operated with a Darknet backbone architecture [253], the YOLOv5 architecture uses a CSPNet backbone based on DenseNet [254]. It additionally integrated a novel mosaic data augmentation method that aimed to improve detection performance of small objects in images. 4.1 Introduction YOLOv5 is an object detection algorithm from the popular You Only Look Once (YOLO) series of real-time object detection models. YOLOv5 follows a one-stage object detection architecture. It divides an input image into a grid and predicts bounding boxes and class probabilities for objects within each grid cell. It uses a single CNN based off CSPNet to make these predictions. YOLOv5 employs a detection head on top of the backbone network. The detection head consists of additional convolutional layers that process the extracted features and predict bounding box coordinates and class probabilities for each object detected. One interesting innovation of YOLOv5 is the use of anchor boxes. Anchor boxes are predefined bounding boxes of various sizes and aspect ratios, to handle objects of different shapes and sizes. These anchor boxes are associated with each grid cell, and YOLOv5 adjusts them to match the objects' shapes during training. The YOLO family of networks is constantly evolving, with the current model at v8. We 87 chose to use YOLOv5 because it is commonly used as a baseline detector in previous studies (See Section 1.4). 
Overall, YOLOv5 is a powerful object detection algorithm that offers state-of-the-art performance, real-time inference capabilities, and a user-friendly framework for training and deployment. 4.2 Materials and Methods The real image dataset used for training and evaluation of the YOLOv5 architecture contains 1,109 unique real images, of which 624 are fracture present and 485 are fracture absent. For full demographics, see Section 3.2. Evaluation Methods and Metrics: This study aimed to evaluate the performance gains from varying the amount of augmented data in our training set. Specifically, does augmented data provide higher performance gains in low volume scenarios. Table 13 shows the different training conditions. We evaluated and compared training stability, precision, recall, F2, and receiver operating characteristic for each of the conditions. We define these performance metrics as follows: Precision = 𝑇𝑟𝑢𝑒 𝑃𝑜𝑠𝑖𝑡𝑖𝑣𝑒 𝑃𝑟𝑒𝑐𝑡𝑖𝑐𝑡𝑒𝑑 𝑃𝑜𝑠𝑖𝑡𝑖𝑣𝑒 Recall = 𝑇𝑟𝑢𝑒 𝑃𝑜𝑠𝑖𝑡𝑖𝑣𝑒 𝐶𝑜𝑛𝑑𝑖𝑡𝑖𝑜𝑛 𝑃𝑜𝑠𝑖𝑡𝑖𝑣𝑒 Fβ = (1 + 𝛽) 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 ∙ 𝑟𝑒𝑐𝑎𝑙𝑙 (𝛽2 ∙ 𝑝𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛) + 𝑟𝑒𝑐𝑎𝑙𝑙 Table 13. Combinations of real images with GAN generated synthetic fracture images used for training the YOLOv5 object detector. Training Condition 1 2 3 4 5 6 7 # Real Images 0 50 50 250 250 500 500 # Synthetic Fracture Images 500 0 500 0 500 0 500 While an F1 score is commonly used as an evaluation metric in classification and detection tasks, we opt to use the F2 score for a couple of reasons. Since the goal of this detection task is to 88 aid radiologists by flagging suspicious regions, we set our β term to weight recall more heavily than precision. We would rather have false positives than false negatives. Therefore, we evaluate all models by F2 score and placing twice as much weight on recall as precision. Training Details To avoid bias, any fracture-absent radiographs that were used as the base image during our synthetic radiograph generation were split so that they could only appear in the training set and not the testing set. The remaining real fracture-absent images were then used to generate the testing and validation sets. The test set contains 120 images evenly split with 60 fracture present and 60 fracture absent images. Each YOLOv5 model was fine-tuned on our data for a maximum of 300 epochs with a batch size of 8 on NVIDIA V100S GPUs. An early stopping protocol was used to end training if performance on the validation set did not improve within 100 epochs. Test set performance was evaluated using weights with the highest validation metric on each respective training set. Error bars for each metric were calculated using a stratified bootstrapping method. In each of the 5,000 iterations, the subsets of 60 fracture present and fracture absent images were randomly sampled with replacement, maintaining 60 images in each set for a total of 120 images to match the original test set size. 4.3 Results and Discussion Table 14. Performance of object detector trained with different volumes of real and FID-NPP GAN synthetic images. Bold values represent the highest score for each training set size. 
Training Dataset 0 Real 500 Synthetic 50 Real 0 Synthetic 50 Real 500 Synthetic 250 Real 0 Synthetic 250 Real 500 Synthetic 500 Real 0 Synthetic 500 Real 500 Synthetic Precision 0.000 ± 0.000 0.764 ± 0.051 0.981 ± 0.019 0.883 ± 0.031 0.923 ± 0.027 0.991 ± 0.009 0.855 ± 0.034 Recall 0.000 ± 0.000 0.327 ± 0.048 0.201 ± 0.031 0.434 ± 0.042 0.408 ± 0.049 0.412 ± 0.041 0.488 ± 0.041 F2 Score 0.000 ± 0.000 0.369 ± 0.051 0.239 ± 0.036 0.482 ± 0.042 0.458 ± 0.050 0.466 ± 0.042 0.534 ± 0.040 89 Tables 14 show the performance of the YOLOv5 object detector trained on different sets of training data augmented with the FID-NPP GAN generated radiographs. Training solely with synthetically generated fractures led to abysmal object detector performance with the detector unable to identify any real fractures. The use of the augmented data with different volumes of real training samples resulted in varying levels of performance gains with increased precision for the low volume conditions (50 Real+500 Synthetic; 250 Real+500 Synthetic) and increased recall and F2 Score (14.6% increase from 0.466 ± 0.042 to 0.534 ± 0.040) for the relatively high-volume condition (500 Real+500 Synthetic). Combined, these results suggest that, in this current application, synthetic data alone is not sufficient for training object detectors and the relative performance gains will be a function of the data augmentation mix (real+synthetic data). Our modest object detector improvement in performance lines up with similar studies. Section 1.4 showed that generally downstream NN only has a 1-4% increase in performance metric. In best case scenarios, Hammami et al. showed an 8% increase in mean average precision (mAP) for multi-organ detection when adding cycleGAN augmented CT images [235]. Kanayama et al. showed a 6% increase in mAP for gastric cancer detection using a conditional GAN [255], and a 10% increase in sensitivity was seen in brain metastases detection by adding PGGAN generated MR images by Han et al. [256]. The addition of 500 synthetic images to 500 real images using our best performing GAN shows an 18.5% increase in recall and a 14.6% increase in F2 score, which is a promising indication that our NPP-FID method is better than no data augmentation. The variable behavior in object detector performance relative to synthetic images in the overall training data is also seen with other studies. The three mentioned previously also saw that either adding too little or too many synthetic images produce worse results than only real images. 90 Additionally, an increase in synthetic image realism does not necessarily improve a detection performance score [256]. The reason behind this behavior is not definitively clear, however the leading theory is that learning from data produced by other models causes model collapse – a degenerative process whereby, over time, models forget the true underlying data distribution, even in the absence of a shift in the distribution over time [257]. Therefore, further studies need to be made to determine the best ratio of synthetic to real images for object detector training, as well as methods to minimize model collapse. In summary, we proposed a new technique that utilizes near-pair image patches along with an FID loss function to train a cycleGAN model. Our results show that this approach can generate realistic-looking pediatric chest radiographs containing rib fractures. The augmented fracture data led to a moderate improvement in the performance of a rib fracture detection model. 
There are two general improvements that can be made to the proposed method: improving the fracture labels and changing the generator architecture. The size of the patches was chosen based on the median size of our labeled bounding boxes. Large bounding boxes tend to contain multiple fractures, since physicians simply highlight an area if there is a cluster of fractures. Therefore, some fracture-present patches may contain off-centered or partial fractures, which hinders our cycleGAN training. Additionally, the object detector assumes that each bounding box has one fracture present, so training with the few samples whose bounding boxes contain multiple fractures does not benefit the model. By relabeling our images and standardizing the size of our bounding boxes, we may achieve better results.

The second major improvement is changing the design of our generator network. This change can be as simple as tuning hyperparameters (number of filters, filter size, convolution padding style, upsampling method, etc.), or as complex as changing the architecture (adding residual blocks, adding batch normalization layers, adding an auxiliary network, etc.).
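As one concrete example of such an architectural change, the sketch below shows a residual block of the kind commonly used in cycleGAN generators (two 3x3 convolutions with instance normalization and a skip connection), written in PyTorch. It is a generic illustration of the pattern under the assumption of a PyTorch implementation, not the generator actually used in this work.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Two 3x3 convolutions with instance normalization and a skip connection,
    # as popularized by ResNet-style cycleGAN generators.
    def __init__(self, channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.ReflectionPad2d(1),
            nn.Conv2d(channels, channels, kernel_size=3),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The identity shortcut lets gradients bypass the convolutions.
        return x + self.block(x)

Stacking several such blocks between the generator's downsampling and upsampling stages is a common way to deepen the network without destabilizing training.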
We may even choose to replace GANs altogether with a new type of image generation network that has recently dominated the field. Diffusion models such as DALL-E [258], Midjourney [259], and Stable Diffusion [260] have gained notable attention due to their remarkably high-resolution outputs from a prompt in natural language (Figure 32). These models are trained using hundreds of millions of images and have the flexibility to generate diverse content. The underlying network is fundamentally different from GANs: rather than a generator learning from a discriminator's ability to classify real from fake, diffusion models learn to map noise to an image in a progressive manner. Despite numerous copyright lawsuits claiming misuse of artists' images, diffusion models are still advancing (Midjourney v5.2 was released in June 2023). This all raises the question: if diffusion models can generate images that are as high quality as GANs and can generate more diverse content than GANs, do we even need GANs anymore?

Figure 32. Images generated using DALL-E (A), Midjourney (B), and Stable Diffusion (C) with the text prompt “pediatric chest radiograph with rib fractures”.

I believe that GANs will be obsolete soon. Currently, there are two main constraints that limit the use of diffusion models for medical imaging research: the first is computational resources and the second is data. Diffusion models are notoriously computationally expensive and slow to train while requiring huge volumes of data [261]. GANs are still valuable in research settings simply because they are cheaper and more efficient. They still have immense potential in the medical imaging field, where we have highly specialized tasks. However, this may soon change, as a recent study has produced a model combining diffusion models and GANs [262], and several recent studies have shown that diffusion models are superior to GANs for specialized image generation [263], [264], including medical imaging tasks [265], [266]. In conclusion, GANs and diffusion models represent two branches of generative AI stemming from a large tree of neural networks. This field is ever growing and always exciting; it will be interesting to see what the new buds will bring.

4.5 Supplemental Information

The following Tables 15 and 16 are the object detector performance results using normal cycleGAN and NPP cycleGAN augmented data. These tables are not included in Section 4.3 because they do not offer a fair comparison. The set of real radiographs used for Table 14 is different from that used for Tables 15 and 16, and therefore the baseline real-only evaluations do not match. Additionally, the base radiographs as well as the selected patches used for synthetic image generation do not match from set to set. For future studies we would like to use matching images for both the real and synthetic portions to offer a fair comparison between different models.

Table 15. Performance of object detector trained with different volumes of real and normal cycleGAN synthetic images. Bold values represent the highest score for each training set size.

Training Dataset          Precision        Recall           F2 Score
0 Real 500 Synthetic      0.125 ± 0.128    0.004 ± 0.004    0.005 ± 0.005
50 Real 0 Synthetic       0.735 ± 0.070    0.214 ± 0.043    0.249 ± 0.048
50 Real 500 Synthetic     0.815 ± 0.091    0.169 ± 0.044    0.200 ± 0.050
250 Real 0 Synthetic      0.820 ± 0.035    0.452 ± 0.048    0.496 ± 0.047
250 Real 500 Synthetic    0.816 ± 0.039    0.456 ± 0.053    0.499 ± 0.052
500 Real 0 Synthetic      0.901 ± 0.030    0.448 ± 0.051    0.498 ± 0.051
500 Real 500 Synthetic    0.860 ± 0.033    0.493 ± 0.048    0.539 ± 0.047

Table 16. Performance of object detector trained with different volumes of real and FID-NPP GAN synthetic images. Bold values represent the highest score for each training set size.

Training Dataset          Precision        Recall           F2 Score
0 Real 500 Synthetic      0.000 ± 0.000    0.000 ± 0.000    0.000 ± 0.000
50 Real 0 Synthetic       0.779 ± 0.059    0.254 ± 0.035    0.294 ± 0.038
50 Real 500 Synthetic     0.922 ± 0.046    0.179 ± 0.035    0.213 ± 0.040
250 Real 0 Synthetic      0.894 ± 0.047    0.468 ± 0.048    0.517 ± 0.048
250 Real 500 Synthetic    0.909 ± 0.033    0.449 ± 0.050    0.499 ± 0.050
500 Real 0 Synthetic      0.825 ± 0.035    0.536 ± 0.044    0.576 ± 0.042
500 Real 500 Synthetic    0.874 ± 0.036    0.524 ± 0.047    0.569 ± 0.046

REFERENCES [1] E. Tu, M. Azmat, K. Branch, and A. Alessio, “Machine learned approach for estimating myocardial blood flow from dynamic CT and coronary artery disease risk factors,” in Medical Imaging 2021: Biomedical Applications in Molecular, Structural, and Functional Imaging, SPIE, Feb. 2021, pp. 370–375. doi: 10.1117/12.2581703. [2] K. M. Pawelec et al., “Incorporating Tantalum Oxide Nanoparticles into Implantable Polymeric Biomedical Devices for Radiological Monitoring,” Adv. Healthc. Mater., vol. 12, no. 18, p. 2203167, 2023, doi: 10.1002/adhm.202203167. [3] R. Nakazato, D. S. Berman, E. Alexanderson, and P.
Slomka, “Myocardial perfusion imaging with PET,” Imaging Med., vol. 5, no. 1, pp. 35–46, Feb. 2013, doi: 10.2217/iim.13.1. [4] R. S. Driessen, P. G. Raijmakers, W. J. Stuijfzand, and P. Knaapen, “Myocardial perfusion imaging with PET,” Int. J. Cardiovasc. Imaging, vol. 33, no. 7, pp. 1021–1031, 2017, doi: 10.1007/s10554-017-1084-4. [5] R. Sciagrà et al., “Clinical use of quantitative cardiac perfusion PET: rationale, modalities and possible indications. Position paper of the Cardiovascular Committee of the European Association of Nuclear Medicine (EANM),” Eur. J. Nucl. Med. Mol. Imaging, vol. 43, no. 8, pp. 1530–1545, Jul. 2016, doi: 10.1007/s00259-016-3317-5. [6] O. R. Coelho-Filho, C. Rickers, R. Y. Kwong, and M. Jerosch-Herold, “MR myocardial perfusion imaging,” Radiology, vol. 266, no. 3, pp. 701–715, Mar. 2013, doi: 10.1148/radiol.12110918. [7] F. Bamberg et al., “Dynamic myocardial CT perfusion imaging for evaluation of myocardial ischemia as determined by MR imaging,” JACC Cardiovasc. Imaging, vol. 7, no. 3, pp. 267– 277, Mar. 2014, doi: 10.1016/j.jcmg.2013.06.008. [8] M. Jerosch-Herold, “Quantification of myocardial perfusion by cardiovascular magnetic resonance,” J. Cardiovasc. Magn. Reson., vol. 12, no. 1, p. 57, Oct. 2010, doi: 10.1186/1532- 429X-12-57. [9] R. Y. Kwong et al., “Cardiac Magnetic Resonance Stress Perfusion Imaging for Evaluation of Patients With Chest Pain,” J. Am. Coll. Cardiol., vol. 74, no. 14, pp. 1741–1755, Oct. 2019, doi: 10.1016/j.jacc.2019.07.074. [10] M. Bindschadler, D. Modgil, K. R. Branch, P. J. La Riviere, and A. M. Alessio, “Comparison of blood flow models and acquisitions for quantitative myocardial perfusion estimation from dynamic CT,” Phys. Med. Biol., vol. 59, no. 7, pp. 1533–1556, Apr. 2014, doi: 10.1088/0031- 9155/59/7/1533. [11] M. Bindschadler, K. R. Branch, and A. M. Alessio, “Quantitative myocardial perfusion from static cardiac and dynamic arterial CT,” Phys. Med. Biol., vol. 63, no. 10, p. 105020, May 2018, doi: 10.1088/1361-6560/aac0bd. 95 [12] A. Varga-Szemes, F. G. Meinel, C. N. De Cecco, S. R. Fuller, R. R. Bayer, and U. J. Schoepf, “CT myocardial perfusion imaging,” AJR Am. J. Roentgenol., vol. 204, no. 3, pp. 487–497, Mar. 2015, doi: 10.2214/AJR.14.13546. [13] C. Q. Davidson, C. P. Phenix, T. Tai, N. Khaper, and S. J. Lees, “Searching for novel PET radiotracers: imaging cardiac perfusion, metabolism and inflammation,” Am. J. Nucl. Med. Mol. Imaging, vol. 8, no. 3, pp. 200–227, Jun. 2018. [14] T. H. Schindler, H. R. Schelbert, A. Quercioli, and V. Dilsizian, “Cardiac PET Imaging for the Detection and Monitoring of Coronary Artery Disease and Microvascular Health,” JACC Cardiovasc. Imaging, vol. 3, no. 6, pp. 623–640, Jun. 2010, doi: 10.1016/j.jcmg.2010.04.007. [15] K. L. Gould et al., “Anatomic versus physiologic assessment of coronary artery disease. Role of coronary flow reserve, fractional flow reserve, and positron emission tomography imaging in revascularization decision-making,” J. Am. Coll. Cardiol., vol. 62, no. 18, pp. 1639–1653, Oct. 2013, doi: 10.1016/j.jacc.2013.07.076. [16] R. A. P. Takx et al., “Diagnostic accuracy of stress myocardial perfusion imaging compared to invasive coronary angiography with fractional flow reserve meta-analysis,” Circ. doi: Imaging, Cardiovasc. 10.1161/CIRCIMAGING.114.002666. e002666, 2015, Jan. vol. no. 1, 8, p. [17] C. 
Jaarsma et al., “Diagnostic performance of noninvasive myocardial perfusion imaging using single-photon emission computed tomography, cardiac magnetic resonance, and positron emission tomography imaging for the detection of obstructive coronary artery disease: a meta-analysis,” J. Am. Coll. Cardiol., vol. 59, no. 19, pp. 1719–1728, May 2012, doi: 10.1016/j.jacc.2011.12.040. [18] B. A. Mc Ardle, T. F. Dowsley, R. A. deKemp, G. A. Wells, and R. S. Beanlands, “Does rubidium-82 PET have superior accuracy to SPECT perfusion imaging for the diagnosis of obstructive coronary disease?: A systematic review and meta-analysis,” J. Am. Coll. Cardiol., vol. 60, no. 18, pp. 1828–1837, Oct. 2012, doi: 10.1016/j.jacc.2012.07.038. [19] C. B. Monti, M. Codari, M. van Assen, C. N. De Cecco, and R. Vliegenthart, “Machine Learning and Deep Neural Networks Applications in Computed Tomography for Coronary Artery Disease and Myocardial Perfusion,” J. Thorac. Imaging, vol. 35 Suppl 1, pp. S58– S65, May 2020, doi: 10.1097/RTI.0000000000000490. [20] R. Arsanjani et al., “Prediction of revascularization after myocardial perfusion SPECT by machine learning in a large population,” J. Nucl. Cardiol. Off. Publ. Am. Soc. Nucl. Cardiol., vol. 22, no. 5, pp. 877–884, Oct. 2015, doi: 10.1007/s12350-014-0027-x. [21] Y. Li et al., “Detection of Hemodynamically Significant Coronary Stenosis: CT Myocardial Perfusion versus Machine Learning CT Fractional Flow Reserve,” Radiology, vol. 293, no. 2, pp. 305–314, Nov. 2019, doi: 10.1148/radiol.2019190098. 96 [22] J. Betancur et al., “Prognostic Value of Combined Clinical and Myocardial Perfusion Imaging Data Using Machine Learning,” JACC Cardiovasc. Imaging, vol. 11, no. 7, pp. 1000–1009, Jul. 2018, doi: 10.1016/j.jcmg.2017.07.024. [23] J. M. Wolterink, T. Leiner, R. A. P. Takx, M. A. Viergever, and I. Isgum, “Automatic Coronary Calcium Scoring in Non-Contrast-Enhanced ECG-Triggered Cardiac CT With Ambiguity Detection,” IEEE Trans. Med. Imaging, vol. 34, no. 9, pp. 1867–1878, Sep. 2015, doi: 10.1109/TMI.2015.2412651. [24] A. M. Alessio, M. Bindschadler, J. M. Busey, W. P. Shuman, J. H. Caldwell, and K. R. Branch, “Accuracy of Myocardial Blood Flow Estimation From Dynamic Contrast-Enhanced Cardiac CT Compared With PET,” Circ. Cardiovasc. Imaging, vol. 12, no. 6, p. e008323, Jun. 2019, doi: 10.1161/CIRCIMAGING.118.008323. [25] M. D. Cerqueira et al., “Standardized Myocardial Segmentation and Nomenclature for Tomographic Imaging of the Heart,” Circulation, vol. 105, no. 4, pp. 539–542, Jan. 2002, doi: 10.1161/hc0402.102975. [26] L. J. Shaw et al., “Impact of ethnicity and gender differences on angiographic coronary artery disease prevalence and in-hospital mortality in the American College of Cardiology-National Cardiovascular Data Registry,” Circulation, vol. 117, no. 14, pp. 1787–1801, Apr. 2008, doi: 10.1161/CIRCULATIONAHA.107.726562. [27] F. Cademartiri et al., “Myocardial blood flow quantification for evaluation of coronary artery disease by computed tomography,” Cardiovasc. Diagn. Ther., vol. 7, no. 2, pp. 129–150, Apr. 2017, doi: 10.21037/cdt.2017.03.22. [28] W. B. Meijboom et al., “Comprehensive assessment of coronary artery stenoses: computed tomography coronary angiography versus conventional coronary angiography and correlation with fractional flow reserve in patients with stable angina,” J. Am. Coll. Cardiol., vol. 52, no. 8, pp. 636–643, Aug. 2008, doi: 10.1016/j.jacc.2008.05.024. [29] C. W. 
White et al., “Does visual interpretation of the coronary arteriogram predict the physiologic importance of a coronary stenosis?,” N. Engl. J. Med., vol. 310, no. 13, pp. 819– 824, Mar. 1984, doi: 10.1056/NEJM198403293101304. [30] C. A. Santana et al., “Diagnostic performance of fusion of myocardial perfusion imaging (MPI) and computed tomography coronary angiography,” J. Nucl. Cardiol. Off. Publ. Am. Soc. Nucl. Cardiol., vol. 16, no. 2, pp. 201–211, 2009, doi: 10.1007/s12350-008-9019-z. [31] G. Pontone et al., “Rationale and design of the PERFECTION (comparison between stress cardiac computed tomography PERfusion versus Fractional flow rEserve measured by Computed Tomography angiography In the evaluation of suspected cOroNary artery disease) prospective study,” J. Cardiovasc. Comput. Tomogr., vol. 10, no. 4, pp. 330–334, 2016, doi: 10.1016/j.jcct.2016.03.004. 97 [32] A. Bol et al., “Direct comparison of [13N]ammonia and [15O]water estimates of perfusion with quantification of regional myocardial blood flow by microspheres,” Circulation, vol. 87, no. 2, pp. 512–525, Feb. 1993, doi: 10.1161/01.cir.87.2.512. [33] S. R. Bergmann et al., “Quantification of regional myocardial blood flow in vivo with H215O,” Circulation, vol. 70, no. 4, pp. 724–733, Oct. 1984, doi: 10.1161/01.cir.70.4.724. [34] T. Kero, J. Nordström, H. J. Harms, J. Sörensen, H. Ahlström, and M. Lubberink, “Quantitative myocardial blood flow imaging with integrated time-of-flight PET-MR,” EJNMMI Phys., vol. 4, p. 1, Jan. 2017, doi: 10.1186/s40658-016-0171-2. [35] J. M. Renaud et al., “Characterization of 3-Dimensional PET Systems for Accurate Quantification of Myocardial Blood Flow,” J. Nucl. Med. Off. Publ. Soc. Nucl. Med., vol. 58, no. 1, pp. 103–109, Jan. 2017, doi: 10.2967/jnumed.116.174565. [36] H. Schelbert, “Measurement of MBF by PET is ready for prime time as an integral part of clinical reports in diagnosis and risk assessment of patients with known or suspected CAD,” J. Nucl. Cardiol., vol. 25, no. 1, pp. 153–156, Feb. 2018, doi: 10.1007/s12350-016-0423-5. [37] S. V. Nesterov et al., “Quantification of Myocardial Blood Flow in Absolute Terms u27sing 82Rb PET Imaging: Results of RUBY-10—a multicenter study comparing ten computer analysis programs,” JACC Cardiovasc. Imaging, vol. 7, no. 11, pp. 1119–1127, Nov. 2014, doi: 10.1016/j.jcmg.2014.08.003. [38] M. S. Ambrose, C. Valdiviezo, V. Mehra, A. C. Lardo, J. A. C. Lima, and R. T. George, “CT perfusion: ready for prime time,” Curr. Cardiol. Rep., vol. 13, no. 1, pp. 57–66, Feb. 2011, doi: 10.1007/s11886-010-0152-3. [39] A. So et al., “Non-invasive assessment of functionally relevant coronary artery stenoses with quantitative CT perfusion: preliminary clinical experiences,” Eur. Radiol., vol. 22, no. 1, pp. 39–50, Jan. 2012, doi: 10.1007/s00330-011-2260-x. [40] A. So, J. Hsieh, J.-Y. Li, J. Hadway, H.-F. Kong, and T.-Y. Lee, “Quantitative myocardial perfusion measurement using CT perfusion: a validation study in a porcine model of reperfused acute myocardial infarction,” Int. J. Cardiovasc. Imaging, vol. 28, no. 5, pp. 1237– 1248, Jun. 2012, doi: 10.1007/s10554-011-9927-x. [41] C. A. Cuenod and D. Balvay, “Perfusion and vascular permeability: basic concepts and measurement in DCE-CT and DCE-MRI,” Diagn. Interv. Imaging, vol. 94, no. 12, pp. 1187– 1204, Dec. 2013, doi: 10.1016/j.diii.2013.10.010. [42] G. Pontone et al., “Diagnostic performance of non-invasive imaging for stable coronary artery disease: A meta-analysis,” Int. J. Cardiol., vol. 300, pp. 276–281, Feb. 
2020, doi: 10.1016/j.ijcard.2019.10.046. [43] M. Lu, S. Wang, A. Sirajuddin, A. E. Arai, and S. Zhao, “Dynamic stress computed tomography myocardial perfusion for detecting myocardial ischemia: A systematic review 98 and meta-analysis,” Int. J. Cardiol., vol. 258, pp. 325–331, May 2018, doi: 10.1016/j.ijcard.2018.01.095. [44] F. M. A. Nous et al., “Dynamic Myocardial Perfusion CT for the Detection of Hemodynamically Significant Coronary Artery Disease,” JACC Cardiovasc. Imaging, vol. 15, no. 1, pp. 75–87, Jan. 2022, doi: 10.1016/j.jcmg.2021.07.021. [45] N. P. Johnson and K. L. Gould, “Integrating noninvasive absolute flow, coronary flow reserve, and ischemic thresholds into a comprehensive map of physiological severity,” JACC Cardiovasc. doi: 10.1016/j.jcmg.2011.12.014. 430–440, Apr. Imaging, 2012, vol. pp. no. 5, 4, [46] C. R. R. N. Hunter, R. Klein, R. S. Beanlands, and R. A. deKemp, “Patient motion effects on the quantification of regional myocardial blood flow with dynamic PET imaging,” Med. Phys., vol. 43, no. 4, pp. 1829–1840, 2016, doi: 10.1118/1.4943565. [47] Y. Xia, Y. He, F. Zhang, Y. Liu, and J. Leng, “A Review of Shape Memory Polymers and Composites: Mechanisms, Materials, and Applications,” Adv. Mater., vol. 33, no. 6, p. 2000713, 2021, doi: 10.1002/adma.202000713. [48] M. Veletić et al., “Implants with Sensing Capabilities,” Chem. Rev., vol. 122, no. 21, pp. 16329–16363, Nov. 2022, doi: 10.1021/acs.chemrev.2c00005. [49] Y.-H. Shao, K. Tsai, S. Kim, Y.-J. Wu, and K. Demissie, “Exposure to Tomographic Scans and Cancer Risks,” JNCI Cancer Spectr., vol. 4, no. 1, p. pkz072, Feb. 2020, doi: 10.1093/jncics/pkz072. [50] K. M. Pawelec et al., “Design Considerations to Facilitate Clinical Radiological Evaluation of Implantable Biomedical Structures,” ACS Biomater. Sci. Eng., vol. 7, no. 2, pp. 718–726, Feb. 2021, doi: 10.1021/acsbiomaterials.0c01439. [51] J. Wallyn, N. Anton, S. Akram, and T. F. Vandamme, “Biomedical Imaging: Principles, Technologies, Clinical Aspects, Contrast Agents, Limitations and Future Trends in Nanomedicines,” Pharm. Res., vol. 36, no. 6, p. 78, Apr. 2019, doi: 10.1007/s11095-019- 2608-5. [52] D. A. Szulc and H.-L. M. Cheng, “One-Step Labeling of Collagen Hydrogels with Polydopamine and Manganese Porphyrin for Non-Invasive Scaffold Tracking on Magnetic Resonance Imaging,” Macromol. Biosci., vol. 19, no. 4, p. e1800330, Apr. 2019, doi: 10.1002/mabi.201800330. [53] A. Erol, D. B. H. Rosberg, B. Hazer, and B. S. Göncü, “Biodegradable and biocompatible radiopaque iodinated poly-3-hydroxy butyrate: synthesis, characterization and in vitro/in vivo X-ray visibility,” Polym. Bull., vol. 77, no. 1, pp. 275–289, Jan. 2020, doi: 10.1007/s00289-019-02747-6. [54] J. M. Crowder, N. Bates, J. Roberts, A. S. Torres, and P. J. Bonitatibus, “Determination of tantalum from tantalum oxide nanoparticle X-ray/CT contrast agents in rat tissues and bodily 99 fluids by ICP-OES,” J. Anal. At. Spectrom., vol. 31, no. 6, pp. 1311–1317, 2016, doi: 10.1039/C5JA00446B. [55] J. W. Lambert et al., “An Intravascular Tantalum Oxide-based CT Contrast Agent: Preclinical Evaluation Emulating Overweight and Obese Patient Size,” Radiology, vol. 289, no. 1, pp. 103–110, Oct. 2018, doi: 10.1148/radiol.2018172381. [56] S. Chakravarty et al., “Tantalum oxide nanoparticles as versatile contrast agents for X-ray computed tomography,” Nanoscale, vol. 12, no. 14, pp. 7720–7734, Apr. 2020, doi: 10.1039/d0nr01234c. [57] G. Mohandas, N. Oskolkov, M. T. McMahon, P. Walczak, and M. 
Janowski, “Porous tantalum and tantalum oxide nanoparticles for regenerative medicine,” Acta Neurobiol. Exp. (Warsz.), vol. 74, no. 2, pp. 188–196, 2014. [58] A. Göpferich, “Mechanisms of polymer degradation and erosion,” Biomaterials, vol. 17, no. 2, pp. 103–114, Jan. 1996, doi: 10.1016/0142-9612(96)85755-3. [59] S. M. Pizer et al., “Adaptive histogram equalization and its variations,” Comput. Vis. Graph. Image Process., vol. 39, no. 3, pp. 355–368, Sep. 1987, doi: 10.1016/S0734-189X(87)80186- X. [60] N. Otsu, “A Tlreshold Selection Method from Gray-Level Histograms”. [61] D. Bradley and G. Roth, “Adaptive Thresholding using the Integral Image,” J. Graph. Tools, vol. 12, no. 2, pp. 13–21, Jan. 2007, doi: 10.1080/2151237X.2007.10129236. [62] L. Lu et al., “In vitro and in vivo degradation of porous poly(DL-lactic-co-glycolic acid) foams,” Biomaterials, vol. 21, no. 18, pp. 1837–1845, Sep. 2000, doi: 10.1016/s0142- 9612(00)00047-8. [63] C. E. Rapier, K. J. Shea, and A. P. Lee, “Investigating PLGA microparticle swelling behavior reveals an interplay of expansive intermolecular forces,” Sci. Rep., vol. 11, no. 1, p. 14512, Jul. 2021, doi: 10.1038/s41598-021-93785-6. [64] W. McCulloch and W. Pitts, “A LOGICAL CALCULUS OF THE IDEAS IMMANENT IN NERVOUS ACTIVITY,” Bull. Math. Biophys., vol. 5, pp. 115–133, 1943. [65] D. Hebb, The Organization of Behavior. John Wiley & Sons Inc, 1949. [66] F. Rosenblatt, “The perceptron: A probabilistic model for information storage and organization in the brain,” Psychol. Rev., vol. 65, pp. 386–408, 1958, doi: 10.1037/h0042519. [67] P. S. Churchland, “How Do Neurons Know?,” Daedalus, vol. 133, no. 1, pp. 42–50, 2004. [68] “How Neural Networks Learn from Experience,” Scientific American. https://www.scientificamerican.com/article/how-neural-networks-learn-from-expe/ (accessed Apr. 07, 2023). 100 [69] H. White, “Learning in Artificial Neural Networks: A Statistical Perspective,” Neural Comput., vol. 1, no. 4, pp. 425–464, Dec. 1989, doi: 10.1162/neco.1989.1.4.425. [70] S. Sharma, S. Sharma, and A. Athaiya, “ACTIVATION FUNCTIONS IN NEURAL NETWORKS,” Int. J. Eng. Appl. Sci. Technol., vol. 04, no. 12, pp. 310–316, May 2020, doi: 10.33564/IJEAST.2020.v04i12.054. [71] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier Nonlinearities Improve Neural Network Acoustic Models”. [72] V. Nair and G. E. Hinton, “Rectified Linear Units Improve Restricted Boltzmann Machines”. [73] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems, Curran Associates, [Online]. Available: Inc., 2012. Accessed: Apr. 07, 2023. https://proceedings.neurips.cc/paper_files/paper/2012/hash/c399862d3b9d6b76c8436e924a 68c45b-Abstract.html [74] R. Pramoditha, “How to Choose the Right Activation Function for Neural Networks,” Medium, Jan. 26, 2022. https://towardsdatascience.com/how-to-choose-the-right-activation- function-for-neural-networks-3941ff0e6f9c (accessed Apr. 07, 2023). [75] D. R. Cox, “The Regression Analysis of Binary Sequences,” J. R. Stat. Soc. Ser. B Methodol., vol. 20, no. 2, pp. 215–242, 1958. [76] P.-T. de Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, “A Tutorial on the Cross- Entropy Method,” Ann. Oper. Res., vol. 134, no. 1, pp. 19–67, Feb. 2005, doi: 10.1007/s10479-005-5724-z. [77] R. Y. Rubinstein and D. P. Kroese, The Cross Entropy Method: A Unified Approach To Combinatorial Optimization, Monte-carlo Simulation (Information Science and Statistics). 
Berlin, Heidelberg: Springer-Verlag, 2004. [78] P. J. Huber, “Robust Estimation of a Location Parameter,” Ann. Math. Stat., vol. 35, no. 1, pp. 73–101, Mar. 1964, doi: 10.1214/aoms/1177703732. [79] P. Werbos, “Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Science. Thesis (Ph. D.). Appl. Math. Harvard University,” 1974. [80] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back- propagating errors,” Nature, vol. 323, no. 6088, Art. no. 6088, Oct. 1986, doi: 10.1038/323533a0. [81] “Learning Internal Representations by Error Propagation.” Accessed: Apr. 07, 2023. [Online]. Available: https://apps.dtic.mil/sti/citations/ADA164453 [82] P. J. Werbos, “Backpropagation through time: what it does and how to do it,” Proc. IEEE, vol. 78, no. 10, pp. 1550–1560, Oct. 1990, doi: 10.1109/5.58337. 101 [83] Kohonen, Barna, and Chrisley, “Statistical pattern recognition with neural networks: benchmarking studies,” in IEEE 1988 International Conference on Neural Networks, Jul. 1988, pp. 61–68 vol.1. doi: 10.1109/ICNN.1988.23829. [84] F. F. Li, J. Johnson, and S. Yeung, “Detection and Segmentation,” May 10, 2017. [Online]. Available: http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf [85] A. P. Dhawan, “A review on biomedical image processing and future trends,” Comput. Methods Programs Biomed., vol. 31, no. 3, pp. 141–183, Mar. 1990, doi: 10.1016/0169- 2607(90)90001-P. [86] H. Wechsler and J. Sklansky, “Finding the rib cage in chest radiographs,” Pattern Recognit., vol. 9, no. 1, pp. 21–30, Jan. 1977, doi: 10.1016/0031-3203(77)90027-9. [87] Ballard and Sklansky, “A Ladder-Structured Decision Tree for Recognizing Tumors in Chest Radiographs,” IEEE Trans. Comput., vol. C–25, no. 5, pp. 503–513, May 1976, doi: 10.1109/TC.1976.1674638. [88] Y. P. CHIEN, “Preprocessing and Feature Extraction of Picture Patterns.,” Ph.D., Purdue University, United States -- Indiana. Accessed: Apr. 09, 2023. [Online]. Available: https://www.proquest.com/docview/302740388/citation/33A85B01E15B49FCPQ/1 [89] M. Cocklin, A. Gourlay, P. Jackson, G. Kaye, I. Kerr, and P. Lams, “Digital processing of chest radiographs,” Image Vis. Comput., vol. 1, no. 2, pp. 67–78, May 1983, doi: 10.1016/0262-8856(83)90044-6. [90] K. Preston, M. J. B. Duff, S. Levialdi, P. E. Norgren, and J. Toriwaki, “Basics of cellular logic with some applications in medical image processing,” Proc. IEEE, vol. 67, no. 5, pp. 826–856, May 1979, doi: 10.1109/PROC.1979.11331. [91] R. H. Sherrier and G. A. Johnson, “Regionally adaptive histogram equalization of the chest,” IEEE Trans. Med. Imaging, vol. 6, no. 1, pp. 1–7, 1987, doi: 10.1109/TMI.1987.4307791. [92] H. P. McAdams, G. A. Johnson, S. A. Suddarth, and C. E. Ravin, “Histogram-directed processing of digital chest images,” Invest. Radiol., vol. 21, no. 3, pp. 253–259, Mar. 1986, doi: 10.1097/00004424-198603000-00011. [93] G. Davis and S. T. Wallenslager, “Improvement of Chest Region CT Images through Automated Gray-Level Remapping,” IEEE Trans. Med. Imaging, vol. 5, no. 1, pp. 30–34, 1986, doi: 10.1109/TMI.1986.4307736. [94] S. M. Pizer, J. B. Zimmerman, and E. V. Staab, “Adaptive grey level assignment in CT scan display,” J. Comput. Assist. Tomogr., vol. 8, no. 2, pp. 300–305, Apr. 1984. [95] R. W. Conners and C. A. Harlow, “Toward a structural textural analyzer based on statistical methods,” Comput. Graph. Image Process., vol. 12, no. 3, pp. 224–256, Mar. 1980, doi: 10.1016/0146-664X(80)90013-1. 102 [96] R. S. Ledley, H. K. Huang, and L. S. 
Rotolo, “A texture analysis method in classification of coal workers’ pneumoconiosis,” Comput. Biol. Med., vol. 5, no. 1, pp. 53–67, Jun. 1975, doi: 10.1016/0010-4825(75)90018-9. [97] H. WECHSLER, “Automatic Detection of Rib Contours in Chest Radiographs.,” Ph.D., University of California, Irvine, United States -- California. Accessed: Apr. 09, 2023. [Online]. Available: https://www.proquest.com/docview/302757560/citation/B837D40E4C3B452FPQ/1 [98] E. Hall and R. Turner, “Automated Measurements from Chest X-Rays for Lung Disease Classification,” Aug. 1975. [99] K. Fukushima, “Neocognitron: A hierarchical neural network capable of visual pattern recognition,” Neural Netw., vol. 1, no. 2, pp. 119–130, Jan. 1988, doi: 10.1016/0893- 6080(88)90014-7. [100] R. Yamashita, M. Nishio, R. K. G. Do, and K. Togashi, “Convolutional neural networks: an overview and application in radiology,” Insights Imaging, vol. 9, no. 4, Art. no. 4, Aug. 2018, doi: 10.1007/s13244-018-0639-9. [101] M. Ahmad, J. Khan, A. Yousaf, S. Ghuffar, and K. Khurshid, “Deep Learning: A Breakthrough in Medical Imaging,” Curr. Med. Imaging Rev., vol. 16, pp. 946–956, Oct. 2020, doi: 10.2174/1573405615666191219100824. [102] S. Balaji, “Binary Image classifier CNN using TensorFlow,” Techiepedia, Aug. 29, 2020. https://medium.com/techiepedia/binary-image-classifier-cnn-using-tensorflow- a3f5d6746697 (accessed Jul. 18, 2023). [103] Z. J. Wang et al., “CNN 101: Interactive Visual Learning for Convolutional Neural Networks,” in Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, in CHI EA ’20. New York, NY, USA: Association for Computing Machinery, Apr. 2020, pp. 1–7. doi: 10.1145/3334480.3382899. [104] O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation.” arXiv, May 18, 2015. doi: 10.48550/arXiv.1505.04597. [105] N. Siddique, S. Paheding, C. P. Elkin, and V. Devabhaktuni, “U-Net and Its Variants for Medical Image Segmentation: A Review of Theory and Applications,” IEEE Access, vol. 9, pp. 82031–82057, 2021, doi: 10.1109/ACCESS.2021.3086020. [106] M. Drozdzal, E. Vorontsov, G. Chartrand, S. Kadoury, and C. Pal, “The Importance of Skip Connections in Biomedical Image Segmentation,” in Deep Learning and Data Labeling for Medical Applications, G. Carneiro, D. Mateus, L. Peter, A. Bradley, J. M. R. S. Tavares, V. Belagiannis, J. P. Papa, J. C. Nascimento, M. Loog, Z. Lu, J. S. Cardoso, and J. Cornebise, Eds., in Lecture Notes in Computer Science. Cham: Springer International Publishing, 2016, pp. 179–187. doi: 10.1007/978-3-319-46976-8_19. 103 [107] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition.” arXiv, Dec. 10, 2015. doi: 10.48550/arXiv.1512.03385. [108] S. Li, J. Jiao, Y. Han, and T. Weissman, “Demystifying ResNet.” arXiv, May 20, 2017. doi: 10.48550/arXiv.1611.01186. [109] O. P. Sharma, M. F. Oswanski, S. Jolly, S. K. Lauer, R. Dressel, and H. A. Stombaugh, “Perils of Rib Fractures,” Am. Surg., vol. 74, no. 4, pp. 310–314, Apr. 2008, doi: 10.1177/000313480807400406. [110] L. Fabricant, B. Ham, R. Mullins, and J. Mayberry, “Prolonged pain and disability are common after rib fractures,” Am. J. Surg., vol. 205, no. 5, pp. 511–515; discusssion 515-516, May 2013, doi: 10.1016/j.amjsurg.2012.12.007. [111] M. Kaiume et al., “Rib fracture detection in computed tomography images using deep convolutional neural networks,” Medicine (Baltimore), vol. 100, no. 20, p. e26024, May 2021, doi: 10.1097/MD.0000000000026024. [112] B. 
Zhang et al., “Improving rib fracture detection accuracy and reading efficiency with deep learning-based detection software: a clinical evaluation,” Br. J. Radiol., vol. 94, no. 1118, p. 20200870, Feb. 2021, doi: 10.1259/bjr.20200870. [113] X. H. Meng et al., “A fully automated rib fracture detection system on chest CT images and its impact on radiologist performance,” Skeletal Radiol., vol. 50, no. 9, pp. 1821–1828, Sep. 2021, doi: 10.1007/s00256-021-03709-8. [114] Y. Hu, X. He, R. Zhang, L. Guo, L. Gao, and J. Wang, “Slice grouping and aggregation network for auxiliary diagnosis of rib fractures,” Biomed. Signal Process. Control, vol. 67, p. 102547, May 2021, doi: 10.1016/j.bspc.2021.102547. [115] L. Yao et al., “Rib fracture detection system based on deep learning,” Sci. Rep., vol. 11, no. 1, Art. no. 1, Dec. 2021, doi: 10.1038/s41598-021-03002-7. [116] T. Weikert et al., “Assessment of a Deep Learning Algorithm for the Detection of Rib Fractures on Whole-Body Trauma Computed Tomography,” Korean J. Radiol., vol. 21, no. 7, pp. 891–899, Jul. 2020, doi: 10.3348/kjr.2019.0653. [117] S. Wang, D. Wu, L. Ye, Z. Chen, Y. Zhan, and Y. Li, “Assessment of automatic rib fracture detection on chest CT using a deep learning algorithm,” Eur. Radiol., vol. 33, no. 3, pp. 1824– 1834, Mar. 2023, doi: 10.1007/s00330-022-09156-w. [118] R. Castro-Zunti, K. J. Chae, Y. Choi, G. Y. Jin, and S.-B. Ko, “Assessing the speed- accuracy trade-offs of popular convolutional neural networks for single-crop rib fracture classification,” Comput. Med. Imaging Graph. Off. J. Comput. Med. Imaging Soc., vol. 91, p. 101937, Jul. 2021, doi: 10.1016/j.compmedimag.2021.101937. 104 [119] L. Jin et al., “Deep-learning-assisted detection and segmentation of rib fractures from CT scans: Development and validation of FracNet,” eBioMedicine, vol. 62, p. 103106, Dec. 2020, doi: 10.1016/j.ebiom.2020.103106. [120] M. Wu et al., “Development and Evaluation of a Deep Learning Algorithm for Rib Segmentation and Fracture Detection from Multicenter Chest CT Images,” Radiol. Artif. Intell., vol. 3, no. 5, p. e200248, Sep. 2021, doi: 10.1148/ryai.2021200248. [121] J. Zhang, Z. Li, S. Yan, H. Cao, J. Liu, and D. Wei, “An Algorithm for Automatic Rib Fracture Recognition Combined with nnU-Net and DenseNet,” Evid. Based Complement. Alternat. Med., vol. 2022, p. e5841451, Feb. 2022, doi: 10.1155/2022/5841451. [122] J. Wu et al., “Convolutional neural network for detecting rib fractures on chest radiographs: a feasibility study,” BMC Med. Imaging, vol. 23, no. 1, p. 18, Jan. 2023, doi: 10.1186/s12880- 023-00975-x. [123] A.-C. Tsai, Y.-Y. Ou, C.-H. Lin, C.-W. Chen, and J.-F. Wang, “Rib Fracture Diagnosis System on Chest X-Rays with Deep Learning,” in 2021 9th International Conference on Orange Technology (ICOT), Dec. 2021, pp. 1–4. doi: 10.1109/ICOT54518.2021.9680611. [124] Q.-Q. Zhou et al., “Automatic detection and classification of rib fractures based on patients’ CT images and clinical information via convolutional neural network,” Eur. Radiol., vol. 31, no. 6, pp. 3815–3825, Jun. 2021, doi: 10.1007/s00330-020-07418-z. [125] P.-H. C. Chen, Y. Liu, and L. Peng, “How to develop machine learning models for healthcare,” Nat. Mater., vol. 18, no. 5, pp. 410–414, May 2019, doi: 10.1038/s41563-019- 0345-0. [126] P. Hamet and J. Tremblay, “Artificial intelligence in medicine,” Metabolism., vol. 69S, pp. S36–S40, Apr. 2017, doi: 10.1016/j.metabol.2017.01.011. [127] L. Perez and J. 
Wang, “The Effectiveness of Data Augmentation in Image Classification using Deep Learning.” arXiv, Dec. 13, 2017. doi: 10.48550/arXiv.1712.04621. [128] A. Mikołajczyk and M. Grochowski, “Data augmentation for improving deep learning in image classification problem,” in 2018 International Interdisciplinary PhD Workshop (IIPhDW), May 2018, pp. 117–122. doi: 10.1109/IIPHDW.2018.8388338. [129] I. J. Goodfellow et al., “Generative Adversarial Networks.” arXiv, Jun. 10, 2014. Accessed: Apr. 12, 2023. [Online]. Available: http://arxiv.org/abs/1406.2661 [130] K. Armanious et al., “MedGAN: Medical Image Translation using GANs,” Comput. Med. Imaging Graph., vol. 79, p. 101684, Jan. 2020, doi: 10.1016/j.compmedimag.2019.101684. [131] C. Andrade, L. F. Teixeira, M. J. M. Vasconcelos, and L. Rosado, “Data Augmentation Using Adversarial Image-to-Image Translation for the Segmentation of Mobile-Acquired Dermatological Imaging, vol. 7, no. 1, p. 2, Dec. 2020, doi: 10.3390/jimaging7010002. Images,” J. 105 [132] C. Han et al., “Combining Noise-to-Image and Image-to-Image GANs: Brain MR Image Augmentation for Tumor Detection,” IEEE Access, vol. 7, pp. 156966–156977, 2019, doi: 10.1109/ACCESS.2019.2947606. [133] P. Welander, S. Karlsson, and A. Eklund, “Generative Adversarial Networks for Image-to- Image Translation on Multi-Contrast MR Images - A Comparison of CycleGAN and UNIT.” arXiv, Jun. 20, 2018. doi: 10.48550/arXiv.1806.07777. [134] A. Radford, L. Metz, and S. Chintala, “Unsupervised Representation Learning with Deep Jan. 07, 2016. doi: Convolutional Generative Adversarial Networks.” arXiv, 10.48550/arXiv.1511.06434. [135] L. Metz, B. Poole, D. Pfau, and J. Sohl-Dickstein, “Unrolled Generative Adversarial Networks.” arXiv, May 12, 2017. doi: 10.48550/arXiv.1611.02163. [136] M. Arjovsky and L. Bottou, “Towards Principled Methods for Training Generative Adversarial Networks.” arXiv, Jan. 17, 2017. Accessed: Apr. 13, 2023. [Online]. Available: http://arxiv.org/abs/1701.04862 [137] L. Bottou, F. E. Curtis, and J. Nocedal, “Optimization Methods for Large-Scale Machine Learning.” arXiv, Feb. 08, 2018. doi: 10.48550/arXiv.1606.04838. [138] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral Normalization for Generative Adversarial Networks.” arXiv, Feb. 16, 2018. Accessed: Apr. 13, 2023. [Online]. Available: http://arxiv.org/abs/1802.05957 [139] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley, “Least Squares Generative Adversarial Networks”. [140] S. Nowozin, B. Cseke, and R. Tomioka, “f-GAN: Training Generative Neural Samplers doi: arXiv, 2016. Jun. 02, using Variational Divergence Minimization.” 10.48550/arXiv.1606.00709. [141] K. Roth, A. Lucchi, S. Nowozin, and T. Hofmann, “Stabilizing Training of Generative Adversarial Networks through Regularization,” in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2017. Accessed: Apr. 13, 2023. [Online]. Available: https://proceedings.neurips.cc/paper/2017/hash/7bccfde7714a1ebadf06c5f4cea752c1- Abstract.html [142] C. K. Sønderby, J. Caballero, L. Theis, W. Shi, and F. Huszár, “Amortised MAP Inference for Image Super-resolution.” arXiv, Feb. 21, 2017. doi: 10.48550/arXiv.1610.04490. [143] N. Kodali, J. Abernethy, J. Hays, and Z. Kira, “On Convergence and Stability of GANs.” [Online]. Available: 2017. Accessed: Apr. 2023. 10, 13, arXiv, Dec. http://arxiv.org/abs/1705.07215 106 [144] M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN.” arXiv, Dec. 06, 2017. doi: 10.48550/arXiv.1701.07875. [145] I. Gulrajani, F. 
Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved Training of Wasserstein GANs.” arXiv, Dec. 25, 2017. doi: 10.48550/arXiv.1704.00028. [146] L. Mescheder, A. Geiger, and S. Nowozin, “Which Training Methods for GANs do actually Converge?” arXiv, Jul. 31, 2018. Accessed: Apr. 13, 2023. [Online]. Available: http://arxiv.org/abs/1801.04406 [147] M. Mirza and S. Osindero, “Conditional Generative Adversarial Nets.” arXiv, Nov. 06, 2014. doi: 10.48550/arXiv.1411.1784. [148] S. Ma, J. Fu, C. W. Chen, and T. Mei, “DA-GAN: Instance-level Image Translation by Deep Attention Generative Adversarial Networks (with Supplementary Materials).” arXiv, Feb. 18, 2018. Accessed: Apr. 14, 2023. [Online]. Available: http://arxiv.org/abs/1802.06454 [149] T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive Growing of GANs for Improved Quality, Stability, and Variation.” arXiv, Feb. 26, 2018. Accessed: Apr. 15, 2023. [Online]. Available: http://arxiv.org/abs/1710.10196 [150] T. Karras, S. Laine, and T. Aila, “A Style-Based Generator Architecture for Generative Adversarial Networks.” arXiv, Mar. 29, 2019. doi: 10.48550/arXiv.1812.04948. [151] A. Odena, C. Olah, and J. Shlens, “Conditional Image Synthesis With Auxiliary Classifier GANs.” arXiv, Jul. 20, 2017. doi: 10.48550/arXiv.1610.09585. [152] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-Image Translation with Conditional Adversarial Networks.” arXiv, Nov. 26, 2018. doi: 10.48550/arXiv.1611.07004. [153] Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, “StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation.” arXiv, Sep. 21, 2018. Accessed: Apr. 15, 2023. [Online]. Available: http://arxiv.org/abs/1711.09020 [154] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired Image-to-Image Translation using doi: arXiv, Aug. 2020. 24, Cycle-Consistent Adversarial Networks.” 10.48550/arXiv.1703.10593. [155] Y. Pang, J. Lin, T. Qin, and Z. Chen, “Image-to-Image Translation: Methods and Applications.” arXiv, Jul. 03, 2021. Accessed: Apr. 14, 2023. [Online]. Available: http://arxiv.org/abs/2101.08629 [156] S. Tripathy, J. Kannala, and E. Rahtu, “Learning Image-to-Image Translation Using Paired and Unpaired Training Samples,” in Computer Vision – ACCV 2018, C. V. Jawahar, H. Li, G. Mori, and K. Schindler, Eds., in Lecture Notes in Computer Science. Cham: Springer International Publishing, 2019, pp. 51–66. doi: 10.1007/978-3-030-20890-5_4. 107 [157] E. Denton, S. Chintala, A. Szlam, and R. Fergus, “Deep Generative Image Models using a Jun. 18, 2015. doi: arXiv, Laplacian Pyramid of Adversarial Networks.” 10.48550/arXiv.1506.05751. [158] A. A. Efros and T. K. Leung, “Texture synthesis by non-parametric sampling,” in Proceedings of the Seventh IEEE International Conference on Computer Vision, Sep. 1999, pp. 1033–1038 vol.2. doi: 10.1109/ICCV.1999.790383. [159] “CycleGAN.” https://hardikbansal.github.io/CycleGANBlog/ (accessed Jul. 18, 2023). [160] Y. Chen et al., “Generative Adversarial Networks in Medical Image augmentation: A doi: vol. 105382, May 2022, 144, p. review,” Comput. Biol. Med., 10.1016/j.compbiomed.2022.105382. [161] X. Yi, E. Walia, and P. Babyn, “Generative Adversarial Network in Medical Imaging: A doi: vol. 101552, Dec. Image Anal., 2019, 58, p. Review,” Med. 10.1016/j.media.2019.101552. [162] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. 
Wojna, “Rethinking the Inception Architecture for Computer Vision,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA: IEEE, Jun. 2016, pp. 2818–2826. doi: 10.1109/CVPR.2016.308. [163] A. Borji, “Pros and cons of GAN evaluation measures,” Comput. Vis. Image Underst., vol. 179, pp. 41–65, Feb. 2019, doi: 10.1016/j.cviu.2018.10.009. [164] T. Salimans et al., “Improved Techniques for Training GANs,” in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2016. Accessed: Apr. 14, 2023. [Online]. Available: https://proceedings.neurips.cc/paper/2016/hash/8a3363abe792db2d8761d6403605aeb7- Abstract.html [165] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium,” in Advances in Neural Information Processing Systems, Curran Associates, Inc., 2017. Accessed: Apr. 14, 2023. Available: https://proceedings.neurips.cc/paper_files/paper/2017/hash/8a1d694707eb0fefe6587136907 4926d-Abstract.html [Online]. [166] B. Segal, D. M. Rubin, G. Rubin, and A. Pantanowitz, “Evaluating the Clinical Realism of Synthetic Chest X-Rays Generated Using Progressively Growing GANs,” Sn Comput. Sci., vol. 2, no. 4, p. 321, 2021, doi: 10.1007/s42979-021-00720-7. [167] M. Lucic, K. Kurach, M. Michalski, S. Gelly, and O. Bousquet, “Are GANs Created Equal? A Large-Scale Study,” in Advances in Neural Information Processing Systems, Curran [Online]. Available: Associates, https://proceedings.neurips.cc/paper_files/paper/2018/hash/e46de7e1bcaaced9a54f1e9d0d2 f800d-Abstract.html 2018. Accessed: Apr. 2023. Inc., 14, 108 [168] Z. Zhou, W. Zhang, and J. Wang, “Inception Score, Label Smoothing, Gradient Vanishing and -log(D(x)) Alternative,” Aug. 2017. doi: 10.48550/arXiv.1708.01729. [169] A. Obukhov and M. Krasnyanskiy, “Quality Assessment Method for GAN Based on Modified Metrics Inception Score and Fréchet Inception Distance,” in Software Engineering Perspectives in Intelligent Systems, R. Silhavy, P. Silhavy, and Z. Prokopova, Eds., in Advances in Intelligent Systems and Computing. Cham: Springer International Publishing, 2020, pp. 102–114. doi: 10.1007/978-3-030-63322-6_8. [170] Y. Chen et al., “AI-Based Reconstruction for Fast MRI—A Systematic Review and Meta- IEEE, vol. 110, no. 2, pp. 224–245, Feb. 2022, doi: Analysis,” Proc. 10.1109/JPROC.2022.3141367. [171] A. Waheed, M. Goyal, D. Gupta, A. Khanna, F. Al-Turjman, and P. R. Pinheiro, “CovidGAN: Data Augmentation Using Auxiliary Classifier GAN for Improved Covid-19 doi: 8, Access, Detection,” 10.1109/ACCESS.2020.2994762. 91916–91923, 2020, IEEE vol. pp. [172] S. A. Kamran, K. F. Hossain, A. Tavakkoli, S. L. Zuckerbrod, K. M. Sanders, and S. A. Baker, “RV-GAN: Segmenting Retinal Vascular Structure in Fundus Photographs Using a Novel Multi-scale Generative Adversarial Network,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, M. de Bruijne, P. C. Cattin, S. Cotin, N. Padoy, S. Speidel, Y. Zheng, and C. Essert, Eds., in Lecture Notes in Computer Science. Cham: Springer International Publishing, 2021, pp. 34–44. doi: 10.1007/978-3-030-87237- 3_4. [173] S. U. H. Dar, M. Yurt, M. Shahdloo, M. E. Ildız, B. Tınaz, and T. Çukur, “Prior-Guided Image Reconstruction for Accelerated Multi-Contrast MRI via Generative Adversarial Networks,” IEEE J. Sel. Top. Signal Process., vol. 14, no. 6, pp. 1072–1087, Oct. 2020, doi: 10.1109/JSTSP.2020.3001737. [174] E. Cha, H. Chung, E. Y. Kim, and J. C. 
Ye, “Unpaired Training of Deep Learning tMRA for Flexible Spatio-Temporal Resolution,” IEEE Trans. Med. Imaging, vol. 40, no. 1, pp. 166–179, Jan. 2021, doi: 10.1109/TMI.2020.3023620. [175] V. Edupuganti, M. Mardani, S. Vasanawala, and J. Pauly, “Uncertainty Quantification in Deep MRI Reconstruction.” arXiv, Apr. 25, 2020. doi: 10.48550/arXiv.1901.11228. [176] M. Jiang et al., “Accelerating CS-MRI Reconstruction With Fine-Tuning Wasserstein Generative Adversarial Network,” IEEE Access, vol. 7, pp. 152347–152357, 2019, doi: 10.1109/ACCESS.2019.2948220. [177] G. Oh, B. Sim, H. Chung, L. Sunwoo, and J. C. Ye, “Unpaired Deep Learning for Accelerated MRI using Optimal Transport Driven CycleGAN.” arXiv, Aug. 29, 2020. doi: 10.48550/arXiv.2008.12967. 109 [178] R. Shaul, I. David, O. Shitrit, and T. Riklin Raviv, “Subsampled brain MRI reconstruction by generative adversarial neural networks,” Med. Image Anal., vol. 65, p. 101747, Oct. 2020, doi: 10.1016/j.media.2020.101747. [179] S. Bera and P. K. Biswas, “Noise Conscious Training of Non Local Neural Network Powered by Self Attentive Spectral Normalized Markovian Patch GAN for Low Dose CT Denoising,” IEEE Trans. Med. Imaging, vol. 40, no. 12, pp. 3663–3673, Dec. 2021, doi: 10.1109/TMI.2021.3094525. [180] A. B. Qasim et al., “Red-GAN: Attacking class imbalance via conditioned generation. Yet another perspective on medical image synthesis for skin lesion dermoscopy and brain tumor [Online]. Available: MRI.” arXiv, Mar. 27, 2021. Accessed: Jul. 16, 2023. http://arxiv.org/abs/2004.10734 [181] Z. Huang, J. Zhang, Y. Zhang, and H. Shan, “DU-GAN: Generative Adversarial Networks With Dual-Domain U-Net-Based Discriminators for Low-Dose CT Denoising,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2022, doi: 10.1109/TIM.2021.3128703. [182] M. Li, W. Hsu, X. Xie, J. Cong, and W. Gao, “SACNN: Self-Attention Convolutional Neural Network for Low-Dose CT Denoising With Self-Supervised Perceptual Loss Network,” IEEE Trans. Med. Imaging, vol. 39, no. 7, pp. 2289–2301, Jul. 2020, doi: 10.1109/TMI.2020.2968472. [183] Y. Ma, B. Wei, P. Feng, P. He, X. Guo, and G. Wang, “Low-Dose CT Image Denoising Using a Generative Adversarial Network With a Hybrid Loss Function for Noise Learning,” IEEE Access, vol. 8, pp. 67519–67529, 2020, doi: 10.1109/ACCESS.2020.2986388. [184] Z. Li, S. Zhou, J. Huang, L. Yu, and M. Jin, “Investigation of Low-Dose CT Image Denoising Using Unpaired Deep Learning Methods,” IEEE Trans. Radiat. Plasma Med. Sci., vol. 5, no. 2, pp. 224–234, Mar. 2021, doi: 10.1109/TRPMS.2020.3007583. [185] Z. Yin, K. Xia, Z. He, J. Zhang, S. Wang, and B. Zu, “Unpaired Image Denoising via Wasserstein GAN in Low-Dose CT Image with Multi-Perceptual Loss and Fidelity Loss,” Symmetry, vol. 13, no. 1, Art. no. 1, Jan. 2021, doi: 10.3390/sym13010126. [186] Q. Yang et al., “Low-Dose CT Image Denoising Using a Generative Adversarial Network With Wasserstein Distance and Perceptual Loss,” IEEE Trans. Med. Imaging, vol. 37, no. 6, pp. 1348–1357, Jun. 2018, doi: 10.1109/TMI.2018.2827462. [187] M. Gholizadeh-Ansari, J. Alirezaie, and P. Babyn, “Deep Learning for Low-Dose CT Denoising Using Perceptual Loss and Edge Detection Layer,” J. Digit. Imaging, vol. 33, no. 2, pp. 504–515, Apr. 2020, doi: 10.1007/s10278-019-00274-4. [188] W. Lingle et al., “The Cancer Genome Atlas Breast Invasive Carcinoma Collection doi: Archive, Imaging Cancer 2016. (TCGA-BRCA).” 10.7937/K9/TCIA.2016.AB2NAZRP. The 110 [189] H. 
Xue et al., “A 3D attention residual encoder–decoder least-square GAN for low-count PET denoising,” Nucl. Instrum. Methods Phys. Res. Sect. Accel. Spectrometers Detect. Assoc. Equip., vol. 983, p. 164638, Dec. 2020, doi: 10.1016/j.nima.2020.164638. [190] Y. Luo et al., “3D Transformer-GAN for High-Quality PET Reconstruction,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2021, M. de Bruijne, P. C. Cattin, S. Cotin, N. Padoy, S. Speidel, Y. Zheng, and C. Essert, Eds., in Lecture Notes in Computer Science. Cham: Springer International Publishing, 2021, pp. 276–285. doi: 10.1007/978-3-030-87231-1_27. [191] J. Ouyang, K. T. Chen, E. Gong, J. Pauly, and G. Zaharchuk, “Ultra-low-dose PET reconstruction using generative adversarial network with feature matching and task-specific perceptual loss,” Med. Phys., vol. 46, no. 8, pp. 3555–3564, 2019, doi: 10.1002/mp.13626. [192] K. T. Chen et al., “Ultra-Low-Dose 18F-Florbetaben Amyloid PET Imaging Using Deep Learning with Multi-Contrast MRI Inputs,” Radiology, vol. 290, no. 3, pp. 649–656, Mar. 2019, doi: 10.1148/radiol.2018180940. [193] Z. Xie et al., “Generative adversarial network based regularized image reconstruction for PET,” Phys. Med. Biol., vol. 65, no. 12, p. 125016, Jun. 2020, doi: 10.1088/1361- 6560/ab8f72. [194] H. Xue et al., “LCPR-Net: low-count PET image reconstruction using the domain transform and cycle-consistent generative adversarial networks,” Quant. Imaging Med. Surg., vol. 11, no. 2, pp. 749–762, Feb. 2021, doi: 10.21037/qims-20-66. [195] J. Ouyang, G. Wang, E. Gong, K. Chen, J. Pauly, and G. Zaharchuk, “Task-GAN: Improving Generative Adversarial Network for Image Reconstruction,” in Machine Learning for Medical Image Reconstruction, F. Knoll, A. Maier, D. Rueckert, and J. C. Ye, Eds., in Lecture Notes in Computer Science. Cham: Springer International Publishing, 2019, pp. 193– 204. doi: 10.1007/978-3-030-33843-5_18. [196] C. Decourt and L. Duong, “Semi-supervised generative adversarial networks for the segmentation of the left ventricle in pediatric MRI,” Comput. Biol. Med., vol. 123, p. 103884, Aug. 2020, doi: 10.1016/j.compbiomed.2020.103884. [197] M. Yang et al., “Automated knee cartilage segmentation for heterogeneous clinical MRI using generative adversarial networks with transfer learning,” Quant. Imaging Med. Surg., vol. 12, no. 5, pp. 2620633–2622633, May 2022, doi: 10.21037/qims-21-459. [198] M. M. R. Siddiquee et al., “Learning Fixed Points in Generative Adversarial Networks: From Image-to-Image Translation to Disease Detection and Localization.” arXiv, Aug. 29, 2019. Accessed: Apr. 15, 2023. [Online]. Available: http://arxiv.org/abs/1908.06965 [199] C. Han et al., “Infinite Brain MR Images: PGGAN-Based Data Augmentation for Tumor Detection,” in Neural Approaches to Dynamics of Signal Exchanges, A. Esposito, M. Faundez-Zanuy, F. C. Morabito, and E. Pasero, Eds., in Smart Innovation, Systems and Technologies. Singapore: Springer, 2020, pp. 291–303. doi: 10.1007/978-981-13-8950-4_27. 111 [200] G. Kwon, C. Han, and D. Kim, “Generation of 3D Brain MRI Using Auto-Encoding Generative Adversarial Networks,” in Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, D. Shen, T. Liu, T. M. Peters, L. H. Staib, C. Essert, S. Zhou, P.-T. Yap, and A. Khan, Eds., in Lecture Notes in Computer Science. Cham: Springer International Publishing, 2019, pp. 118–126. doi: 10.1007/978-3-030-32248-9_14. [201] S. Kaur, H. Aggarwal, and R. 
APPENDIX: DATA, CODE, AND SUPPLEMENTAL INFORMATION

Data Repositories

dCTP Myocardial Perfusion Studies
Raw PET and CT scans, myocardial time attenuation curves, coronary CT angiography data, and patient demographics/risk factors can be found here: https://doi.org/10.7910/DVN/VUP5TC.

Implantable TaOx Polymeric Biomedical Devices
Raw µCT images of the scaffolds are available upon request.

Pediatric Chest Radiographs
Due to legal requirements, the original radiographs are not available.

Code Repositories

dCTP Myocardial Perfusion Studies
Matlab code for perfusion estimation, along with the trained models, can be found here: https://github.com/tuethan/Machine-Learned-CT-Perfusion-Estimation. Matlab code to generate SFFR-CTPA scores and to assess diagnostic accuracy can be found here: https://github.com/tuethan/FFR-CTPA-Diagnostic-Accuracy.

Implantable TaOx Polymeric Biomedical Devices
Matlab code for segmentation and for metric calculations can be found here: https://github.com/tuethan/TaOx-Scaffold-Segmentation.

Pediatric Chest Radiographs
Matlab code to generate near-pairs, to generate full synthetic radiographs, and to train the normal, NPP, and FID-NPP cycleGANs, along with the trained models, can be found here: https://github.com/tuethan/Pediatric-Chest-Radiograph-Data-Augmentation.
Python code to train the YOLOv5 object detector can be found in the same repository.
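Because the detector builds on the publicly available Ultralytics YOLOv5 codebase, a minimal sketch of how training is typically launched from Python is given below. It assumes the ultralytics/yolov5 repository has been cloned, its requirements installed, and the script run from inside that repository; the dataset configuration file, run name, and hyperparameters are illustrative placeholders rather than the exact settings used in this dissertation.

# Minimal sketch: launching YOLOv5 training from Python.
# Assumes the public ultralytics/yolov5 repository is cloned, its
# requirements are installed, and this script runs from inside it.
import train  # train.py from the ultralytics/yolov5 repository

# The dataset YAML, image size, and hyperparameters below are
# illustrative placeholders, not the settings used in this work.
train.run(
    data='fracture_dataset.yaml',  # hypothetical dataset config listing train/val images and class names
    weights='yolov5s.pt',          # start from a pretrained YOLOv5-small checkpoint
    imgsz=640,                     # training image size in pixels
    epochs=100,
    batch_size=16,
    name='npp_augmented_run',      # hypothetical run name for the output folder
)

The same entry point can also be invoked from the command line (python train.py --data ... --weights yolov5s.pt ...), which is how the Ultralytics repository documents typical usage.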