COMPARISON OF MULTIVARIATE STATISTICAL MODELS FOR CLASSIFICATION OF FENTANYL ANALOGS

 

By  

Amber Gerheart 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A THESIS  

Submitted to  

Michigan State University 

in partial fulfillment of the requirements  

for the degree of  

 

Forensic Science – Master of Science  

2020  

 

 

ABSTRACT

COMPARISON OF MULTIVARIATE STATISTICAL MODELS FOR CLASSIFICATION OF FENTANYL ANALOGS

 

By 

Amber Gerheart 

 

Novel psychoactive substances (NPS) have been a challenge in forensic laboratories in 

the United States. Typical analysis of controlled substances is by gas chromatography-mass 

spectrometry (GC-MS), in which the GC retention time and mass spectrum are compared to a 

reference standard to make an identification. With the emergence of NPS compounds, reference 

standards for new compounds may not be readily available. Multivariate statistical methods have 

been investigated to classify NPS compounds.  

This work explored linear discriminant analysis (LDA) and soft independent modelling 

of class analogies (SIMCA) as methods to classify fentanyl analogs according to structural 

subclass. Four fentanyl subclasses were investigated and were categorized by the location of the 

substituent on the core fentanyl structure. Three factors were investigated to improve the 

robustness of the LDA and SIMCA models: variation within a chromatographic peak, instrument 

variation, and the application of neutral loss data. Overall, the LDA models performed with a 

100% successful classification rate for mass spectral data and a 100% successful classification 

rate for neutral loss data. The SIMCA models performed with a 91% successful classification 

rate for mass spectral data and an 87% successful classification rate for neutral loss data. The two

models were compared to highlight the benefits and limitations of each classification method. This

work supports the application of multivariate statistical models in forensic laboratories to obtain 

structural information when reference materials are not available.

 

ACKNOWLEDGEMENTS 

 
 

I would first like to acknowledge my advisor, Dr. Ruth Smith, for guiding me through 

graduate school. I appreciate all of the advice and knowledge I have gained from working under 

her the last two years. I am confident I have been prepared for a lifelong career in forensic

science thanks to the information she has provided during my time at Michigan State University.

 

I would like to acknowledge Michigan State University College of Social Science Faculty 

Initiatives Fund and the Michigan State Forensic Science Program for funding for this work, as 

well as funding to present this research. Additionally, I would like to acknowledge the MSU 

RTSF Mass Spectrometry & Metabolomics Core for instrument access and Hannah Clause for 

assistance in sample collection. Supplemental data for this work were provided by Amanda 

Setser, Emma Stuhmer, and Kimberly Venuk. I would like to acknowledge Sergey 

Kucheryavskiy for assistance with data analysis. I would also like to acknowledge my thesis 

committee: Dr. Ruth Smith, Dr. Victoria McGuffin, Dr. Charles Corley, and Kimberly Venuk, 

for helping me achieve one of my greatest goals. 

I would also like to thank my family, friends, and other forensic science colleagues: 

Otyllia Abraham, Rebecca Boyea, and Briana Capistran. Above all, I want to thank my mom. 

She has been my constant support and I can never express how grateful I am to have a 

mom/friend/therapist/advocate/idol like her. 

 

I have truly never understood the meaning of this quote more than I do now: 

“If you aren’t in over your head, how do you know how tall you are?” – T.S. Eliot 


TABLE OF CONTENTS 

 
 
LIST OF TABLES ........................................................................................................................ vii 

 
LIST OF FIGURES ...................................................................................................................... ix

 
1. INTRODUCTION ...................................................................................................................... 1 

1.1 FENTANYL ANALOGS ..................................................................................................... 1 

1.2 FORENSIC ANALYSIS OF SEIZED DRUGS ................................................................... 2 

1.2.1 SWGDRUG Recommendations for Analysis ............................................................... 2 

1.2.2 Gas Chromatography-Mass Spectrometry .................................................................... 3 

1.2.3 Current Challenges in Seized Drug Analysis................................................................ 4 

1.3 ADDRESSING NPS IDENTIFICATION CHALLENGES IN FORENSIC 

LABORATORIES ................................................................................................................... 4 

1.3.1 Instrumental Methods for NPS Identification and Differentiation ............................... 4 

1.3.2 Multivariate Statistical Methods for NPS Identification and Differentiation ............... 6 

1.3.2.1 Principal Components Analysis .............................................................................. 8 

1.3.2.2 Linear Discriminant Analysis ............................................................................... 11 

1.3.2.3 Soft Independent Modelling of Class Analogies .................................................. 14 

1.3.3 Neutral Losses as an Alternative to Mass Spectra ................................................... 18 

1.4 RESEARCH OBJECTIVE ................................................................................................. 20 

REFERENCES ............................................................................................................................ 21 

 
2. MATERIALS AND METHODS .............................................................................................. 25 

2.1 FENTANYL ANALOG REFERENCE MATERIALS ..................................................... 25 

2.2 GAS CHROMATOGRAPHY-MASS SPECTROMETRY (GC-MS) ANALYSIS .......... 26 

2.3 DATA ANALYSIS ............................................................................................................ 27 

2.3.1 Neutral Loss Spectra Development ............................................................................ 27 

2.5 STATISTICAL MODELLING .......................................................................................... 28 

2.5.1 Principal Components Analysis (PCA) ...................................................................... 31 

2.5.2 Linear Discriminant Analysis (LDA) ......................................................................... 32 

2.5.3 Soft Independent Modelling of Class Analogies (SIMCA) ........................................ 32 

APPENDIX ................................................................................................................................. 34 

REFERENCES ............................................................................................................................ 53 


 
3. LINEAR DISCRIMINANT ANALYSIS (LDA) FOR CLASSIFICATION OF FENTANYL 
ANALOGS ACCORDING TO STRUCTURAL SUBCLASS .................................................... 55 

3.1 MASS SPECTRAL ANALYSIS OF FENTANYL ANALOGS ....................................... 55 

3.2 INITIAL LINEAR DISCRIMINANT ANALYSIS (LDA) MODELS TO ASSESS 

VARIATION WITHIN A CHROMATOGRAPHIC PEAK ................................................ 59 

3.2.1 Principal Components Analysis (PCA) for Variable Selection .................................. 60 

3.2.2 Linear Discriminant Analysis (LDA) Models ............................................................ 69 

3.3 REFINED LINEAR DISCRIMINANT ANALYSIS (LDA) MODEL TO 

INCORPORATE INSTRUMENT VARIATION ................................................................. 77 

3.3.1 Refined LDA Model for Classification of Fentanyl Analogs ..................................... 78 

3.3.2 Additional Test Sets to Validate the Linear Discriminant Analysis (LDA) Model .... 84 

3.4 APPLICATION OF NEUTRAL LOSS SPECTRA TO REFINE THE LINEAR 

DISCRIMINANT ANALYSIS (LDA) MODEL .................................................................. 90 

3.4.1 Neutral Loss Spectra of Fentanyl Analogs ................................................................. 90 

3.4.2 Application of Linear Discriminant Analysis (LDA) to Neutral Loss Spectra for 

Classification of Fentanyl Analogs ............................................................................. 94 

3.5 SUMMARY OF LINEAR DISCRIMINANT ANALYSIS (LDA) MODELS ............... 106 

APPENDIX ............................................................................................................................... 108 

REFERENCES .......................................................................................................................... 129 

 
4. SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) FOR
CLASSIFICATION OF FENTANYL ANALOGS ACCORDING TO STRUCTURAL 
SUBCLASS ................................................................................................................................ 132 

4.1 INITIAL SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) 

MODELS TO ASSESS VARIATION WITHIN A CHROMATOGRAPHIC PEAK ....... 132 

4.2 REFINED SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) 

MODEL TO INCORPORATE INSTRUMENT VARIATION ......................................... 141 

4.2.1 Additional Test Sets to Validate the Classification Models ..................................... 147 

4.3 APPLICATION OF NEUTRAL LOSS SPECTRA FOR CLASSIFICATION OF 

FENTANYL ANALOGS .................................................................................................... 147 

4.4 SUMMARY OF SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES 

(SIMCA) MODELS ............................................................................................................ 153 

APPENDIX ............................................................................................................................... 155 

REFERENCES .......................................................................................................................... 164 

 
5. CONCLUSIONS AND FUTURE WORK ............................................................................. 166 


5.1 CONCLUSIONS .............................................................................................................. 166 

5.2 FUTURE WORK ............................................................................................................. 168 

REFERENCES .......................................................................................................................... 171 

 

 

 

 


LIST OF TABLES 

 
 
Table 2.1 Fentanyl analogs used in this work, separated by structural subclass .......................... 26 
 
Table 2.2 Training set for the initial models (all analog spectra in n = 2) ................................... 29 
 
Table 2.3 Test set for the initial models, replicate spectra indicated ........................................... 30 
 
Table 2.4 Training set for the refined models and neutral loss models (all analog spectra in ..... 31 
 
Table 2.5 Test set for the refined models and neutral loss models (all analog spectra in n = 4) . 31 
 
Table A2.1 PCA R Code4 ............................................................................................................. 49 
 
Table A2.2 LDA R Code4 ............................................................................................................ 50 
 
Table A2.3 SIMCA R Code ......................................................................................................... 51 
 
Table 3.1 Variables retained for LDA based on a relative loadings threshold of 2% .................. 68 
 
Table 3.2 Variables retained for the refined LDA model, as determined by PCA ...................... 81 
 
 Table 3.3 List of non-fentanyl NPS compounds in the external test set ..................................... 84 
 
Table 3.4 Variables retained for neutral loss LDA model, as determined by the 3.5% threshold of 
the PCA data ............................................................................................................................... 100 
 
Table A3.1 Chemical names of non-fentanyl NPS compounds ................................................. 122 
 
Table 4.1 Conditions for each subclass in SIMCA for both apex and average models ............. 133 
 
Table 4.2 Variables contributing most to the AG subclass ........................................................ 139 
 
Table 4.3 Conditions for each subclass in refined SIMCA model ............................................. 141 
 
Table 4.4 Comparison of variables contributing most to the initial and refined AG subclass SIMCA
models ......................................................................................................................................... 144 
 
Table 4.5 Conditions for each subclass in the neutral loss SIMCA model ................................ 148 
 
Table A4.1 Variables contributing most to the AA subclass in the apex initial SIMCA model 156 
 
Table A4.2 Variables contributing most to the AR subclass in the apex initial SIMCA model 158
 


Table A4.3 Variables contributing most to the AN subclass in the apex initial SIMCA model 159 
 

 


LIST OF FIGURES 

 
 
Figure 1.1 Core structure of fentanyl with substitution sites indicated ......................................... 1 
 
Figure 1.2 Intensity plot of m/z 189 versus m/z 146 ...................................................................... 9 
 
Figure 1.3 Example of a PCA scores plot of PC1 vs PC2 ........................................................... 10 
 
Figure 1.4 Example of a loadings plot ......................................................................................... 11 
 
Figure 1.5 Example LDA scores plot of LD1 vs LD2 ................................................................. 13 
 
Figure 1.6 Example of A) a SIMCA class with only one PC retained, and B) residuals plot for 
the class ......................................................................................................................................... 16 
 
Figure 1.7 Example Cooman’s plot from a SIMCA model ......................................................... 18 
 
Figure A2.1 Structures of all fentanyl analogs used in this work ................................................ 44 
 
Figure 3.1 Initial cleavage sites of fentanyl analogs A) cleavage of the amide group, B) cleavage 
on the piperidine ring, C) cleavage of the n-alkyl chain ............................................................... 56 
 
Figure 3.2 Mass spectra and chemical structures of selected fentanyl analogs A) thiofentanyl 
representing the AN subclass, B) ortho-methylfentanyl representing the AR subclass, C) 
cyclopropyl fentanyl representing the AG subclass, and D) para-fluorobutyrylfentanyl 
representing the AA subclass. ....................................................................................................... 58 
 
Figure 3.3 PCA scores plots of A) principal component 1 (PC1) vs principal component 2 
(PC2), B) PC1 vs principal component 3 (PC3), and C) PC1 vs principal component 4 (PC4) .. 61 
 
Figure 3.4 Loadings plot for A) principal component 1 (PC1), B) principal component 2 (PC2), 
and C) principal component 4 (PC4) ............................................................................................ 64 
 
Figure 3.5 Predicted structure of fragment ion at m/z 207 ........................................................... 67 
 
Figure 3.6 Scores plots for the apex data A) linear discriminant 1 (LD1) vs linear discriminant 2 
(LD2), B) LD1 vs linear discriminant 3 (LD3), and scores plots for the average data C) LD1 vs 
LD2, D) LD1 vs LD3 .................................................................................................................... 70 
 
Figure 3.7 Coefficients of A) linear discriminant 1 (LD1), B) linear discriminant 2 (LD2), and 
C) linear discriminant 3 (LD3) ..................................................................................................... 72 
 
Figure 3.8 Predicted fragments of A) para-fluorofentanyl from the AR subclass and B) para-
fluorobutyryl fentanyl from the AA subclass ............................................................................... 76 


 
Figure 3.9 Principal components analysis scores plot of A) PC1 vs PC2, B) PC1 vs PC3, C) PC1 
vs PC4 ........................................................................................................................................... 79 
 
Figure 3.10 Scores plot for the refined LDA model A) LD1 vs LD2, B) LD1 vs LD3 ............... 83 
 
Figure 3.11 Scores plot for the refined LDA model A) LD1 vs LD2, B) enlarged LD1 vs LD2, 
C) LD1 vs LD3, D) enlarged LD1 vs LD3 ................................................................................... 86 
 
Figure 3.12 Representative spectrum of A) 2-EMC and B) 2-FMA ........................................... 87 
 
Figure 3.13 Structures and spectra of case samples for A) carfentanil, B) methoxy acetyl 
fentanyl, C) furanyl fentanyl, D) valeryl fentanyl, E) acetyl fentanyl, F) 3’-methylfentanyl ...... 88 
 

Figure 3.14 Mass spectrum with common neutral losses highlighted for A) ortho-methylfentanyl 
and B) para-methoxy fentanyl ...................................................................................................... 92 
 
Figure 3.15 Neutral loss spectra and chemical structures of selected fentanyls A) thiofentanyl 
representing the AN subclass, B) ortho-methylfentanyl representing the AR subclass, C) 
cyclopropyl fentanyl representing the AG subclass, and D) para-fluorobutyrylfentanyl 
representing the AA subclass. ....................................................................................................... 93 
 
Figure 3.16 PCA scores plot for neutral loss LDA model A) PC1 vs PC2, B) PC1 vs PC3, C) 
PC1 vs PC4 ................................................................................................................................... 95 
 
Figure 3.17 Neutral loss PCA loadings plots for A) PC1, B) PC2, C) PC3, and D) PC4 ........... 98 
 
Figure 3.18 Scores plot for neutral loss LDA model A) LD1 vs LD2 and B) LD1 vs LD3 ...... 101 
 
Figure 3.19 Coefficients for neutral loss LDA model A) LD1, B) LD2, and C) LD3 .............. 104 
 
Figure A3.1 Mass spectra of all fentanyl analogs ...................................................................... 109 
 
Figure A3.2 Initial average model PCA scores plots for A) PC1 vs PC2, B) PC1 vs PC3, and C) 
PC1 vs PC4 ................................................................................................................................. 114 
 
Figure A3.3 Initial average model PCA loadings plots for A) PC1, B) PC2, C) PC3, and D) PC4
..................................................................................................................................................... 116 
 
Figure A3.4 Initial average LDA model coefficients of A) LD1 and B) LD3 .......................... 118 
 
Figure A3.5 Refined PCA loadings plots for A) PC1, B) PC2, C) PC3, and D) PC4 ............... 119 
 
Figure A3.6 Refined LDA model coefficients of A) LD1 and B) LD3 ..................................... 121 
 
Figure A3.7 Neutral loss spectra of all fentanyl analogs ........................................................... 124 


 
Figure 4.1 Residuals plot for the AR subclass from the apex model ......................................... 135 
 
Figure 4.2 Cooman’s plots for the apex model A) amide and aniline ring (AA) subclass vs amide 
group (AG) subclass, B) AA vs aniline ring (AR) subclass, C) AA vs n-alkyl chain (AN) 
subclass, and Cooman’s plots for the average model D) AA vs AG, E) AA vs AR, F) AA vs AN
..................................................................................................................................................... 137 
 
Figure 4.3 Modelling power plot for the AG subclass from the apex model ............................ 139 
 
Figure 4.4 Modelling power plots for the AG subclass SIMCA model A) with instrument 
variation incorporated, B) without instrument variation incorporated ....................................... 143 
 
Figure 4.5 Residuals plot for the AR subclass in the refined SIMCA model ............................ 145 
 
Figure 4.6 Residuals plot for the AG subclass in the refined SIMCA model ............................ 146 
 
Figure 4.7 Cooman’s plots for the A) AA subclass vs AG subclass, B) AA subclass vs AR 
subclass, and C) AA subclass vs AN subclass ........................................................................... 150 
 
Figure 4.8 Residuals plot for the AG subclass in the neutral loss SIMCA model ..................... 153 
 
Figure A4.1 Modelling power plot for the AA subclass from the apex initial SIMCA model.. 156 
 
Figure A4.2 Modelling power plot for the AR subclass from the apex initial SIMCA model .. 157 
 
Figure A4.3 Modelling power plot for the AN subclass from the initial apex SIMCA model.. 159 
 
Figure A4.4 Modelling power plots from the refined SIMCA model for the A) AA subclass, B) 
AR subclass, and C) AN subclass............................................................................................... 160 
 
Figure A4.5 Modelling power plots from the neutral loss SIMCA models for the A) AA 
subclass, B) AR subclass, and C) AN subclass .......................................................................... 162 


1. INTRODUCTION 

1.1 FENTANYL ANALOGS  

Novel psychoactive substances (NPS) are drugs synthesized to circumvent legal 

ramifications.1 Synthetic opioids, a type of NPS, are man-made drugs developed to

mimic the pharmacological and analgesic effects of opiates. As overdose deaths in the United

States have risen, synthetic opioids accounted for 67% (>31,000) of opioid deaths in 2018. According

to the Centers for Disease Control and Prevention, fentanyl was involved in the majority of overdose cases.2

Fentanyl is a synthetic opioid that is 50-100 times stronger than morphine and is used 

medicinally to treat severe pain.3 Fentanyl analogs are synthesized to increase the potency and 

pharmacological effects of fentanyl. An analog is a substance that shares a core structure with a 

pre-existing drug but has a structural substitution.4 Figure 1.1 shows the core structure of 

fentanyl with the locations indicated to show where structural substitutions can be made.5  

 

Figure 1.1 Core structure of fentanyl with substitution sites indicated (n-alkyl chain, piperidine ring, amide group, and aniline ring)

The Controlled Substances Act (CSA), enforced by the U.S. Drug Enforcement

Administration (DEA), categorizes controlled substances into five schedules (Schedules I-V),

with Schedule I including substances with no accepted medical use and high potential for abuse and

Schedule V including substances with accepted medical use and low potential for abuse.6 Due to

its medicinal applications and high potential for abuse, fentanyl is listed as a Schedule II 

substance in the CSA. In February 2018, as a method to reduce overdose rates in the United 

States, the DEA created an emergency scheduling order to schedule all illicit fentanyl analogs as 

Schedule I substances because they have no approved medical use.7 Because their structures are

ever-changing, fentanyl analogs could not be listed by name in the order; instead, they were

referred to as ‘any fentanyl-related substance’.

1.2 FORENSIC ANALYSIS OF SEIZED DRUGS 

1.2.1 SWGDRUG Recommendations for Analysis 

The Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) is an 

international working group that aims to improve the quality of seized drug analysis and 

establish standards for forensic laboratories to follow.8 In 2019, SWGDRUG published the most 

recent guidelines detailing minimum recommendations for the forensic identification of seized 

drugs. These recommendations require the use of multiple uncorrelated techniques within an 

analytical scheme, with various techniques ranked into three categories (A, B and C) based on 

discriminating power and the structural information the technique provides. Category A 

techniques are the most discriminatory and provide selectivity through structural information. 

Category B techniques provide a lower level of selectivity through chemical or physical 

characteristics, while Category C techniques are the least discriminatory and provide only 


general or class information. When a Category A technique is used, at least one additional technique

(from Category A, B, or C) must also be used to verify the identification.8

The gold standard for the analysis of seized drugs in forensic laboratories is gas 

chromatography-mass spectrometry (GC-MS). Analysis by GC-MS fulfills SWGDRUG 

recommendations for identification, as mass spectrometry is a Category A technique and gas

chromatography is a Category B technique.8
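
As an illustration only, the Category A rule described above can be sketched in Python. The function name and the technique-to-category table are assumptions made for this example; the table is limited to techniques whose categories are stated in this chapter, and the check encodes only the Category A case described here, not the full SWGDRUG recommendations.

```python
# Illustrative sketch: encodes only the Category A rule described in the text.
# The table is an assumption for this example, limited to techniques whose
# categories are named in this chapter.
CATEGORY = {
    "mass spectrometry": "A",
    "nuclear magnetic resonance spectroscopy": "A",
    "gas chromatography": "B",
}


def meets_category_a_minimum(techniques):
    """Return True if the scheme contains a Category A technique plus at
    least one other distinct categorized technique (A, B, or C)."""
    categories = [CATEGORY[t] for t in set(techniques) if t in CATEGORY]
    return "A" in categories and len(categories) >= 2


gcms_ok = meets_category_a_minimum(["gas chromatography", "mass spectrometry"])
ms_alone_ok = meets_category_a_minimum(["mass spectrometry"])
print(gcms_ok, ms_alone_ok)
```

Under these assumptions, the GC-MS scheme satisfies the check, whereas mass spectrometry used alone does not.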

1.2.2 Gas Chromatography-Mass Spectrometry 

Gas chromatography-mass spectrometry is a two-part technique providing separation and 

identification of organic molecules. Gas chromatography (GC) is a separation technique that uses 

varying temperatures, along with the interaction between the mobile and stationary phases, to 

separate analytes in a complex mixture. Because the separation relies on temperature, all analytes

must be highly volatile, and they are commonly in the liquid phase at room temperature. Once

mixtures are injected, analytes are moved through the column by a carrier gas (e.g., He, H2, N2, 

or Ar) and are separated based on volatility and interaction with the mobile and stationary 

phases. Analytes reach the detector at varying rates, with smaller, more volatile analytes moving 

more quickly through the column than larger, less volatile analytes. The time it takes for an 

analyte to move through the column and reach the detector is called the retention time.9 

The analyte is transferred from the GC to the MS with a transfer line. The mass 

spectrometer functions as the detector in GC-MS and consists of three main parts: an ion source, 

mass analyzer, and ion detector. In electron ionization (EI), high-energy electrons (typically 70 

eV) collide with gas-phase analyte molecules, resulting in reproducible fragmentation. The 

quadrupole mass analyzer consists of four parallel metal rods with oscillating radio frequency 

voltage and DC voltage. It separates ions according to their m/z ratios, allowing only ions of a


certain m/z value to reach the detector. A range of m/z values can be scanned through by varying 

the applied voltage. The detection of ions, commonly using an electron multiplier tube, results in 

a mass spectrum consisting of the various m/z ratios detected and the intensity at which they 

were detected.9 Due to the reproducible nature of EI-MS, spectral libraries are widely available 

to aid in analyte identification. 
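
For the multivariate methods discussed later, a mass spectrum of this kind is conveniently treated as a vector of intensities, one per m/z value. The sketch below is a minimal illustration, assuming unit-mass m/z values and normalization to the most abundant ion (the base peak); the function name and the abundances are hypothetical and are not data from this work.

```python
def to_intensity_vector(peaks, mz_min, mz_max):
    """Convert {m/z: abundance} into base-peak-normalized intensities (0-100)
    at every integer m/z from mz_min to mz_max inclusive.
    Assumes unit-mass m/z values; ions not detected are given zero intensity."""
    base = max(peaks.values())
    return [100.0 * peaks.get(mz, 0.0) / base for mz in range(mz_min, mz_max + 1)]


# Hypothetical abundances, for illustration only (not measured data).
peaks = {105: 1200.0, 146: 4800.0, 189: 2400.0}
vector = to_intensity_vector(peaks, 100, 200)
print(len(vector), vector[146 - 100], vector[189 - 100])
```

Each position in the vector corresponds to one m/z value, so spectra expressed this way can be compared directly or stacked into a data matrix.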

1.2.3 Current Challenges in Seized Drug Analysis 

To identify controlled substances, the GC retention time and mass spectrum of an analyte 

are compared to those of a reference standard analyzed under equivalent conditions. If the

retention times are similar and the mass spectra show the same ions in a similar pattern of 

intensities, the unknown sample is identified. However, NPS analogs, such as fentanyl analogs, 

create challenges for forensic laboratories because reference standards may not be available for a 

specific analog. Availability of reference standards may be limited because a laboratory has not 

purchased newly synthesized analogs or there may not yet be a reference standard available for a 

particular analog. As differentiating between structural analogs is necessary for many 

laboratories to make an identification, methods to circumvent this challenge 

have been explored, such as new instrumental techniques and multivariate statistical methods. 
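
The comparison to a reference standard described above is typically made by an analyst. Purely as an illustration, it can be sketched numerically, with cosine similarity standing in for the judgment that two spectra show the same ions in a similar pattern of intensities. The similarity measure, the function names, and both thresholds are assumptions made for this sketch and are not laboratory acceptance criteria.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches_reference(rt, spectrum, ref_rt, ref_spectrum,
                      rt_tol=0.05, min_similarity=0.95):
    """Both thresholds are hypothetical placeholders, not laboratory criteria."""
    return (abs(rt - ref_rt) <= rt_tol
            and cosine_similarity(spectrum, ref_spectrum) >= min_similarity)


# Hypothetical intensity vectors and retention times (minutes).
reference = [100.0, 50.0, 25.0, 0.0]
unknown = [100.0, 48.0, 27.0, 1.0]
unrelated = [5.0, 0.0, 10.0, 100.0]
same = matches_reference(10.02, unknown, 10.00, reference)
different = matches_reference(10.02, unrelated, 10.00, reference)
print(same, different)
```

Without a reference spectrum and retention time to pass to such a comparison, no identification can be made, which is the limitation that NPS analogs expose.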

1.3 ADDRESSING NPS IDENTIFICATION CHALLENGES IN FORENSIC 

LABORATORIES 

1.3.1 Instrumental Methods for NPS Identification and Differentiation 

While GC-MS is the typical method of analysis, other instrumental techniques that can 

provide further discrimination between NPS compounds have been investigated. Nuclear 

magnetic resonance (NMR) spectroscopy, which is a Category A technique, has been 

investigated for NPS. Duffy et al. used low-field NMR spectroscopy to successfully differentiate 


65 fentanyl analogs, while Bogun and Moore identified organic precursors in methamphetamine 

production.10,11 In both studies, smaller, benchtop NMR spectrometers were used, which are more

practical for forensic applications. However, both studies analyzed pure standards rather than 

case samples, which are often mixtures of the controlled substances, cutting agents, and other 

additives. As mixtures are prominent in forensic casework samples, a prior separation step would 

be necessary before NMR analysis. 

Gas chromatography-vacuum ultraviolet (GC-VUV) spectroscopy has been explored as 

an alternative instrumental technique to the traditional GC-MS for NPS identification. 

Kranenburg et al. used GC-VUV to differentiate six sets of amphetamine isomers. 

Amphetamines have a high degree of conjugation, making them optimal compounds for VUV 

detection, as highly conjugated compounds are the most UV-active.12 Of the compounds

investigated, only 3,4-methylenedioxymethamphetamine (MDMA) and 3,4-

methylenedioxyamphetamine (MDA) were not differentiated using GC-VUV. However, these 

compounds were differentiated by GC-MS. As such, Kranenburg et al. proposed that GC-VUV 

be used in conjunction with GC-MS as a tool for NPS and isomer identification.12 Roberson and 

Goodpaster also used GC-VUV to differentiate eight structurally similar phenethylamines, and 

similarly they recommended GC-VUV be used as a complementary technique to GC-MS.13

Kranenburg et al. also explored modifying GC-MS to provide more specific structural 

information for cathinone and fluoroamphetamine isomers. Low-energy EI (15 eV) GC-MS 

provided more discriminating mass spectra for isomers.14 The disadvantage to this method was 

that low-energy EI could not be conducted on a conventional GC-MS instrument with a single 

quadrupole; instead, a GC-time-of-flight (TOF)-MS had to be used. As most forensic laboratories


do not have high-resolution mass spectrometers, the use of low-energy EI is not currently a 

viable option for all laboratories. 

1.3.2 Multivariate Statistical Methods for NPS Identification and Differentiation 

In addition to investigating different instrumentation, multivariate statistical methods 

have also been explored to differentiate structurally similar NPS compounds.15-20 Principal 

components analysis (PCA) and linear discriminant analysis (LDA) are two of the methods that 

have been investigated for classification of NPS compounds.17-19 The theory of these multivariate

methods is discussed in Sections 1.3.2.1 and 1.3.2.2, respectively. Setser and Waddell Smith used

EI mass spectral data to classify phenethylamines and tryptamines using LDA.18 The mass 

spectra were collected at the apex of the chromatographic peak and one collection of the training 

set compounds was used to develop the LDA model. Two approaches were investigated to 

determine which variables (in this case, m/z values) should be retained and used in the LDA 

model. In the first approach, chemically significant m/z values present in the training set spectra 

were identified manually whereas, in the second approach, PCA was applied to identify the m/z 

values describing the most variance in the training set data. The chemically informed LDA model 

had a 93% successful classification rate, while the LDA model developed using PCA for variable 

selection had an 86% successful classification rate. Setser and Waddell Smith determined that 

while both methods for variable selection produced comparable results, PCA was a more 

efficient way to identify chemically significant m/z values to use in LDA.18  

Other researchers have investigated more structurally similar compounds.13,19,20 Using EI-

MS data collected at the apex of the chromatographic peak, Bonetti developed LDA models to 

classify fluoromethcathinone (FMC) isomers and fluorofentanyl isomers.19 These isomers cannot 

be differentiated based only on visual comparison of mass spectra due to similarities between 

their structures and resulting EI mass spectra. Bonetti analyzed the FMC and fluorofentanyl 

isomers on six different GC-MS instruments twice a day over a five-day period to incorporate 

instrument variation. Separate LDA models were developed for each set of isomers. Principal 

components analysis was used to select the variables applied in the LDA model development. 

The models were tested with blind samples, case samples, and diluted samples. The LDA models 

successfully classified all test samples except some of the diluted samples, which were too dilute 

and did not have representative spectra. This work demonstrated the application of PCA and 

LDA for the differentiation of FMC and fluorofentanyl isomers based on mass spectral data. 

Bonetti concluded that when instrument variation was incorporated into the development of the 

LDA models, case samples and unknowns did not need to be analyzed on the same instrument 

used to develop the model.19  

Davidson and Jackson differentiated 2,5-dimethoxy-N-(N-

methoxybenzyl)phenethylamine (NBOMe) isomers by applying canonical discriminant analysis 

(CDA) to mass spectral data.20 Canonical discriminant analysis is similar to LDA in that it 

classifies unknown samples into one of the available classes. Davidson and Jackson explored instrument 

variation by using data collected on two different GC-MS instruments, with three different GC 

columns. The analogs were prepared at three different concentrations and analyzed twice a week 

for one or two months. All data were obtained from the apex of the chromatographic peak. When 

data from all three instrument collections and varying concentrations were used, the model had a 

99.5% successful classification rate. When high abundance spectra collected on the same 

instrument were used to develop the model, a 99.9% successful classification rate was reported.20 

Bonetti and Davidson and Jackson highlighted the need to incorporate variation in the 

development of multivariate statistical classification methods.19,20 These authors investigated 

classifying isomers using LDA or the statistically similar CDA. Isomers share the same core 

structure with the same substituent at varying positions. Analogs of the same drug class, which 

share a core structure but carry different substituents, have not been investigated as 

closely.  

In other areas of forensic science, additional multivariate statistical methods have been 

explored. One method that has been used to differentiate and classify forensic samples is soft 

independent modelling of class analogies (SIMCA). This method has been applied in gunshot 

residue analysis, soil analysis, blood stain analysis, and fire debris analysis.21-24 Successful 

classification using SIMCA in other forensic disciplines supports the idea that it could be used to 

differentiate NPS compounds. See Section 1.3.2.3 for more information about SIMCA. 

1.3.2.1 Principal Components Analysis 

Principal components analysis (PCA) is used to reduce the number of variables and 

dimensionality of a data set. This method is not used to classify new samples, but rather to 

visualize the variance within a data set. As an unsupervised technique, any separation of samples 

is a result of natural variance among the samples. Principal components (PCs) are linear 

combinations of variables and, for a given data set, the number of PCs that can be defined is at 

most the number of variables. The variables have a weighting coefficient ranging from -1 to +1 for 

each PC, which indicates the sign and extent of contribution of the variable to a specific PC. As 

an example, the intensities of two m/z variables, m/z 146 and m/z 189, were plotted against each 

other for six samples (Figure 1.2). The first PC is shown in red on Figure 1.2. For the purposes 

of this discussion, the focus will be on PC1 only.  

 

Figure 1.2 Intensity plot of m/z 189 versus m/z 146 

 

 

 

A score is calculated for each sample on the new axis set. Scores are calculated for each 

sample for each PC, based on the variables contributing to each of the PCs. The dotted line in 

Figure 1.2 shows the projection of one sample onto PC1. This illustrates the projection for only 

one sample, but it would be done for every sample in the data set. The PCs can be plotted against 

one another to generate a scores plot, which shows separation among samples. The score for 

each PC determines where samples are positioned on a scores plot, allowing similarities and 

differences of each sample to be observed in relation to the other samples. Figure 1.3 is an 

example of a PCA scores plot for PC1 versus PC2 for the six samples shown in Figure 1.2.  

 

Figure 1.3 Example of a PCA scores plot of PC1 vs PC2 

 

 

Three samples are positioned negatively, while the other three samples are positioned 

positively on PC1. Loadings plots show which variables contribute to a particular PC and the 

positioning of samples on that PC. Figure 1.4 shows the loadings plot for PC1 and PC2. Three 

samples are positioned negatively on PC1 due to the negative loading of m/z 146 and the high 

intensity of m/z 146 for those samples (Figure 1.4). Conversely, three samples are positioned 

positively on PC1 due to the positive loading of m/z 189 and the high intensity of m/z 189 for 

those samples (Figure 1.4). More detailed information about this method can be found in 

reference 25.  

 

 

Figure 1.4 Example of a loadings plot 
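
The relationship between loadings and scores described in this section can be illustrated with a minimal sketch. The calculations in this work were performed in R; Python is used here purely for illustration. The six intensity pairs are hypothetical values chosen to resemble the situation in Figure 1.2 and are not data from this work. For two variables, PC1 can be obtained in closed form from the 2 x 2 covariance matrix. Because the sign of a PC is arbitrary, the sketch orients PC1 so that m/z 189 loads positively, matching the description above.

```python
import math

# Hypothetical normalized intensities (m/z 146, m/z 189) for six samples.
samples = [(90.0, 10.0), (85.0, 15.0), (95.0, 12.0),
           (10.0, 88.0), (14.0, 92.0), (12.0, 85.0)]

def pca_two_variables(data):
    """Return (loadings, scores) for two-variable data.

    loadings: [(PC1 weights for variable 1 and 2), (PC2 weights)]
    scores:   one (PC1 score, PC2 score) pair per sample
    """
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    centered = [(x - mean_x, y - mean_y) for x, y in data]

    # Entries of the sample covariance matrix.
    sxx = sum(cx * cx for cx, _ in centered) / (n - 1)
    syy = sum(cy * cy for _, cy in centered) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centered) / (n - 1)

    # Direction of greatest variance (closed form for a 2 x 2 matrix).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    if pc1[1] < 0:  # sign convention only: variable 2 loads positively
        pc1 = (-pc1[0], -pc1[1])
    pc2 = (-pc1[1], pc1[0])  # orthogonal to PC1

    # A score is the projection of a mean-centered sample onto a PC.
    scores = [(cx * pc1[0] + cy * pc1[1], cx * pc2[0] + cy * pc2[1])
              for cx, cy in centered]
    return [pc1, pc2], scores

loadings, scores = pca_two_variables(samples)
for (s1, s2) in scores:
    print(f"PC1 score = {s1:7.2f}, PC2 score = {s2:6.2f}")
```

With these hypothetical values, the three samples dominated by m/z 146 receive negative PC1 scores and the three dominated by m/z 189 receive positive PC1 scores.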

 

 

1.3.2.2 Linear Discriminant Analysis 

Linear discriminant analysis (LDA) is a hard classification multivariate statistical 

method. This method is supervised, which means it has class knowledge and, as a hard 

classification method, must classify any new sample into one of the available classes.  Because 

LDA is a classification method, the model must be developed with a training set to identify 

which variables differentiate the classes. Cross validation is performed to first assess the validity 

of the model. Leave-one-out cross validation is a common method in which each sample of the 

training set is removed one at a time, the model is rebuilt from the remaining samples, and the 

removed sample is then classified. The proportion of successful 

classifications represents the classification success rate of the model. When a classification 

method has been developed using a training set with successful cross validation, a test set is then 

applied, and samples are classified by the model into an available class.  
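
The leave-one-out procedure can be sketched as follows. This is an illustration in Python rather than the R code used in this work, and a simple nearest-centroid classifier is used as a stand-in for LDA so that the example stays self-contained. The data points and class labels are hypothetical.

```python
def nearest_centroid(train, labels, sample):
    """Stand-in classifier: assign the sample to the class with the closest centroid."""
    grouped = {}
    for row, label in zip(train, labels):
        grouped.setdefault(label, []).append(row)
    best_label, best_dist = None, float("inf")
    for label, rows in grouped.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_success_rate(data, labels, classify):
    """Remove each sample in turn, rebuild the model, and classify the removed sample."""
    correct = 0
    for i in range(len(data)):
        train = data[:i] + data[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        if classify(train, train_labels, data[i]) == labels[i]:
            correct += 1
    return correct / len(data)

# Hypothetical two-variable training set with two well-separated classes.
data = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
labels = ["A", "A", "A", "B", "B", "B"]
rate = loocv_success_rate(data, labels, nearest_centroid)
print(f"Leave-one-out success rate: {rate:.0%}")
```

Any classifier with the same three-argument interface could be passed in place of the stand-in.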

The objective of LDA is to minimize within-class variance and maximize between-class 

variance. To do so, linear discriminants (LDs) are calculated that are linear combinations of the 

variables, similar to PCs. There is always one fewer LD than there are classes; for 

example, with three classes there would be only two LDs. Coefficients of linear 

discriminants are also similar to loadings plots for PCs and show the contributions of variables to 

separation along an LD. Similar to PCA, these LDs can be plotted against each other to form 

LDA scores plots, where the separation between classes included in the training set can be seen. 

LDA scores plots can then be used to visualize the positioning of new samples in relation to the 

training set.  

For each of the classes, a centroid is calculated, which is the center (or average) of all the 

scores of training set samples in that class. Classification is determined by calculating the 

distance from an unknown sample to the centroid of each class; the sample is assigned to the 

class with the shortest distance. This distance is statistically referred to as a Mahalanobis 

distance, which is a standardized Euclidean distance. The standardization allows for the scores of 

each LD to have the same variance before the Euclidean distance is calculated. Figure 1.5 shows 

an example of an LDA scores plot with three classes. The triangle signifies an unknown sample 

applied to the model and the dashed lines show the Mahalanobis distance to the centroid of each 

class, shown as a diamond. In this hypothetical example, the unknown sample would be 

classified as a member of the blue class, as the Mahalanobis distance to this class is the shortest. 

For each unknown sample, a posterior probability, which is the probability of class 

membership, is also calculated for each class. 

 

 

 

 

 

Figure 1.5 Example LDA scores plot of LD1 vs LD2 
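
The classification step can be sketched under stated assumptions. The LD scores below are hypothetical, the LDs are treated as uncorrelated so that the Mahalanobis distance reduces to a Euclidean distance on scores standardized by the pooled within-class standard deviation, and the posterior probabilities assume equal prior probabilities and Gaussian classes. The sketch is written in Python for illustration and is not the R code used in this work.

```python
import math

# Hypothetical (LD1, LD2) scores for training samples in three classes.
training_scores = {
    "blue":   [(-4.0, 1.0), (-3.5, 1.5), (-4.5, 0.5)],
    "red":    [(4.0, 2.0), (4.5, 1.5), (3.5, 2.5)],
    "yellow": [(0.0, -4.0), (0.5, -4.5), (-0.5, -3.5)],
}

def class_centroids(scores_by_class):
    """Centroid = mean score on each LD for the training samples of a class."""
    return {k: [sum(col) / len(rows) for col in zip(*rows)]
            for k, rows in scores_by_class.items()}

def pooled_within_sd(scores_by_class, cents):
    """Pooled within-class standard deviation of the scores on each LD."""
    n_dims = len(next(iter(cents.values())))
    sum_sq = [0.0] * n_dims
    dof = 0
    for k, rows in scores_by_class.items():
        for row in rows:
            for j in range(n_dims):
                sum_sq[j] += (row[j] - cents[k][j]) ** 2
        dof += len(rows) - 1
    return [math.sqrt(s / dof) for s in sum_sq]

def classify(unknown, scores_by_class):
    cents = class_centroids(scores_by_class)
    sd = pooled_within_sd(scores_by_class, cents)
    # Squared standardized distance from the unknown to each centroid.
    d2 = {k: sum(((u - c) / s) ** 2 for u, c, s in zip(unknown, cents[k], sd))
          for k in cents}
    # Posterior probabilities (assumption: equal priors, Gaussian classes).
    smallest = min(d2.values())
    weights = {k: math.exp(-0.5 * (v - smallest)) for k, v in d2.items()}
    total = sum(weights.values())
    posterior = {k: w / total for k, w in weights.items()}
    assigned = min(d2, key=d2.get)
    distances = {k: math.sqrt(v) for k, v in d2.items()}
    return assigned, distances, posterior

unknown = (-3.0, 0.5)
assigned, distances, posterior = classify(unknown, training_scores)
print(assigned, distances, posterior)
```

As in the hypothetical example of Figure 1.5, the unknown is assigned to the class with the shortest distance to its centroid.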

1.3.2.3 Soft Independent Modelling of Class Analogies 

Like LDA, soft independent modelling of class analogies (SIMCA) is also a supervised 

classification method; however, SIMCA is a soft, rather than hard, classification method. This 

means SIMCA can classify new samples to one class, more than one class, or none of the classes. 

To develop a SIMCA model, PCA is performed on each of the classes individually and 

optimized to retain a specific number of PCs representing the variance within the class. 

Residuals plots are generated for each class by plotting Hotelling’s T² distance versus the 

squared residual distance (Q). Figure 1.6A shows an example of a class with only one PC 

retained. The Q and T² distances for the sample positioned the lowest on the Y axis are 

highlighted by blue arrows. The Q distance is statistically defined as a squared orthogonal 

Euclidean distance from a sample to the PCA model, in this case from the point of the sample to 

its projection on the PC. The Q distance describes lack of fit to the model: a small Q distance indicates 

better fit to the model than a large Q distance. The T² distance is statistically defined as a squared 

Mahalanobis distance between the projection of the sample and the origin in PCA space, which 

is the centroid of the mean-centered data. The T² distance describes how far from the origin a 

sample is (or how extreme it is) in relation to the other samples in the training set. A high T² 

distance indicates a sample is more extreme than the samples represented by the training set. 

Each of these distances is calculated for every training set sample and test sample applied to this 

class’s PCA space.  
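
The two distances can be sketched for a class described by two variables with one PC retained, as in Figure 1.6A. The training values and test samples below are hypothetical, and the sketch is written in Python for illustration rather than taken from the R code used in this work.

```python
import math

def fit_class_model(training):
    """One-PC PCA model for a single class described by two variables."""
    n = len(training)
    mean = (sum(x for x, _ in training) / n, sum(y for _, y in training) / n)
    centered = [(x - mean[0], y - mean[1]) for x, y in training]
    sxx = sum(cx * cx for cx, _ in centered) / (n - 1)
    syy = sum(cy * cy for _, cy in centered) / (n - 1)
    sxy = sum(cx * cy for cx, cy in centered) / (n - 1)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # direction of PC1
    pc1 = (math.cos(theta), math.sin(theta))
    scores = [cx * pc1[0] + cy * pc1[1] for cx, cy in centered]
    score_var = sum(t * t for t in scores) / (n - 1)
    return {"mean": mean, "pc1": pc1, "score_var": score_var}

def q_and_t2(sample, model):
    """Return (Q, T2) for one sample applied to one class model."""
    cx = sample[0] - model["mean"][0]
    cy = sample[1] - model["mean"][1]
    t = cx * model["pc1"][0] + cy * model["pc1"][1]  # score on PC1
    rx = cx - t * model["pc1"][0]                    # part not explained by PC1
    ry = cy - t * model["pc1"][1]
    q = rx * rx + ry * ry                            # squared orthogonal distance
    t2 = t * t / model["score_var"]                  # squared standardized score
    return q, t2

# Hypothetical training samples for one class.
training = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
model = fit_class_model(training)

q_far, t2_far = q_and_t2((6.0, 6.0), model)  # along the PC, far from the origin
q_off, t2_off = q_and_t2((3.0, 5.0), model)  # near the origin, far from the PC
print(f"far along PC1: Q = {q_far:.3f}, T2 = {t2_far:.3f}")
print(f"off the PC:    Q = {q_off:.3f}, T2 = {t2_off:.3f}")
```

The first test sample fits the model well but is extreme (small Q, large T2), whereas the second is not extreme but fits poorly (large Q, small T2).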

A critical limit for both the Q and T² parameters is determined for a class by a 

significance level (α) defined by the user. If a training set sample or test set sample falls outside 

the critical limit for a particular class, it is not classified to that class. The cylinder in Figure 

1.6A signifies the critical limit for this example class. The model is optimized by adjusting the 

significance value to include as many of the training set compounds as possible. Leave-one-out 

cross validation of each class can also help to determine which significance level is optimal. 

Figure 1.6B shows an example of a residuals plot for the class represented by Figure 1.6A. The 

one sample that falls outside the critical limit (dashed line) would not be recognized as being a 

member of the class unless the significance level, or critical limit, was expanded. When multiple 

classes are utilized in SIMCA, this process is repeated for each class individually. 

 

 

Figure 1.6 Example of A) a SIMCA class with only one PC retained, and B) residuals plot for 

the class 

Coomans plots are used in multi-class SIMCA models to examine classes in relation to 

each other. As described above, the Q distance is the lack-of-fit measurement to a particular class; 

the Coomans plot shows the Q distances calculated to two classes, plotted against each other. 

This comparison also requires further optimization, or adjustment, of the critical limit to enhance 

separation between classes while also ensuring all samples within a class are below the critical 

limit. Figure 1.7 shows an example of a Coomans plot for two classes, red class and blue class. 

The triangle represents the Q distance to the blue class and the Q distance to the red class of an 

unknown sample applied to the SIMCA model. As the sample falls outside both critical limits for 

the red class and blue class, this sample would be classified as ‘none’ meaning not belonging to 

either of these classes. 

 

 

 

Figure 1.7 Example Coomans plot from a SIMCA model 
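
The soft classification decision read from such a plot can be sketched as a comparison of each Q distance with the critical limit of the corresponding class. The function name, Q distances, and critical limits below are hypothetical, and Python is used for illustration only.

```python
def assign_classes(q_to_red, q_to_blue, red_limit, blue_limit):
    """Return every class whose critical limit the sample falls within, or ['none']."""
    members = []
    if q_to_red <= red_limit:
        members.append("red")
    if q_to_blue <= blue_limit:
        members.append("blue")
    return members or ["none"]

# Hypothetical critical limits and Q distances for three unknown samples.
RED_LIMIT, BLUE_LIMIT = 2.0, 2.0
outside_both = assign_classes(5.0, 6.0, RED_LIMIT, BLUE_LIMIT)
red_only = assign_classes(1.0, 6.0, RED_LIMIT, BLUE_LIMIT)
both = assign_classes(1.0, 1.5, RED_LIMIT, BLUE_LIMIT)
print(outside_both, red_only, both)
```

The three outcomes correspond to the three possibilities of a soft classification method: none of the classes, one class, or more than one class.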

 

 

1.3.3 Neutral Losses as an Alternative to Mass Spectra 

The work by Davidson and Jackson, as well as Bonetti, used mass spectral 

data to differentiate isomers.19,20 Analogs have the same core structure, but rather than having the 

same substituent at different positions (isomers), they have different substituents. This results in 

differences in the m/z values for fragment ions, which may limit the ability to classify based on 

common fragments. Neutral loss spectra have been used as an alternative to overcome the 

challenges of analogs with structural differences based on the location of a substituent on the 

core structure.26,27 A neutral loss is defined as being the loss of an uncharged species from an ion 

during rearrangement or direct dissociation.9 Fowble et al. differentiated synthetic cathinones 

into seven classes by principal components analysis (PCA) and hierarchical clustering analysis 

(HCA) applied to neutral loss spectra.26 The neutral loss spectra were derived from the mass 

spectra obtained from collision-induced dissociation (CID) direct analysis in real time high-

resolution mass spectrometry (DART-MS) and required the presence of a molecular ion.26 

Although this method was rapid and required minimal sample preparation, it did not include a 

separation step prior to analysis, which is problematic in forensic laboratories where mixtures 

are common. Additionally, DART-MS uses a soft ionization method that produces fewer fragment 

ions than EI (a hard ionization method). DART-MS also makes neutral losses 

easier to identify because accurate mass data are available.  

Moorthy et al. investigated incorporating neutral loss matching in the National Institute 

of Standards and Technology (NIST) library search algorithm.27 In this work, the authors 

proposed not only comparing fragment ions between mass spectra, but also comparing the 

neutral losses between spectra, in an algorithm they called the Hybrid Similarity Search 

(HSS). This search algorithm requires the molecular mass of both the library compound and the 

unknown compound in order to align the spectra and compare neutral losses. The use 

of the HSS for fentanyl-related compounds showed success in obtaining further structural 

information about an analog: that is, where the substitution occurred on the core fentanyl 

structure (n-alkyl chain, amide group, aniline ring, or piperidine ring, Figure 1.1).27 The greatest 

obstacle with this work was that the molecular mass must be known or easily obtained. 

Typically, fentanyl analogs do not have a molecular ion present in the mass spectrum when 

analyzed by EI so it is unlikely this information would be available for an unknown. 

1.4 RESEARCH OBJECTIVE 

The objective in this work was to develop, validate, and compare multivariate statistical 

models for classification of fentanyl analogs according to structural subclass. The first goal was 

to develop LDA models for classification of fentanyl analogs according to structural subclass 

while the second goal was to develop SIMCA models for similar classification. In the 

development of each model, three factors were considered to optimize classification. The first 

factor investigated was the effect of mass spectral variation within a peak on the classification 

success. This was achieved by developing models based on spectra collected at the apex of the 

chromatographic peak and spectra averaged across the chromatographic peak. The second factor 

investigated was the effect of spectral variation over time on the classification success. This was 

achieved by developing models using a training set collected within two months and across four 

months. The third factor investigated was the potential of using neutral losses as variables to 

improve classification success. Overall, this work aimed to contribute to the forensic science 

community by investigating methods to enhance characterization of structurally similar fentanyl 

analogs, for which reference spectra are not readily available, using conventional GC-MS 

methods already employed in laboratories. 

 

REFERENCES 

 

 
(1) Novel Psychoactive Substances. https://www.nmslabs.com/forensic-testing/novel-psychoactive-substances (accessed June 2020).

(2) Synthetic Opioid Overdose Data. https://www.cdc.gov/drugoverdose/data/fentanyl.html (accessed June 2020).

(3) National Institute on Drug Abuse. Fentanyl DrugFacts. https://www.drugabuse.gov/publications/drugfacts/fentanyl (accessed June 2020).

(4) Hartney, E. The Fentanyl Crisis-The Drug's Analogs and Derivatives. https://www.verywellmind.com/fentanyl-analogs-and-derivatives-4165882 (accessed June 2020).

(5) Cayman Chemical. Fentanyl Identification Cayman Currents. 28, Ann Arbor (2017).

(6) Drug Enforcement Administration. Drug Scheduling. https://www.deadiversion.usdoj.gov/synthetic_drugs/about_sd.html (accessed June 2020).

(7) U.S. Drug Enforcement Administration Emergency Schedules All Illicit Fentanyls in an Effort to Reduce Overdose Deaths. https://www.dea.gov/press-releases/2018/02/07/us-drug-enforcement-administration-emergency-schedules-all-illicit (accessed June 2020).

(8) Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) Recommendations, 2019. http://swgdrug.org/Documents/SWGDRUG%20Recommendations%20Version%208_FINAL_ForPosting_092919.pdf (accessed June 2020).

(9) Watson, J. T.; Sparkman, O. D. Introduction to Mass Spectrometry: Instrumentation, Applications and Strategies for Data Interpretation; Wiley: Chichester, 2011.

(10) Duffy, J.; Urbas, A.; Niemitz, M.; Lippa, K.; Marginean, I. Differentiation of Fentanyl Analogues by Low-Field NMR Spectroscopy. Analytica Chimica Acta. 2019, 1049, 161–169.

(11) Bogun, B.; Moore, S. 1H And 31P Benchtop NMR of Liquids and Solids Used in and/or Produced during the Manufacture of Methamphetamine by the HI Reduction of Pseudoephedrine/Ephedrine. Forensic Science International. 2017, 278, 68–77.

(12) Kranenburg, R. F.; García-Cicourel, A. R.; Kukurin, C.; Janssen, H.-G.; Schoenmakers, P. J.; Asten, A. C. V. Distinguishing Drug Isomers in the Forensic Laboratory: GC–VUV in Addition to GC–MS for Orthogonal Selectivity and the Use of Library Match Scores as a New Source of Information. Forensic Science International. 2019, 302, 109900.

(13) Roberson, Z. R.; Goodpaster, J. V. Differentiation of Structurally Similar Phenethylamines via Gas Chromatography–Vacuum Ultraviolet Spectroscopy (GC–VUV). Forensic Chemistry. 2019, 15, 100172.

(14) Kranenburg, R. F.; Peroni, D.; Affourtit, S.; Westerhuis, J. A.; Smilde, A. K.; Asten, A. C. V. Revealing Hidden Information in GC–MS Spectra from Isomeric Drugs: Chemometrics Based Identification from 15 eV and 70 eV EI Mass Spectra. Forensic Chemistry. 2020, 18, 100225.

(15) Bodnar Willard, M. A.; McGuffin, V. L.; Waddell Smith, R. Statistical Comparison of Mass Spectra for Identification of Amphetamine-Type Stimulants. Forensic Science International. 2017, 270, 111–120. https://doi.org/10.1016/j.forsciint.2016.11.013.

(16) Stuhmer, E. L.; McGuffin, V. L.; Waddell Smith, R. Discrimination of Seized Drug Positional Isomers based on Statistical Comparison of Electron-Ionization Mass Spectra. Forensic Chemistry. 2020, 20, 100261.

(17) Quinn, M.; Brettell, T.; Joshi, M.; Bonetti, J.; Quarino, L. Identifying PCP and Four PCP Analogs Using the Gold Chloride Microcrystalline Test Followed by Raman Microspectroscopy and Chemometrics. Forensic Science International. 2020, 307, 110135.

(18) Setser, A. L.; Waddell Smith, R. Comparison of Variable Selection Methods Prior to Linear Discriminant Analysis Classification of Synthetic Phenethylamines and Tryptamines. Forensic Chemistry. 2018, 11, 77–86.

(19) Bonetti, J. Mass Spectral Differentiation of Positional Isomers using Multivariate Statistics. Forensic Chemistry. 2018, 9, 50–61.

(20) Davidson, J. T.; Jackson, G. P. The differentiation of 2,5-dimethoxy-N-(N-methoxybenzyl)phenethylamine (NBOMe) isomers using GC retention indices and multivariate analysis of ion abundances in electron ionization mass spectra. Forensic Chemistry. 2019, 14, 100160.

(21) Álvarez, Á.; Yáñez, J.; Contreras, D.; Saavedra, R.; Sáez, P.; Amarasiriwardena, D. Propellant’s Differentiation Using FTIR-Photoacoustic Detection for Forensic Studies of Improvised Explosive Devices. Forensic Science International. 2017, 280, 169–175.

(22) Kaniu, M.; Angeyo, K. Challenges in Rapid Soil Quality Assessment and Opportunities Presented by Multivariate Chemometric Energy Dispersive X-Ray Fluorescence and Scattering Spectroscopy. Geoderma. 2015, 241-242, 32–40.

(23) Pereira, J. F.; Silva, C. S.; Vieira, M. J. L.; Pimentel, M. F.; Braz, A.; Honorato, R. S. Evaluation and Identification of Blood Stains with Handheld NIR Spectrometer. Microchemical Journal. 2017, 133, 561–566.

(24) Waddell, E. E.; Williams, M. R.; Sigman, M. E. Progress Toward the Determination of Correct Classification Rates in Fire Debris Analysis II: Utilizing Soft Independent Modelling of Class Analogy (SIMCA). Journal of Forensic Sciences. 2014, 59 (4), 927–935.

(25) Brereton, R. G. Chemometrics: Data Driven Extraction for Science; John Wiley & Sons, Incorporated: Newark, 2018.

(26) Fowble, K. L.; Shepard, J. R.; Musah, R. A. Identification and Classification of Cathinone Unknowns by Statistical Analysis Processing of Direct Analysis in Real Time-High Resolution Mass Spectrometry-Derived “Neutral Loss” Spectra. Talanta. 2018, 179, 546–553.

(27) Moorthy, A. S.; Wallace, W. E.; Kearsley, A. J.; Tchekhovskoi, D. V.; Stein, S. E. Combining Fragment-Ion and Neutral-Loss Matching during Mass Spectral Library Searching: A New General Purpose Algorithm Applicable to Illicit Drug Identification. Analytical Chemistry. 2017, 89 (24), 13261–13268.

2. MATERIALS AND METHODS 

2.1 FENTANYL ANALOG REFERENCE MATERIALS 

A standard operating procedure (SOP) for handling and disposing of fentanyl was 

developed and approved by Environmental Health and Safety (EHS) and is shown in the 

appendix (A2). Twenty-eight fentanyl analogs were obtained from Cayman Chemical (Ann 

Arbor, MI). The fentanyl analogs were representative of four structural subclasses1: n-alkyl chain 

substituted (AN) subclass, amide group substituted (AG) subclass, aniline ring substituted (AR) 

subclass, and amide and aniline ring substituted (AA) subclass (Table 2.1). All structures for the 

analogs are shown in the appendix (Figure A2.1). All reference materials were prepared as 1 

mg/mL solutions in methanol (ACS Grade, Sigma Aldrich, St. Louis, MO) and analyzed by gas 

chromatography-mass spectrometry (GC-MS).  

 

 

 

 

 

 

 

 

 


Table 2.1 Fentanyl analogs used in this work, separated by structural subclass 

Amide and Aniline Ring               Aniline Ring          Amide Group               n-Alkyl Chain
para-fluorobutyryl fentanyl          para-methylfentanyl   cyclohexyl fentanyl       furanylethyl fentanyl
meta-fluorobutyryl fentanyl          meta-methylfentanyl   tetrahydrofuran fentanyl  α-methyl acetyl fentanyl
ortho-fluorobutyryl fentanyl         ortho-methylfentanyl  isobutyryl fentanyl       4’-methylfentanyl
para-fluoro methoxyacetyl fentanyl   para-methoxyfentanyl  cyclopropyl fentanyl      thio fentanyl
meta-fluoro methoxyacetyl fentanyl   para-fluorofentanyl   acrylfentanyl             α-methylfentanyl
ortho-fluoro methoxyacetyl fentanyl  para-chlorofentanyl   butyryl fentanyl          α-methyl thio fentanyl
para-fluoroisobutyryl fentanyl                             cyclopentyl fentanyl
meta-fluoroisobutyryl fentanyl
ortho-fluoroisobutyryl fentanyl

 

 

 

 

 

2.2 GAS CHROMATOGRAPHY-MASS SPECTROMETRY (GC-MS) ANALYSIS 

All fentanyl analog reference materials were analyzed by gas chromatography-mass 

spectrometry (GC-MS), using an Agilent Technologies 7890A GC and 5975C Inert XL MSD 

with Triple-Axis Detector (Agilent Technologies, Santa Clara, CA). A CTC-PAL autosampler 

(CTC Analytics, Zwingen, Switzerland) was used to inject 1 µL of each sample into the GC. A 

5%-diphenyl-95%-dimethylpolysiloxane column (VF-5ms, 30 m x 0.25 

mm inner diameter x 0.25 µm film thickness, Agilent Technologies) was used. The carrier gas was helium 

at a nominal flow rate of 1 mL/min. The injection temperature was 220 ℃ with a 100:1 split 

ratio and there was a solvent delay of 2.5 min. The GC oven temperature program was as 

follows: 200 ℃ for 1 min, 30 ℃/min to 300 ℃, with a final hold of 8 min. The transfer line was 

kept at 300 ℃ and the mass spectrometer was operated in electron ionization mode at 70 eV. The 

scan range was m/z 40-450 with a scan rate of 4.51 scans/s. The quadrupole temperature was 150 

℃ and the source temperature was 230 ℃. The MS was tuned using the auto tune function in 

ChemStation software prior to each analysis. Each analog was analyzed once per month for four 

consecutive months.  

2.3 DATA ANALYSIS 

 

The mass spectrum for each analog was collected at the apex of the chromatographic 

peak and as an average across the chromatographic peak. The average was taken across the width 

of the peak at half maximum. Mass spectral data were exported from ChemStation (version 

E.01.00.237, Agilent Technologies) to Microsoft Excel (version 16.0, Microsoft Corporation, 

Redmond, WA). For each analog, the mass spectral intensity was normalized to the base peak 

and zero-filled from m/z 40-450. The data were input into Origin (version 9.0, OriginLab 

Corporation, Northampton, MA) for further visualization of the mass spectra.  
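
The normalization and zero-filling steps can be sketched as follows. In this work these steps were carried out in Microsoft Excel; the Python function below is only an illustration, its name and the raw abundances are hypothetical, and scaling the base peak to 1.0 (rather than, for example, 100) is an assumption.

```python
def normalize_and_zero_fill(peaks, mz_min=40, mz_max=450):
    """Normalize abundances to the base peak and zero-fill the scan range.

    peaks: dict mapping integer m/z to raw abundance (all within the scan range).
    Returns a dict with one entry for every integer m/z from mz_min to mz_max.
    """
    base_abundance = max(peaks.values())
    return {mz: peaks.get(mz, 0.0) / base_abundance
            for mz in range(mz_min, mz_max + 1)}

# Hypothetical raw spectrum with a base peak at m/z 245.
raw = {42: 1200.0, 105: 5600.0, 146: 18000.0, 189: 9000.0, 245: 36000.0}
spectrum = normalize_and_zero_fill(raw)
print(len(spectrum), spectrum[245], spectrum[146], spectrum[100])
```

Zero-filling gives every spectrum the same 411 variables (m/z 40 to 450), which is what allows spectra from different analogs to be placed in a single data matrix.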

2.3.1 Neutral Loss Spectra Development 

 

Because low-resolution MS was used in this work, only the mass of each neutral loss could 

be identified, not its chemical identity. Neutral loss data were therefore hypothetically 

determined and were developed to be as objective as possible. None of the fentanyl analogs used in 

this work produced a molecular ion when subjected to electron ionization. The neutral loss 

spectra were developed by subtracting each m/z value from the m/z value of the base peak in the spectrum. The 

intensity that represented each neutral loss was the normalized intensity of the m/z value from 

which the neutral loss was derived. For example, if the base peak was m/z 245, then a neutral loss 

of 99 and its intensity would be derived from m/z 146 in the mass spectrum. 
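
The derivation of a neutral loss spectrum can be sketched as follows. The function name and the example intensities are hypothetical, and Python is used for illustration only. The handling of any ions above the base peak is not described here, so the sketch simply assumes that only ions at lower m/z than the base peak are converted.

```python
def neutral_loss_spectrum(spectrum):
    """Convert a base-peak-normalized mass spectrum into a neutral loss spectrum.

    spectrum: dict mapping integer m/z to normalized intensity.
    Each neutral loss (base peak m/z minus fragment m/z) carries the
    normalized intensity of the fragment ion from which it was derived.
    """
    base_mz = max(spectrum, key=spectrum.get)
    return {base_mz - mz: intensity
            for mz, intensity in spectrum.items()
            if mz < base_mz and intensity > 0}  # assumption: lower m/z only

# Hypothetical normalized spectrum with a base peak at m/z 245.
example = {146: 0.40, 189: 0.25, 245: 1.0}
losses = neutral_loss_spectrum(example)
print(losses)
```

For this hypothetical spectrum, the ion at m/z 146 gives a neutral loss of 99 with the same intensity, matching the example above.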

2.5 STATISTICAL MODELLING 

All statistical modelling in this work was performed in R (version 3.5.1, The R Project 

for Statistical Computing). All spectra were divided into a training set and a test set. Two 

classification methods were investigated: linear discriminant analysis (LDA) and soft 

independent modelling of class analogies (SIMCA). For these procedures additional packages 

were downloaded.2,3 

For each classification method, four models were developed to investigate the effect of 

spectral variation within a peak and over time on the classification success. The first two models 

(one for apex spectra, one for average spectra) to investigate the effect of spectral variation 

within a peak were developed using an initial training set that contained 44 spectra (Table 2.2) 

and tested using a test set containing 68 spectra (Table 2.3). To investigate the effect of spectral 

variation over time, the training and test sets were re-defined. For these models, the training set 

contained 88 spectra (Table 2.4) and the test set contained 24 spectra (Table 2.5). The refined 

models were only developed using mass spectra collected at the apex of the chromatographic 

peak. Finally, LDA and SIMCA models were developed, optimized, and tested using the neutral 

loss spectra (Section 2.3.1), using the same training set and test set as the refined model, albeit 

with neutral loss, rather than mass spectral, data.  

 

 

 

 

 


Table 2.2 Training set for the initial models (all analog spectra in n = 2) 

Amide and Aniline Ring               Aniline Ring          Amide Group               n-Alkyl Chain
para-fluorobutyryl fentanyl          meta-methylfentanyl   cyclohexyl fentanyl       furanylethyl fentanyl
meta-fluorobutyryl fentanyl          ortho-methylfentanyl  tetrahydrofuran fentanyl  α-methyl acetyl fentanyl
ortho-fluorobutyryl fentanyl         para-methoxyfentanyl  isobutyryl fentanyl       4’-methylfentanyl
meta-fluoro methoxyacetyl fentanyl   para-fluorofentanyl   cyclopropyl fentanyl      thio fentanyl
ortho-fluoro methoxyacetyl fentanyl  para-chlorofentanyl   acrylfentanyl             α-methylfentanyl
para-fluoroisobutyryl fentanyl
ortho-fluoroisobutyryl fentanyl

 

 

 

 

 

 

 

 

 

 

 

 

 


Table 2.3 Test set for the initial models, replicate spectra indicated 

Amide and Aniline Ring                       Aniline Ring                  Amide Group                       n-Alkyl Chain
para-fluorobutyryl fentanyl (n = 2)          para-methylfentanyl (n = 4)   cyclohexyl fentanyl (n = 2)       furanylethyl fentanyl (n = 2)
meta-fluorobutyryl fentanyl (n = 2)          meta-methylfentanyl (n = 2)   tetrahydrofuran fentanyl (n = 2)  α-methyl acetyl fentanyl (n = 2)
ortho-fluorobutyryl fentanyl (n = 2)         ortho-methylfentanyl (n = 2)  isobutyryl fentanyl (n = 2)       4’-methylfentanyl (n = 2)
para-fluoro methoxyacetyl fentanyl (n = 4)   para-methoxyfentanyl (n = 2)  cyclopropyl fentanyl (n = 2)      thio fentanyl (n = 2)
meta-fluoro methoxyacetyl fentanyl (n = 2)   para-fluorofentanyl (n = 2)   acrylfentanyl (n = 2)             α-methylfentanyl (n = 2)
ortho-fluoro methoxyacetyl fentanyl (n = 2)  para-chlorofentanyl (n = 2)   butyryl fentanyl (n = 4)          α-methyl thio fentanyl (n = 4)
para-fluoroisobutyryl fentanyl (n = 2)                                     cyclopentyl fentanyl (n = 4)
meta-fluoroisobutyryl fentanyl (n = 4)
ortho-fluoroisobutyryl fentanyl (n = 2)

 

 

 

 

 

 

 

 

 

 

 

 


Table 2.4 Training set for the refined models and neutral loss models (all analog spectra in n = 4) 

Amide and Aniline Ring               Aniline Ring          Amide Group               n-Alkyl Chain
para-fluorobutyryl fentanyl          meta-methylfentanyl   cyclohexyl fentanyl       furanylethyl fentanyl
meta-fluorobutyryl fentanyl          ortho-methylfentanyl  tetrahydrofuran fentanyl  α-methyl acetyl fentanyl
ortho-fluorobutyryl fentanyl         para-methoxyfentanyl  isobutyryl fentanyl       4’-methylfentanyl
meta-fluoro methoxyacetyl fentanyl   para-fluorofentanyl   cyclopropyl fentanyl      thio fentanyl
ortho-fluoro methoxyacetyl fentanyl  para-chlorofentanyl   acrylfentanyl             α-methylfentanyl
para-fluoroisobutyryl fentanyl
ortho-fluoroisobutyryl fentanyl

 

 

 

 

 

 

 

 

 

Table 2.5 Test set for the refined models and neutral loss models (all analog spectra n = 4)

Amide and Aniline Ring | Aniline Ring | Amide Group | n-Alkyl Chain
para-fluoro methoxyacetyl fentanyl | para-methylfentanyl | cyclopentyl fentanyl | α-methyl thiofentanyl
meta-fluoroisobutyryl fentanyl | | butyryl fentanyl |


2.5.1 Principal Components Analysis (PCA) 

Principal components analysis (PCA) was applied to the full mass spectra (m/z 40-450) of 

all analogs selected for the training set. The scores plots were examined to determine the number 

of principal components (PCs) to retain based on separation of structural subclasses. The 

loadings plots for the retained PCs were examined and the absolute values of the loadings were 

normalized to the largest loading value across all PCs retained to generate relative loadings. The 

relative loadings were then filtered at various thresholds (i.e., 1.5%, 2%, 2.5%, and 3%) to 

determine an optimal number of variables to retain for LDA. Each threshold was a percentage of

the relative loadings value and was used to reduce the number of variables, as LDA requires the

number of variables to be less than the number of samples. The optimal threshold, and resulting

number of variables, was determined by applying LDA to the selected variables for each 

threshold and assessing the leave-one-out cross validation success. The variables that resulted in 

the best leave-one-out cross validation success were used for model development. The R code for

PCA is shown in the appendix (Table A2.1).
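
The variable selection step described above can be sketched in R as follows. This is a minimal illustration on simulated data, not the code or data used in this work: the simulated matrix, the choice of three retained PCs, and the helper name select_vars are assumptions, as is the reading that a variable is kept when its relative loading on any retained PC meets the threshold.

```r
# Hypothetical illustration with simulated spectra (not the thesis data set).
set.seed(1)
n_spectra <- 44                       # e.g., 22 analogs x 2 replicate spectra
mz <- 40:450                          # scan range stated in the text
# Simulated intensities; a few columns are made much more intense than others
X <- sweep(matrix(runif(n_spectra * length(mz)), nrow = n_spectra),
           2, rexp(length(mz))^3, "*")
colnames(X) <- paste0("mass", mz)

pca <- prcomp(X, scale. = FALSE)      # unscaled PCA, as in Table A2.1
n_pc <- 3                             # assumed number of retained PCs

# Absolute loadings of the retained PCs, normalized to the largest value
abs_load <- abs(pca$rotation[, seq_len(n_pc), drop = FALSE])
rel_load <- abs_load / max(abs_load)

# Keep a variable if its relative loading on any retained PC meets the
# threshold (assumed interpretation of the filtering step)
select_vars <- function(rel_load, threshold) {
  rownames(rel_load)[apply(rel_load, 1, max) >= threshold]
}

thresholds <- c(0.015, 0.02, 0.025, 0.03)
n_kept <- sapply(thresholds, function(t) length(select_vars(rel_load, t)))
names(n_kept) <- paste0(100 * thresholds, "%")
print(n_kept)                         # number of variables kept per threshold
```

Each candidate variable set would then be passed to LDA, and the threshold giving the best leave-one-out cross validation success retained.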

2.5.2 Linear Discriminant Analysis (LDA) 

The various relative loadings thresholds were used with LDA to determine the optimal 

variables (m/z values) by assessing the leave-one-out cross validation for the training set. The 

threshold and resulting variables with the best cross validation results were retained for model 

development and validation. Four LDA models were developed in this work: an initial model 

with apex data, an initial model with average data, a refined model, and a neutral loss model. 

Test sets were applied to these models to assess LDA classification accuracy. The R code for

LDA is shown in the appendix (Table A2.2).
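
A minimal sketch of the leave-one-out cross validation and test set prediction described above is given here. It uses lda() from the MASS package on simulated data; the six variable names, the class centers, and the object names (dat, cv, fit, test) are hypothetical and chosen only for illustration.

```r
library(MASS)

# Hypothetical illustration with simulated data (not the thesis data set).
set.seed(2)
Class <- factor(rep(c("AA", "AR", "AG", "AN"), each = 11))   # 44 spectra
centers <- matrix(rnorm(4 * 6, sd = 3), nrow = 4)            # one row per class
X <- centers[as.integer(Class), ] + matrix(rnorm(44 * 6), nrow = 44)
colnames(X) <- paste0("mass", c(57, 105, 146, 189, 245, 259))
dat <- data.frame(X, Class = Class)

# Leave-one-out cross validation: CV = TRUE returns the predicted class of
# each spectrum when it is left out of model building
cv <- lda(Class ~ ., data = dat, CV = TRUE)
loo_success <- mean(cv$class == dat$Class)
print(table(Predicted = cv$class, Actual = dat$Class))

# Fit the model on the full training set, then classify an external test set
fit <- lda(Class ~ ., data = dat)
test <- as.data.frame(centers[c(1, 2), ] + matrix(rnorm(2 * 6), nrow = 2))
names(test) <- colnames(X)
pred <- predict(fit, test)
print(pred$class)        # predicted subclass for each test spectrum
print(pred$posterior)    # posterior probability of membership in each class
```

Repeating the cross validation for each candidate variable set and comparing loo_success is one way to choose among the loadings thresholds.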

2.5.3 Soft Independent Modelling of Class Analogies (SIMCA) 

Four SIMCA models were also developed: an initial model with apex data, an initial 

model with average data, a refined model, and a neutral loss model. Unlike LDA, SIMCA does 

not require variable reduction, so SIMCA was applied to the full mass spectra. Test sets were also

applied to these models to assess SIMCA classification accuracy. The R code for SIMCA

is shown in the appendix (Table A2.3).
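
The idea underlying SIMCA, a separate PCA model for each class with new samples judged by how poorly each class model reconstructs them, can be sketched in base R as follows. This is a simplified illustration on simulated data and is not the mdatools implementation used in this work: the helper names (make_class, fit_class, q_residual) are hypothetical, only the Q residual is computed, and the comparison against critical limits at a chosen alpha is omitted.

```r
# Hypothetical illustration with simulated data (not the thesis data set).
set.seed(3)
p <- 20                                        # number of variables (assumed)
make_class <- function(n, center) {
  matrix(rnorm(n * p, sd = 0.5), nrow = n) + matrix(center, n, p, byrow = TRUE)
}
train <- list(class1 = make_class(14, rep(0, p)),
              class2 = make_class(14, rep(c(3, -3), length.out = p)))

# One PCA model per class: store the class mean and the retained loadings
fit_class <- function(X, n_pc) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  list(center = pca$center, P = pca$rotation[, seq_len(n_pc), drop = FALSE])
}

# Q residual: squared distance between a sample and its projection onto
# the class PCA model
q_residual <- function(model, Xnew) {
  Xc <- sweep(Xnew, 2, model$center)           # subtract the class mean
  E <- Xc - Xc %*% model$P %*% t(model$P)      # part not explained by the PCs
  rowSums(E^2)
}

models <- lapply(train, fit_class, n_pc = 2)
x_new <- make_class(1, rep(0, p))              # new sample drawn like class1
q <- sapply(models, q_residual, Xnew = x_new)
print(q)                                       # smaller Q means a better fit
```

Because each class is modelled independently, a sample may fit one class, several classes, or none, which is the sense in which the classification is soft.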

APPENDIX

A2 Standard Operating Procedure for Sample Preparation of Fentanyl and Analogs 

 

STANDARD OPERATING PROCEDURE 

Sample Preparation of Fentanyl and Analogs 

_______________________________________________ 

 

 

 

 

Research Group: Ruth Smith – Forensic Chemistry

Author: Amber Gerheart and Hannah Clause

Last revision date: 06/28/2019

Room and Building: 204 and 205 Chemistry

Contact information: 517-353-5283

 

Section 1: This standard operating procedure is for 

□  The generic use of a chemical

□  A specific laboratory procedure involving a chemical

Section 2: Chemical information 

Fentanyl – Solid white powder, odorless.

Fatal if swallowed. Fatal if inhaled. Call 911 upon any potential exposure.

Do not breathe dust/fume/gas/mist/vapors/spray.

Wash hands thoroughly after handling. Wear respiratory protection.

Symptoms of exposure may include: Contracted or pinpoint pupils (miosis) (may later become dilated), 
reduced level of consciousness (CNS depression), reduced respiratory function (respiratory depression), 
reduced blood oxygen content (hypoxia), accumulation of acid in the blood (acidosis), low blood 
pressure (hypotension), slow heart rate (bradycardia), shock, slowing of muscular movement of the 
stomach (gastric hypomotility) with intestinal obstruction due to lack of normal muscle function (ileus), 
accumulation of fluid in the lungs (pulmonary edema), lethargy, coma, and death. 

All standard protocols for handling and use of DEA Controlled Substances are required when using this 
product. 

Section 3: Potential Hazards 

Chemical Dangers – No hazardous polymerization will occur 
Explosion Hazards – None determined 
Fire Fighting Information – Burning may produce carbon monoxide, carbon dioxide, and nitrogen oxides 
Physical Exposure – Fatal if inhaled or swallowed 
MSDS: https://www.caymanchem.com/msdss/14719m.pdf 
 
Section 4: Personal Protective Equipment 

All work in laboratories must be performed under the guidelines for appropriate laboratory attire, as 
defined by the MSU Chemical Hygiene Plan: 

-  Closed-toe shoes
-  Long pants or long skirt covering the legs from the waist to the top of shoes
-  Safety goggles
-  Laboratory coat
-  Disposable laboratory coat
-  Nitrile gloves (double glove when handling)
-  N-95 respirator
PPE will be regularly stocked in the lab (room 204, Chemistry Building).  

http://home.iape.org/resourcesPages/IAPE_Downloads/Drugs/Evidence_Unit_Safety_Protocols_in_Light_of_Fentanyl.pdf

Section 5: Engineering Controls 

The eye wash and emergency shower are located to the right of the designated fume hood in room 204. 
The eye wash is at the sink by the door and the shower is between the door and the sink. The current 
lab safety coordinator will designate a student to be responsible for checking the condition of eye 
washes on a weekly basis. 

The fume hood where this work will be done is located in the back, right side of the room. 

Section 6: Special Handling and Storage Requirements 

Fentanyl and 30 analogs (see attached Appendix) will be purchased. With the exception of fentanyl 
itself, only 1 mg of each analog will be purchased. For fentanyl, 10 mg will be purchased. On receipt of 
the fentanyl analogs, each will be assigned a unique identifier and will be logged in our electronic 
Controlled Substances log. All fentanyl analogs will be stored in the controlled substances safe in room 
205. The safe is a combination-type safe that only Dr. Smith has access to. The safe is housed within an 
enclosed area that is accessible by key-card access only and again, only Dr. Smith has key card access to 
the area. When not in use, all fentanyl analogs will be stored in the safe. 

Prior to analysis, each analog will be prepared in solution. A 1 milliliter aliquot of suitable solvent 
(methanol or chloroform) will be added to the vial containing the analog. The solution will then be 
transferred using a glass pipet to a gas chromatography (GC) vial for analysis. The capped GC vial will 
then be transferred into a scintillation vial that will be used as secondary containment. The capped GC 
vial and the scintillation vial will be labeled with the analog name, the concentration of the solution, and 
appropriate hazard labels (Health: 4 Flammability: 1). The scintillation vial will be color coded according 
to the structural subclass of each analog to minimize handling of samples.  

Section 7: Accidental Release and Decontamination Procedures 

A mixture of 1 tablespoon OxiClean Versatile Stain Remover (main components are sodium 
percarbonate and hydrogen peroxide) and 500 mL water will be prepared each day of analysis. Prior to 
any sample handling, work surfaces will be cleaned three times using this solution. After preparation, 
the work area will again be cleaned three times with this solution.  

At any given time, 1 mg or less of the analog will be handled. If a spill occurs, the area will be cleaned 
with the OxiClean solution (minimum of three wipes with the solution to increase decontamination 
efficiency) and any solid waste (e.g., paper towels used to clean the spill) will be disposed of in a ziplock-
type bag. The ziplock bag will be sealed and transferred into double-bagged 10 gallon-size ziplock-type 
bags. The hazardous waste tag will be placed between the two 10-gallon size bags. The smaller sealed 
bag containing the solid waste will be placed inside the larger bags, which will be sealed and stored in 
the secure, enclosed area, next to the controlled substances safe.  

As soon as reasonably possible after the spill, the lab safety coordinator, Dr. Smith, and EHS will all be 
notified. Spills not contained in a fume hood or spills leading to contamination of personnel or 
equipment will be reported to 911 immediately.  

Section 8: Exposure Procedures 

At all times that fentanyl analogs are being handled, two people will be in the lab, one to handle the 
analogs and the other as a safety measure. Any potential exposure to skin, eyes, or inhalation will be 
immediately reported to 911. It is important that any person assisting the victim does not contaminate 
themselves. Therefore, call 911 immediately and then don double gloves, lab coat, and safety goggles 
before assisting the victim. 

-  If skin exposure occurs, the area should be washed immediately with soap and water while
   waiting for paramedics to arrive.
-  If swallowed, the individual will be instructed to rinse out their mouth with water while waiting
   for paramedics to arrive. If, after any exposure, the individual exhibits signs of overdose (e.g.,
   drowsiness, disorientation, sedation, pinpoint pupils, skin rash, clammy skin, or respiratory
   depression or arrest), nasal naloxone (Narcan) will be administered according to the
   manufacturer instructions.

Narcan will be stored in 204 Chemistry, next to the fume hood where the samples will be prepared. The 
date of receipt and the listed expiration date will be noted on a log on the side of the fume hood. Prior 
to beginning any work with fentanyl, personnel must ensure the appropriate quantity of Narcan is 
available and that the Narcan has not expired.  

NOTE: In cases of skin exposure, DO NOT use hand sanitizers. These products penetrate the skin, which
may increase the absorption of fentanyl through the skin.

Section 9: Waste Disposal Procedures 

Uncontaminated packaging, boxes, or other items that may indicate the presence of controlled
substances should not be recycled or placed into trash cans for routine disposal. These items will be
placed into a double-bagged garbage bag, labeled with a hazardous waste tag, for incineration.

When working with the samples, solid waste will be placed into a gallon-size or smaller ziplock-type bag 
within the fume hood. At the end of sample preparation, the ziplock bag will be sealed and transferred 
into double-bagged 10 gallon-size ziplock-type bags. The hazardous waste tag will be placed between 
the two 10-gallon size bags. The smaller sealed bag containing the solid waste will be placed inside the 
larger bags, which will be sealed and stored in the secure, enclosed area, next to the controlled 
substances safe.  

Any liquid waste will be treated as “controlled-substance containing waste” and will be stored in a 250 
mL amber bottle, labeled with the appropriate hazardous waste tags. The waste bottle will be stored in 
the secure, enclosed area next to the controlled substances safe. Contact Amber Bitters, EHS Hazardous 
Waste Coordinator at 517-432-5262 when ready for final disposal.  

Section 10: Material Safety Data Sheets / Safety Data Sheets 

Lab 205 – in safety binder in the drawer with all other safety coordinator information 

Also found online at https://www.caymanchem.com/msdss/14719m.pdf 

Section 11: Training and Awareness 

Employees working with chemicals must complete the following training: 

□  Chemical Hygiene and Hazardous Waste Initial / Refresher 

□  Site Specific Training with PI or lab manager

□  Review and signature of this completed SOP

□  Other:

Controlled Substances Training

Biohazard Training

Naloxone Training (online training in the form of webpage instructions and video is
available at https://www.narcan.com/patients/how-to-use-narcan/)

Completion of the training will be recorded in the Training Folder located in 205 Chemistry and an 
electronic version of the completed training will be maintained by Dr. Smith.  

Section 11: Protocols 

The objective in this research is to characterize fentanyl and related analogs based on the corresponding 
mass spectral data. Prior to mass spectral analysis, each fentanyl analog will be prepared at a maximum 
concentration of 1 mg in 1 mL of appropriate solvent (methanol or chloroform).  

The protocol for preparing the analogs is as follows: 

1.  Each analyst working with fentanyl will wear appropriate PPE (lab coat, disposable lab coat,
    double gloves, safety goggles, and a disposable N-95 respirator).
2.  The working area within the designated fume hood will be sprayed with OxiClean Versatile Stain
    Remover solution and cleaned. The area will be wiped a minimum of three times with this
    solution to increase decontamination efficiency.
3.  A sheet of bench paper will be placed in the fume hood.
4.  For each analog, the cap of the sample bottle will be removed and 1 mL of appropriate solvent
    (methanol or chloroform) will be transferred to the bottle using an automated pipet.
5.  The solution will then be transferred to a glass GC vial using an automated pipet.
6.  The GC vial will be capped and labeled with the analog name and concentration, along with
    hazard warning labels.
7.  The capped GC vial will be placed in a scintillation vial that will be capped and labeled with the
    analog name and concentration, along with hazard warning labels. A color-coded label will also
    be adhered to the cap of the scintillation vial to readily identify the structural subclass of analog
    within the vial and thereby minimize sample handling.
8.  For fentanyl, as 10 mg will be purchased, appropriate aliquots will be weighed into GC vials prior
    to the addition of solvent. In these cases, an analytical balance will be transferred into the fume
    hood and used to weigh the appropriate aliquots of fentanyl. Steps 4 – 7 will then be followed to
    prepare the fentanyl sample. The balance will be wiped down with OxiClean Versatile Stain
    Remover. The surface of the balance will be wiped a minimum of three times with the OxiClean
    solution to increase the decontamination efficiency. The balance will then be returned to the lab
    bench.
9.  Following preparation, the scintillation vials containing the prepared solutions will be returned
    to the controlled substances safe.
10. The bench paper will be folded and placed in the gallon-size solid waste bag, along with any
    other solid waste produced during the sample preparation procedure (e.g., disposable pipet
    tips). The solid waste bag will be sealed and placed in a double-lined 10-gallon bag and sealed.
    The sealed solid waste bag will be appropriately labeled and stored in the secure, enclosed area
    next to the controlled substances safe.
11. Any liquid waste will be transferred to the “Controlled Substances Hazardous Waste” bottle,
    which will be appropriately labeled and stored in the secure, enclosed area next to the
    controlled substances safe.
12. The fume hood will again be wiped down with the OxiClean Versatile Stain Remover solution. As
    before, the surface will be wiped down a minimum of three times to increase the
    decontamination efficiency. The solid waste generated during this cleaning will be disposed of
    as described in step 10.
13. Disposable PPE will be disposed of as solid waste (as described in step 10) and safety goggles will
    be wiped down with the OxiClean solution (minimum of three wipes).

The samples will be analyzed by gas chromatography-mass spectrometry, using an instrument available 
in the Mass Spectrometry and Metabolomics Core (MSMC) on campus. The protocol for transferring the 
analogs to and analyzing the analogs in the MSMC is as follows: 

1.  The prepared analogs will be transferred from 205 Chemistry to 11 Biochemistry in secondary
    containment and only the specific samples to be analyzed each time will be transferred.
2.  Two personnel will always transport the samples, with one wearing double gloves and carrying
    the analogs in a tray, while the second will carry a spill kit containing Narcan, a spray bottle
    containing a freshly prepared solution of OxiClean Versatile Stain Remover, and paper towels.
3.  Once at the MSMC, the instrument septum, liner, and syringe will be replaced by MSMC staff
    and a series of solvent blanks will be analyzed.
4.  Once the instrument is deemed sufficient for analysis (no contamination in solvent blanks), the
    analogs to be analyzed will be transferred to the autosampler tray and the sequence set up.
    Each sequence will include a minimum of three solvent blanks at the end, which will be used to
    assess the cleanliness of the GC column at the end of the analysis (no residual analogs present).
5.  At the end of the sequence, the liner, septum, and syringe will be replaced by MSMC staff. The
    potentially contaminated liner and septum will be placed in a ziplock-type bag and treated as
    solid waste. The syringe will be rinsed thoroughly with solvent, placed in its original box, and
    returned with the analogs to 205 Chemistry.
6.  The liquid waste from the syringe rinsing in Step 5 as well as liquid in the autosampler waste vial
    will be transferred to a scintillation vial clearly labeled as “Controlled Substances Hazardous
    Waste.” The scintillation vial will be transported back to 205 Chemistry along with the analogs.
    The liquid waste will be transferred into the Controlled Substances Hazardous Waste bottle,
    which will be appropriately labeled and stored in the secure, enclosed area next to the
    controlled substances safe.
7.  One member of the laboratory personnel will remain with the analogs at all times during analysis
    in the MSMC.

Section 12: SOP Review and Prior Approval 

I, the PI/Supervisor, grant the following laboratory personnel approval to perform the above SOP 

Name: Amber Gerheart

Name: Hannah Clause

Name: Amanda Setser

PI/Laboratory Supervisor signature: _____________________________________ Date: _____________ 

I have reviewed and understood this Standard Operating Procedure, and agree to abide by the protocols 
described herein: 

Signature: _____________________________________________________ Date: __________________ 

Signature: _____________________________________________________ Date: __________________ 

Signature: _____________________________________________________ Date: __________________ 

A completed copy of this Standard Operating Procedure has been reviewed and approved by MSU Office 
of Environmental Safety: 

MSU EHS Staff: __________________________________________________ Date: _______________ 

Additional Reading 

1.  Froelich NM, Sprague JE, Worst TJ. Letter to the Editor – Elbow Grease and OxiClean™ for
    Cleaning fentanyl- and Acetylfentanyl-contaminated Surfaces. Journal of Forensic Sciences 2018
    63 (1) 336.
2.  Fentanyl. A Briefing Guide for First Responders. US Department of Justice, Drug Enforcement
    Administration. Available at
    https://www.nvfc.org/wp-content/uploads/2018/03/Fentanyl-Briefing-Guide-for-First-Responders.pdf
    (Accessed February 15, 2019).
3.  Fentanyl. Safety Recommendations for First Responders. US Department of Justice, Drug
    Enforcement Administration. Available at
    https://www.dea.gov/sites/default/files/Publications/Final%20STANDARD%20size%20of%20Fentanyl%20Safety%20Recommendations%20for%20First%20Respond....pdf
    (Accessed February 15, 2019).


Appendix: Fentanyl Analogs Included in SOP 

Sample preparation and handling of the following compounds are included in this SOP: 

1.  Fentanyl hydrochloride 
2.  Furanylethyl Fentanyl (hydrochloride) 
3.  alpha-methyl Acetyl Fentanyl (hydrochloride) 
4.  beta-hydroxythioacetylfentanyl 
5.  beta-hydroxy Fentanyl (hydrochloride) 
6.  4'-methyl Fentanyl 
7.  Thiofentanyl (hydrochloride) 
8.  alpha-methyl Thiofentanyl 
9.  alpha-methyl Fentanyl 
10. Butyryl Fentanyl (hydrochloride) 
11. Isobutyryl Fentanyl (hydrochloride) 
12. Acrylfentanyl (hydrochloride) 
13. Cyclopropyl Fentanyl (hydrochloride) 
14. Cyclopentyl Fentanyl (hydrochloride) 
15. Tetrahydrofuran Fentanyl (hydrochloride) 
16. Cyclohexyl Fentanyl (hydrochloride) 
17. ortho-Methylfentanyl (hydrochloride) 
18. meta-Methylfentanyl (hydrochloride) 
19. para-Methylfentanyl (hydrochloride) 
20. para-Methoxyfentanyl (hydrochloride) 
21. para-Chlorofentanyl (hydrochloride) 
22. para-Fluorofentanyl (hydrochloride) 
23. ortho-Fluorobutyryl Fentanyl (hydrochloride) 
24. para-Fluorobutyryl Fentanyl (hydrochloride) 
25. meta-Fluorobutyryl Fentanyl (hydrochloride) 
26. meta-Fluoroisobutyryl Fentanyl (hydrochloride) 
27. ortho-Fluoroisobutyryl Fentanyl (hydrochloride) 
28. FIBF (hydrochloride) 
29. Ocfentanil 
30. meta-Fluoro Methoxyacetyl Fentanyl (hydrochloride) 
31. para-Fluoro Methoxyacetyl Fentanyl (hydrochloride) 

 

 

Figure A2.1 Structures of all fentanyl analogs used in this work: ortho-fluorobutyryl fentanyl,
meta-fluorobutyryl fentanyl, para-fluorobutyryl fentanyl, ortho-fluoroisobutyryl fentanyl,
meta-fluoroisobutyryl fentanyl, para-fluoroisobutyryl fentanyl, ortho-fluoro methoxyacetyl fentanyl,
meta-fluoro methoxyacetyl fentanyl, para-fluoro methoxyacetyl fentanyl, butyryl fentanyl,
isobutyryl fentanyl, acrylfentanyl, cyclopropyl fentanyl, cyclopentyl fentanyl, cyclohexyl fentanyl,
tetrahydrofuran fentanyl, ortho-methylfentanyl, meta-methylfentanyl, para-methylfentanyl,
para-methoxyfentanyl, para-chlorofentanyl, para-fluorofentanyl, furanylethyl fentanyl,
α-methyl acetyl fentanyl, 4’-methyl fentanyl, thiofentanyl, α-methyl thiofentanyl, and
α-methyl fentanyl

Table A2.1 PCA R Code (4)

R Code | Command
getwd() | Identifies current directory
setwd("C:/directory") | Sets the directory containing data
data=read.table("file name.txt",header=TRUE) | Inputs data; header=TRUE reads the first row as column headers (the first column is used as row names when the header row has one fewer entry than the data rows)
pca<-prcomp(data,scale.=FALSE) | Application of PCA to data
print(pca) | Output for loadings values for all PCs
pca$rotation[,1:n] | Output for loadings values for only one through n number of PCs
summary(pca) | Output of the variance explained by each PC (used for scree plots)
pca$x | Output for score values for PCs


Table A2.2 LDA R Code (4)

R Code | Command
getwd() | Identifies current directory
setwd("C:/directory") | Sets the directory containing data
data=read.table("file name.txt",header=TRUE) | Inputs data; header=TRUE reads the first row as column headers (the first column is used as row names when the header row has one fewer entry than the data rows)
names(data)=c("mass41","mass43",…,"Class") | Names the variables in the top row of data sheet
attach(data) | Attaches data
library(MASS) | Loads the R package used to apply LDA to data
data.lda=lda(Class~mass41+mass43+…,data,CV=TRUE) | Performs leave-one-out cross validation on dataset
data.lda | Provides output of cross validation results
train<-data[1:85,] | Identifies the training set by identifying which rows contain the training set
test<-data[86:139,] | Identifies the test set by identifying which rows contain the test set
data.lda=lda(Class~mass41+mass43+…,data=train) | Application of LDA to data
data.lda | Output for coefficients of linear discriminants
data.lda.values<-predict(data.lda,data[1:85,]) | Obtains score values for training set
data.lda.values$x | Output for training set score values
lda.pred<-predict(data.lda,test) | Application of test set to LDA model
lda.pred$posterior | Output for probability of classification to each class for all test set samples
lda.pred$x | Output for test set score values


Table A2.3 SIMCA R Code

R Code | Command
getwd() | Identifies current directory
setwd("C:/directory") | Sets the directory containing data
data=read.table("file name.txt") | Inputs data
data2=data[,1:411] | Identifies columns for variables
class=data[,412] | Identifies column with class membership label
X.c=data[1:44,] | Identifies rows with training set data
X.class=X.c[1:14,] | Identifies rows containing class data, must be specified for each class
library(mdatools) | Loads the R package used to apply SIMCA to data
m.class=simca(X.class,'class',PC,alpha=n) | Sets SIMCA parameters for a class, must be specified for each class individually (X.class=class data, 'class'=class label, PC=number of PCs retained, n=alpha value)
m.class=selectCompNum(m.class,PC) | Sets number of PCs retained for a class, must be specified for each class individually
m=simcam(list(m.class1,m.class2,m.class3,m.class4)) | Compiles all classes together to perform multiclass SIMCA
summary(m) | Output for a summary of all classes and parameters used for each class in multiclass SIMCA
X.t=data[45:112,] | Identifies test set rows
c.t=data[45:112,412] | Identifies test set rows and which column has class membership label
print(m.class) | Output options for a class
print(m.class$calres$scores) | Output for scores values
print(m$modpower) | Output for modelling power values

Cooman’s Plots Code
plotCooman(m,c(1,2),show.labels=TRUE) | Plots the Cooman’s Plot of class 1 and class 2 in R (numbers can change based on which classes are being compared)
rn=predict(m.class,X.c) | Obtains Q values for the training set to specified class, n represents the number for a class (first class n=1, etc.)
rn=predict(m.class,X.t) | Obtains Q values for the test set to specified class, n represents the number for a class (first class n=1, etc.)
rn$Q | Output for Q values corresponding to class identified in rn
m.class$Qlim | Output for Q critical limit boundary

Residuals Plots Code
plotResiduals(m.class) | Plots residuals plot for a class in R
print(m.class$calres$T2) | Output for T2 values for training set for indicated class
print(m.class$calres$Q) | Output for Q values for training set for indicated class
print(m.class$cvres$T2) | Output for T2 values for cross validation of training set for indicated class
print(m.class$cvres$Q) | Output of Q values for cross validation of training set for indicated class
m$T2lim | T2 critical limit boundary
m$Qlim | Q critical limit boundary


REFERENCES

(1) Cayman Chemical. Fentanyl Identification. Cayman Currents. 28, Ann Arbor (2017).

(2) Venables WN, Ripley BD (2002). Modern Applied Statistics with S, Fourth edition. Springer,
New York. ISBN 0-387-95457-0, http://www.stats.ox.ac.uk/pub/MASS4/.

(3) Sergey Kucheryavskiy, mdatools – R package for chemometrics, Chemometrics and
Intelligent Laboratory Systems, Volume 198, 2020 (DOI: 10.1016/j.chemolab.2020.103937).

(4) Setser, A. L.; Waddell Smith, R. Comparison of Variable Selection Methods Prior to Linear
Discriminant Analysis Classification of Synthetic Phenethylamines and Tryptamines.
Forensic Chemistry. 2018, 11, 77–86.

3. LINEAR DISCRIMINANT ANALYSIS (LDA) FOR CLASSIFICATION OF 

FENTANYL ANALOGS ACCORDING TO STRUCTURAL SUBCLASS 

The gold standard for the analysis and identification of controlled substances is gas 

chromatography-mass spectrometry (GC-MS). Identification is made by comparing the retention 

time and mass spectrum of a sample to those of a reference standard. With the emergence of 

novel psychoactive substances (NPS), fentanyl analogs in particular, identification by mass 

spectral comparison may not be possible due to lack of reference materials. Different 

multivariate statistical methods have been investigated as a means to obtain further structural

information or discriminate between structures (1-7). In this work, mass spectra of fentanyl analogs

were subjected to linear discriminant analysis (LDA) to classify the analogs according to 

structural subclass. Different models were developed to investigate the effect of spectral 

variation within a peak, the effect of spectral variation over time, and the use of neutral losses 

(rather than fragment ions) as variables. Each model was validated, and external test sets were 

used to determine the success/accuracy in classifying fentanyl analogs according to structural 

subclass. 

3.1 MASS SPECTRAL ANALYSIS OF FENTANYL ANALOGS 

The 28 fentanyl analogs investigated in this work were representative of four structural 

subclasses. The core structure of fentanyl, with initial cleavage sites, is shown in Figure 3.1. The 

four subclasses, which were determined based on the site of substitution on the core fentanyl 

structure, were as follows (8): n-alkyl chain substituted (AN) subclass, aniline ring substituted (AR)

subclass, amide group substituted (AG) subclass, and amide and aniline ring substituted (AA)

subclass. The electron ionization (EI) fragmentation of these analogs has been hypothesized

based on the known fragmentation of the core fentanyl structure (8). Regardless of the type of

substituent (e.g., halogen, methoxy, methyl), the site of substitution was expected to impact

the order in which bonds cleave for the analogs. The first cleavage site was dependent upon

where the substitution occurs on the structure (8).

Figure 3.1 Initial cleavage sites of fentanyl analogs: A) cleavage of the amide group, B) cleavage

on the piperidine ring, C) cleavage of the n-alkyl chain

 

 

One compound from each of the four subclasses and its corresponding spectrum are 

shown in Figure 3.2, with the site of substitution highlighted. Representative spectra and 

structures of all compounds are shown in the appendix (Figure A3.1). The molecular ion was not 

observed for any of the fentanyl analogs. When spectra of analogs within a subclass were 

examined, there were similarities. For example, with the exception of α-methylfentanyl and α-

methyl thiofentanyl, all of the AN subclass analogs had a base peak at m/z 245. However, 

similarities were also observed between subclasses. For example, a base peak at m/z 259 was 

observed in the spectra of α-methyl thiofentanyl (AN subclass) and ortho-methylfentanyl (AR 

subclass). The variability in mass spectra within a subclass is due to the type of substitution,

which changes the base peak and many of the smaller mass fragments containing the

substitution. Although the high intensity m/z ions are not always the same for every compound in 

a subclass, analogs with the same site of substitution are predicted to fragment in a similar 

manner. The predicted similarity in fragmentation supports the hypothesis that chemometric 

methods would be able to identify characteristic ions and classify fentanyl analogs according to

structural subclass using mass spectral data. All fragmentation comments in this work are 

hypothetical as no further chemical analysis was performed to obtain elemental formulae of 

fragment ions. 

 


 

 

 

Figure 3.2 Mass spectra and chemical structures of selected fentanyl analogs A) thiofentanyl 

representing the AN subclass, B) ortho-methylfentanyl representing the AR subclass, C) 

cyclopropyl fentanyl representing the AG subclass, and D) para-fluorobutyrylfentanyl 

representing the AA subclass. 

3.2 INITIAL LINEAR DISCRIMINANT ANALYSIS (LDA) MODELS TO ASSESS 

VARIATION WITHIN A CHROMATOGRAPHIC PEAK 

To develop classification models, the initial set of 28 analogs was split into a training set 

and a test set. For the initial models, the training set consisted of replicate spectra (n = 2) of 22 

fentanyl analogs (44 spectra total). The test set consisted of replicate spectra (n = 4) of six 

analogs, along with additional spectra (n = 2) of the 22 training set compounds analyzed in two 

subsequent months. The six new analogs represented all four subclasses: two analogs were from 

the AA subclass, two analogs were from the AG subclass, one analog was from the AR subclass, 

and one analog was from the AN subclass. However, due to sample degradation, four spectra 

(representing four analogs) were excluded from the test set such that the final test set contained a 

total of 64 spectra.  
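The size of the test set can be checked with simple arithmetic. The sketch below assumes that the additional spectra (n = 2) are two spectra in total for each of the 22 training set compounds, which is the reading consistent with the stated total; the variable names are illustrative only.

```python
# Hypothetical bookkeeping of the initial training and test sets.
# Assumption: n = 2 additional spectra in total per training-set analog.
n_training_analogs = 22
n_new_analogs = 6

training_spectra = n_training_analogs * 2      # replicate spectra (n = 2)
new_analog_spectra = n_new_analogs * 4         # replicate spectra (n = 4)
later_month_spectra = n_training_analogs * 2   # additional spectra (n = 2)
excluded = 4                                   # removed due to sample degradation

test_spectra = new_analog_spectra + later_month_spectra - excluded
print(training_spectra, test_spectra)
```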

Forensic laboratories typically obtain the mass spectrum from the apex of the 

chromatographic peak; however, fragment ion intensities and ratios vary during elution of the 

chromatographic peak due to changes in concentration.9 When developing the LDA models, it 

was important to develop robust models by accounting for different factors that may have 

affected how the models performed. The first factor considered was the difference between 

classification success using the mass spectrum obtained from the apex of the chromatographic 

peak versus the average mass spectrum across the chromatographic peak. Thus, two data sets 

were generated: the first contained spectra collected at the apex for the training and test set and 

the second contained the average spectra collected across the peaks for the training and test set. 

Each data set was used to develop and validate LDA models and the classification success of 

each model was assessed.  
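The difference between the two data sets can be illustrated with a minimal sketch. It assumes that the apex is the scan with the largest summed abundance and that spectra are normalized to the base peak; the scan values and function names are hypothetical and do not come from the data collected in this work.

```python
# Sketch: apex spectrum versus average spectrum across one chromatographic peak.
# Each scan is a dict mapping m/z -> abundance; the values are made up.
scans = [
    {105: 200.0, 146: 500.0, 245: 1000.0},
    {105: 900.0, 146: 2100.0, 245: 4000.0},   # most abundant scan (assumed apex)
    {105: 300.0, 146: 600.0, 245: 1500.0},
]

def normalize(spectrum):
    """Scale a spectrum so that its base peak has a relative intensity of 1."""
    base = max(spectrum.values())
    return {mz: abundance / base for mz, abundance in spectrum.items()}

def apex_spectrum(scans):
    """Return the normalized scan with the largest summed abundance."""
    return normalize(max(scans, key=lambda s: sum(s.values())))

def average_spectrum(scans):
    """Return the normalized mean abundance of every m/z across all scans."""
    all_mz = sorted({mz for s in scans for mz in s})
    mean = {mz: sum(s.get(mz, 0.0) for s in scans) / len(scans) for mz in all_mz}
    return normalize(mean)

apex = apex_spectrum(scans)
average = average_spectrum(scans)
for mz in sorted(apex):
    print(mz, round(apex[mz], 3), round(average[mz], 3))
```

In this made-up example the ion ratios differ slightly between the two spectra, which is the kind of variation the two data sets were intended to compare.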


3.2.1 Principal Components Analysis (PCA) for Variable Selection 

As LDA requires the number of variables to be less than the number of samples, the full 

mass spectrum (m/z 40-450, 411 variables) for each of the samples could not be used for 

modelling. Principal components analysis (PCA) was used as a dimensionality reduction method 

to identify the variables (m/z values) responsible for the most variance in the data set, to reduce 

the number of variables used in LDA. While PCA was conducted on both data sets (apex and 

average spectra), the following discussion focuses on the apex spectra. 

After examining the PCA data, only the first four principal components (PCs), which 

accounted for 68% of the total variance in the data set, were retained as they resulted in adequate 

separation among all four fentanyl analog subclasses. Scores plots representing the first four PCs 

are shown in Figure 3.3.  
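The dimensionality reduction described above can be sketched as follows. The code assumes mean-centering without scaling and uses random numbers in place of the measured spectra; the preprocessing and software actually used are not stated in this section, so the array names and settings are illustrative only.

```python
import numpy as np

# Sketch: PCA by singular value decomposition of mean-centered spectra.
# Rows are spectra and columns are m/z variables; random data stand in
# for the 44 training spectra and 411 m/z values (m/z 40-450).
rng = np.random.default_rng(0)
spectra = rng.random((44, 411))

centered = spectra - spectra.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

explained = s**2 / np.sum(s**2)   # fraction of total variance per PC
scores = u[:, :4] * s[:4]         # sample coordinates on PC1-PC4
loadings = vt[:4].T               # m/z contributions to PC1-PC4

print("cumulative variance, first four PCs:", explained[:4].sum())
```

The scores are the coordinates plotted in a scores plot, and the loadings indicate which m/z values drive the positioning on each PC.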

Figure 3.3 PCA scores plots of A) principal component 1 (PC1) vs principal component 2 (PC2), B) PC1 vs principal component 3 (PC3), and C) PC1 vs principal component 4 (PC4). Axis labels: PC1 (27%), PC2 (17.7%), PC3 (14.5%), PC4 (8.4%) 

The AA subclass was distinguished from the other three subclasses on PC1 (Figure 

3.3A). The AG subclass and AN subclass were differentiated from the AR subclass on PC2. 

When PC3 was examined, no further separation among the subclasses was achieved (Figure 

3.3B). However, on PC4, the AG subclass was differentiated from the AN subclass (Figure 

3.3C).  

Positioning of each subclass can be explained with reference to the loading plots for each 

PC (Figure 3.4). The majority of the AA analogs were positioned negatively on PC1 and were 

thus distinguished from the other three subclasses. These analogs were positioned negatively on 

PC1 due to high intensities of m/z 43, 164, 207, and 277, which all contributed negatively to PC1 

(Figure 3.4A). However, the duplicate spectra of ortho-fluoro methoxyacetyl fentanyl and meta-

fluoro methoxyacetyl fentanyl were positioned close to zero on PC1 (Figure 3.3A). These 

samples were positioned around zero (as opposed to negatively on PC1) due to the low 

intensities of m/z 43, 164, 207, and 277, which all contributed negatively to PC1 (Figure 3.4A). 

The fluoro methoxyacetyl fentanyl isomers differed in structural substitutions from the other 

analogs in the AA subclass, which caused these isomers to instead have high intensities of m/z 

42, 208, and 279, which minimally contributed to PC1 (Figure 3.4A). It should be noted that the 

other five AA analogs were all fluorobutyryl fentanyl or fluoroisobutyryl fentanyl isomers, 

which could cause a skewed representation of this subclass due to the similarity in structure and, 

therefore, mass spectral fragmentation. Due to the structural similarity, the analogs in the AA 

subclass are not truly representative of this structural subclass; however, there are not currently a 

variety of other analogs representative of this subclass available and these analogs are some of 

the more prominent fentanyl analogs submitted to operational forensic laboratories.10  

 

 

Figure 3.4 Loadings plot for A) principal component 1 (PC1), B) principal component 2 (PC2), and C) principal component 4 (PC4) 

The AR subclass was positioned positively on PC2 while the other three subclasses were 

positioned negatively. Two analogs, ortho-methylfentanyl and para-methylfentanyl, were 

positioned most positively due to high intensities of m/z 160, 203, 216 and 259, which all 

contributed positively to PC2 (Figure 3.4B). Para-methoxy fentanyl and para-chlorofentanyl 

were also positioned positively on PC2 due to high intensities of m/z 275 (base peak in para-

methoxy fentanyl), m/z 279 (base peak in para-chlorofentanyl), m/z 91, and m/z 105, all of which 

were weighted positively on PC2 (Figure 3.4B). Two of these variables (m/z 91 and m/z 105) are 

common fragments in any compound that contains an aromatic ring. While para-fluorofentanyl 

also contained m/z 91 and m/z 105, this analog was positioned less positively (closer to zero) on 

PC2 than other AR analogs. The base peak in para-fluorofentanyl, m/z 263, was weighted 

negatively on PC2, although not strongly, which resulted in a less positive positioning of this 

analog on PC2. Additionally, the para-fluorofentanyl replicates were the only samples in this 

subclass positioned negatively on PC1, due to a high intensity of m/z 207. Although m/z 207 is a 

common background contaminant ion in mass spectral analysis, it was an important ion to take 

into consideration with fentanyl analogs that have fluorine-substituted aniline rings (Figure 3.5). 


 

 

 

Figure 3.5 Predicted structure of fragment ion at m/z 207  

The AN subclass was positioned negatively on PC2 due to high intensities of m/z 146 and 

m/z 245, while the AG subclass was positioned negatively on PC2 due to high intensity of m/z 

146 (Figure 3.3A). The exceptions to this trend are isobutyryl fentanyl (AG subclass) and α-

methylfentanyl (AN subclass). Isobutyryl fentanyl and α-methylfentanyl positioned positively 

due to high intensity of m/z 259, which contributed positively to PC2 (Figure 3.4B).  

 

While separation was not achieved on the first three PCs, the AG subclass and the AN 

subclass were distinguished on the fourth PC (Figure 3.3C). The AG subclass was positioned 

positively on PC4 due to high intensities of m/z 146, m/z 189, and the base peaks of most of the 

compounds in this subclass (m/z 243, 257, 287, and 299), which were all weighted positively on 

PC4 (Figure 3.4C). The exception was isobutyryl fentanyl, which had a base peak of m/z 259 

that was not weighted positively on PC4. Instead, this ion was weighted negatively on PC4, 


which caused the replicates to be positioned less positively than the other AG analogs. The AN 

subclass was positioned negatively due to high intensities of m/z 245 and m/z 259, which were 

the base peaks for all analogs in the AN subclass. 

 

The relative loadings across the first four PCs were used to determine the optimal number 

of variables to retain in the LDA model.11 Thresholds of 1.5%, 2.0%, and 2.5% were investigated 

(data not shown) and the 2% threshold, retaining 23 variables, was determined to have optimal 

leave-one-out cross validation success for LDA classification (Table 3.1).  

Table 3.1 Variables retained for LDA based on a relative loadings threshold of 2% 

m/z 
 43    71    77   105   119   132 
146   147   160   164   189   190 
202   203   207   216   243   245 
246   259   260   277   278 
245 

 

 

 

 

The majority of spectra in the training set contained all variables up to and including m/z 

216 (Table 3.1). As all analogs contain the same fentanyl core structure, their fragmentation was 

predicted to be similar.8 As a result, many of the same hypothetical fragment ions were expected 

to be present across analogs, differing only in intensity. The retained 

variables greater than m/z 216 were all base peaks for analogs in each of the four subclasses 

(Table 3.1). However, these variables did not cover the base peaks of all of the 

fentanyl analogs in this work. Some analogs had base peaks not retained by this LDA model, for 

example ortho-fluoro methoxyacetyl fentanyl, which had a base peak of m/z 279. From an initial 

comparison of the spectra and the retained variables, it was evident that all variables retained 

were of high intensity in at least some of the fentanyl analogs. However, not all the high intensity 

ions across all spectra were retained. For example, m/z 42, 56, and 69 all had a relatively high 

intensity (>25% of the base peak) in the spectra of multiple analogs but were not retained 

because they did not have a high contribution to the variation in the training set.  

Principal components analysis was also conducted on the average spectra collected across 

the chromatographic peaks for analogs in the training set. Positioning of analogs was similar to 

that observed from PCA of the apex mass spectral data. The associated scores and loadings plots 

for the averaged spectra are shown in the appendix (Figure A3.2-A3.3). Further, from an 

assessment of the relative loadings, the same 23 variables were retained for the average spectra 

(Table 3.1). 

3.2.2 Linear Discriminant Analysis (LDA) Models 

Using the 23 variables determined by PCA, two LDA models were developed, one using 

the spectra collected at the apex of the chromatographic peak (referred to as the apex model) and 

the other using the average spectra from across the chromatographic peaks (referred to as the 

average model). Figure 3.6 shows the LDA scores plots for both models. 

 

 

Figure 3.6 Scores plots for the apex data A) linear discriminant 1 (LD1) vs linear discriminant 2 (LD2), B) LD1 vs linear discriminant 3 (LD3), and scores plots for the average data C) LD1 vs LD2, D) LD1 vs LD3 

For the apex model (Figures 3.6A and B), separation was observed between the AN 

subclass which positioned negatively on LD1 and the AG subclass which positioned positively 

on LD1. There was no separation between the AR subclass and the AA subclass until LD3, on 

which the AR subclass positioned positively and the AA subclass positioned negatively. When 

the apex model was compared to the average model (Figure 3.6C and D), the subclasses were 

positioned very similarly on all three LDs. The leave-one-out cross validation success for both models 

was 100% and the classification success was 98% with only one sample (para-fluoro 

methoxyacetyl fentanyl, vide infra) misclassified. Although the average spectra account for 

variability in ion intensities within a peak, the apex spectra were still sufficiently representative, 

as comparable classification results were observed. As spectra are typically collected at the apex 

of the chromatographic peak in forensic laboratories, the apex spectra were used for all 

subsequent modelling using LDA.  
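The model development and leave-one-out cross validation described above can be sketched in a few lines. This is a minimal illustration that assumes a pooled covariance matrix and class priors proportional to class size, and it uses synthetic, well-separated data in place of the 23 retained variables; it is not the software or the settings used in this work, and the function names are hypothetical.

```python
import numpy as np

def lda_fit(X, y):
    """Estimate class means, priors, and the inverse pooled covariance."""
    classes, idx = np.unique(y, return_inverse=True)
    means = np.array([X[idx == k].mean(axis=0) for k in range(len(classes))])
    priors = np.bincount(idx) / len(y)
    residuals = X - means[idx]
    pooled = residuals.T @ residuals / (len(X) - len(classes))
    return classes, means, priors, np.linalg.pinv(pooled)

def lda_predict(model, x):
    """Assign x to the class with the largest linear discriminant score."""
    classes, means, priors, inv_cov = model
    projected = means @ inv_cov
    scores = projected @ x - 0.5 * np.sum(projected * means, axis=1) + np.log(priors)
    return classes[np.argmax(scores)]

def loocv_success(X, y):
    """Fraction of samples classified correctly when each is left out in turn."""
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        hits += int(lda_predict(lda_fit(X[mask], y[mask]), X[i]) == y[i])
    return hits / len(X)

# Synthetic stand-in: four subclasses, ten spectra each, three variables.
rng = np.random.default_rng(1)
labels = np.repeat(["AA", "AG", "AN", "AR"], 10)
centers = {"AA": [0, 0, 0], "AG": [5, 0, 0], "AN": [0, 5, 0], "AR": [0, 0, 5]}
X = np.array([centers[c] for c in labels], dtype=float)
X += rng.normal(scale=0.3, size=X.shape)

print("LOOCV success:", loocv_success(X, labels))
```

The same functions could be applied separately to an apex data set and an average data set to compare their cross validation success.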

 

When the LD1 versus LD2 scores plot was examined, there was separation between the 

AN subclass and the AG subclass on LD1 (Figure 3.6A). The AN subclass was positioned 

negatively on LD1 due to higher intensities of m/z 77, 202, 203, and base peaks m/z 245 and m/z 

259 (the only base peaks in this subclass), which all contributed negatively to LD1 (Figure 

3.7A). The AG subclass was positioned positively on LD1 due to high intensities of m/z 132 and 

m/z 190.  

 

Figure 3.7 Coefficients of A) linear discriminant 1 (LD1), B) linear discriminant 2 (LD2), and C) linear discriminant 3 (LD3) 

Both the AG subclass and the AN subclass were positioned positively on LD2, while the 

AR subclass and the AA subclass were positioned negatively (Figure 3.6A). The AG subclass 

and AN subclass were positioned positively due to high intensities of m/z 77, 147, and 189, while 

the AA subclass and AR subclass were positioned negatively due to high intensities of m/z 43, 105, and 

160 (Figure 3.7B).  

The AA subclass and the AR subclass were not distinguished on LD1 or LD2, but 

separation between these groups was achieved on LD3 (Figure 3.6B). The largest negative 

loading was m/z 190, but it did not provide separation between the AA and AR subclasses. This 

variable was observed in the AG and AN subclasses and explains the spread of the subclasses 

around LD3. The largest positive loading was m/z 260, which was observed in high intensity in 

the methylfentanyl isomers (AR subclass). This contributed to more positive positioning of these 

isomers on LD3. The AR subclass was positioned positively on LD3 due to high intensities of 

m/z 77 and m/z 160, which both contributed positively to LD3 (Figure 3.7C). The AA subclass 

was positioned negatively on LD3 due to high intensities of m/z 105 and m/z 277, the latter of 

which was the base peak for the fluorobutyryl fentanyl and fluoroisobutyryl fentanyl isomers. 

Although these variables had minimal contribution to LD3, analogs in the AA subclass either 

lacked or had a low intensity of the other variables that contributed positively to LD3 (Figure 

3.7C). The coefficients of linear discriminants for the average model are shown in the appendix (Figure 

A3.4). 

When the scores plot for LD1 versus LD3 was examined, the separation between the AR 

subclass and the AA subclass was minimal (Figure 3.6B). The predicted fragmentation of 

fentanyl analogs with a substitution at the amide group and/or the aniline ring was that the first 

fragment was a result of an α-β cleavage of the n-alkyl chain (Figure 3.1C) and the second 


fragment generated by a cleavage of the amide group (Figure 3.1A). However, it was predicted 

that a larger substituent on the amide group could cause the amide group to cleave first (Figure 

3.1A).11 Figure 3.8 shows para-fluorofentanyl from the AR subclass and para-fluorobutyryl 

fentanyl from the AA subclass with these initial cleavage sites highlighted. These two cleavages 

would result in a consistent fragment containing the aniline ring and piperidine ring for analogs 

in both the AR and AA subclasses. There were nine analogs in the AA subclass with a fluorine 

substituent on the aniline ring and one analog in the AR subclass with a fluorine substituent. This 

caused a common fragment between both subclasses, m/z 207 (Figure 3.5). Additionally, once 

the fluorine substituent cleaved off the aniline ring, one would expect there to be similar 

fragments between all nine of the AA subclass analogs and at least three of the AR subclass 

analogs. The minimal separation of these subclasses on the scores plot was consistent with the 

fact that their predicted fragmentation patterns would be very similar, making it harder to 

differentiate between these two subclasses of fentanyl analogs. Because of this similarity, it was 

predicted that new samples belonging to either of these subclasses may be misclassified. 

It is worth noting that these classes were readily 

distinguished in the PCA scores plot (Figure 3.3A). However, for PCA, all m/z values in the 

scan range were included in the data set whereas, for LDA, a reduced number of variables was 

used for modelling. As a result, there was less separation of these two classes by LDA. 

 

 

A) Chemical formula: C4H7O+ (nominal mass 71 Da); C7H7+ (nominal mass 91 Da); C12H16FN2+ (nominal mass 207 Da) 

B) Chemical formula: C3H5O+ (nominal mass 57 Da); C7H7+ (nominal mass 91 Da); C12H16FN2+ (nominal mass 207 Da) 

Figure 3.8 Predicted fragments of A) para-fluorofentanyl from the AR subclass and B) para-fluorobutyryl fentanyl from the AA subclass 


Following the leave-one-out cross validation of the model, which resulted in 100% cross 

validation success, the model was then used to classify the test set. One replicate of para-fluoro 

methoxyacetyl fentanyl was misclassified, resulting in a successful classification rate of 98%. 

Para-fluoro methoxyacetyl fentanyl is an AA analog, but this replicate was misclassified as an 

AR analog (the other three replicates were correctly classified). The misclassification was likely 

due to an unusually high background in the sample which resulted in a higher intensity of m/z 

207 than would be expected. As m/z 207 contributes positively to LD3, this caused para-fluoro 

methoxyacetyl fentanyl to position more positively than its correct class. Overall, for both the apex and 

average LDA models, the successful classification rate was 98%, with one spectrum (a replicate of para-fluoro 

methoxyacetyl fentanyl) misclassified.  

3.3 REFINED LINEAR DISCRIMINANT ANALYSIS (LDA) MODEL TO 

INCORPORATE INSTRUMENT VARIATION 

The second factor that was investigated was the effect of instrument variation over time 

on the rate of successful classification. The training set was redefined to include 22 analogs 

analyzed across four months, with three replicates from the last collection removed due to 

sample degradation (three total spectra). This resulted in a smaller test set; however, no samples 

in the test set were represented in the training set. The test set now consisted of six analogs 

analyzed across four months, with one replicate removed due to sample degradation (one 

spectrum). Additionally, the refined model was developed using mass spectra collected at the 

apex of the chromatographic peaks as no significant difference in classification success was 

observed using apex and average spectra in the initial model (Section 3.2). 


3.3.1 Refined LDA Model for Classification of Fentanyl Analogs 

The full mass spectra of all training set analogs and replicates were input into PCA to 

identify the characteristic m/z values to retain for LDA model development. Only the first four 

PCs were retained as there was separation among all four subclasses. They accounted for 66% of 

the variation (compared to 68% for PCA based on the initial training set, Section 3.2.1). The 

scores plot for the first four PCs is shown in Figure 3.9.  

Figure 3.9 Principal components analysis scores plot of A) PC1 vs PC2, B) PC1 vs PC3, C) PC1 vs PC4. Axis labels: PC1 (27.7%), PC2 (17.1%), PC3 (13.6%), PC4 (8.1%) 

The PCA scores plot of PC1 versus PC2 showed separation of the AA subclass, which 

was positioned negatively on PC1, from the other three subclasses, which were positioned 

positively on PC1. The AN and AG subclasses were also separated from the AR subclass on 

PC2, as the former were positioned negatively, and the latter was positioned positively on PC2 

(Figure 3.9A). The scores plot of PC1 versus PC3 (Figure 3.9B) did not provide further 

separation among the subclasses. When the scores plot of PC1 versus PC4 (Figure 3.9C) was 

examined, there was separation between the AG subclass and the AN subclass, as the AN 

subclass positioned negatively on PC4 and the AG subclass positioned positively. The 

positioning of all samples could be explained with reference to the loadings plots in the appendix 

(Figure A3.5). 

The relative loadings across all four PCs were calculated to determine the variables that 

should be retained for LDA. At a threshold of 2%, 25 variables were retained to develop the 

refined LDA model (Table 3.2). 

 

 

 

Table 3.2 Variables retained for the refined LDA model, as determined by PCA 

m/z 
 41    43    57    71    77 
105   119   132   146   147 
160   164   189   190   202 
203   207   216   243   245 
246   259   260   277   278 


The 25 variables retained for the refined LDA model included the same 23 variables that 

were retained for the initial model (Table 3.1). The additional variables retained were m/z 41 and 

m/z 57, which were observed in analogs across all four subclasses. The 25 variables were used to 

develop the LDA model and resulted in 100% successful leave-one-out cross validation. The 

LDA scores plots for the refined LDA model had tighter groupings of each subclass, which 

indicated less within-class variation (Figure 3.10). The initial LDA models (Figure 3.6) had 

more spread in sample positioning, which suggested that the refined model, which accounted for 

instrument variation, optimized the LDA modelling. The AN subclass was positioned negatively 

on LD1 and the AG subclass was positioned positively on LD1 (Figure 3.10A). Although the 

AR subclass and the AA subclass were not separated on LD1 or LD2, they were 

differentiated on LD3 (Figure 3.10B). The positioning of the subclasses on the LDs could be 

explained with reference to the coefficients of linear discriminants plots (Figure A3.6). The 

incorporation of instrument variation allowed for 100% correct classification of all test set 

samples due to enhanced separation between subclasses. 

 

 

Figure 3.10 Scores plot for the refined LDA model A) LD1 vs LD2, B) LD1 vs LD3. Axis labels: LD1 (77.9%), LD2 (18.6%), LD3 (3.5%) 

3.3.2 Additional Test Sets to Validate the Linear Discriminant Analysis (LDA) Model 

To investigate the applicability of the refined LDA model, two external test sets of mass 

spectra collected on different instruments and using different methods were applied. The first test 

set, Test Set 1, contained spectra of 42 non-fentanyl NPS compounds, including 

phenethylamines, tryptamines, and cathinones (Table 3.3).11,12 Full chemical names of the non-

fentanyl NPS compounds can be found in the appendix (Table A3.1). The ions observed for these 

compounds differed substantially from those of the fentanyl analogs. Because the LDA model used only 25 variables, 

which were optimized for fentanyl analogs, fewer than half of the variables selected for model 

development were observed in Test Set 1.  

 

Table 3.3 List of non-fentanyl NPS compounds in the external test set 

Phenethylamines 
  APB*:    4-APB, 5-APB, 6-APB, 7-APB, 4E-APB, 4M-APB 
  NBOMe*:  25-B, 25-C, 25-D, 25-E, 25-G, 25-H, 25-N, 25-P, 25-T 
  2C*:     2CB, 2CC, 2CD, 2CE, 2CG, 2CH, 2CI, 2CN, 2CP, 2CT 
  FMA*:    2-FMA, 3-FMA, 4-FMA 
  EMC*:    2-EMC, 3-EMC, 4-EMC 
  3,4-DMA 

Tryptamines 
  4-hydroxy-N,N-Dimethyltryptamine 
  5-methoxy-N,N-Dimethyltryptamine 
  5-methoxy-N,N-Diisopropyltryptamine 
  4-hydroxy Diethyltryptamine 
  4-methyl-α-Ethyltryptamine 
  N,N-Dipropyltryptamine 
  N,N-Dimethyltryptamine 
  5,7-Dichlorotryptamine 
  α-Ethyltryptamine 
  α-methyl Tryptamine 

*APB – aminopropyl benzofuran 

*NBOMe – N-methoxybenzyl 

*2C – 2,5-dimethoxy 

*FMA – fluoromethamphetamine 

*EMC - ethylmethcathinone 

 

 

 


Even though the samples in the external test set were not fentanyl analogs, LDA is a hard 

classification method and forces classification. All Test Set 1 samples were classified to an 

available subclass with a posterior probability of 1 (the highest probability); however, when the 

scores plots were examined it was clear these samples did not belong to any of the subclasses 

(Figure 3.11). This highlights that the posterior probabilities should not be considered alone but 

should be examined in conjunction with the scores plots. All samples from the external test set 

were positioned far outside the grouping, and centroids, of the fentanyl subclasses. However, 

there were two exceptions: the fluoromethamphetamine (FMA) and ethylmethcathinone (EMC) 

isomers, which positioned closely to the AR subclass (Figure 3.11D).  
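The behaviour described above, in which a forced assignment carries a posterior probability of 1 even for a sample far from every class, can be reproduced with a minimal one-dimensional sketch. It assumes two classes with equal priors and a shared standard deviation on a single discriminant axis; the centroids, score, and function names are hypothetical and are not values from the refined model.

```python
import math

# Sketch: a forced LDA assignment can carry a posterior of ~1 for an outlier.
# One discriminant axis, two classes, equal priors, shared standard deviation.
class_means = {"AG": 10.0, "AN": -10.0}   # hypothetical class centroids
shared_sd = 2.0

def posteriors(score):
    """Posterior probability of each class for a sample at this LD score."""
    log_density = {c: -0.5 * ((score - m) / shared_sd) ** 2
                   for c, m in class_means.items()}
    top = max(log_density.values())
    weights = {c: math.exp(v - top) for c, v in log_density.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

def distance_to_centroid(score, label):
    """Distance to a class centroid in units of the shared standard deviation."""
    return abs(score - class_means[label]) / shared_sd

outlier_score = 800.0                     # far outside both groupings
post = posteriors(outlier_score)
assigned = max(post, key=post.get)
distance = distance_to_centroid(outlier_score, assigned)
print(assigned, post[assigned], distance)
```

The posterior only compares the available classes with one another, so a distance to the nearest centroid (or inspection of the scores plot) is needed to recognize that a sample belongs to none of them.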

 

 

 


    

Figure 3.11 Scores plot for the refined LDA model A) LD1 vs LD2, B) enlarged LD1 vs LD2, 

C) LD1 vs LD3, D) enlarged LD1 vs LD3 

 

 

These isomers have mass spectra with very few ions (Figure 3.12). The two most intense 

ions in each spectrum were not variables retained in the LDA model, so the variables used in 

LDA had low intensity. The LDA model used 25 variables to model new samples, but the EMC 

and FMA isomers contained at most only 13 of the 25 variables. The low intensity of the few 

ions contributed to the samples positioning close to zero, indicating that minimal variation was 

able to contribute to positioning of samples. 


Figure 3.12 Representative spectrum of A) 2-EMC and B) 2-FMA 

 

 

 

 

Test Set 2 contained mass spectra of six case samples obtained from the Michigan State 

Police Forensic Science Division. These samples had previously been analyzed by GC-MS and 

the fentanyl analog identified based on mass spectral comparison to a reference standard. The six 

samples and the corresponding spectra are shown in Figure 3.13. 

Figure 3.13 Structures and spectra of case samples for A) carfentanil, B) methoxy acetyl fentanyl, C) furanyl fentanyl, D) valeryl fentanyl, E) acetyl fentanyl, F) 3’-methylfentanyl 

Of the six case samples, four of the analogs identified belonged to the AG subclass 

(valeryl fentanyl, acetyl fentanyl, methoxy acetyl fentanyl and furanyl fentanyl, Figure 3.13B-

E), one belonged to the AN subclass (3’-methylfentanyl, Figure 3.13F), and one (carfentanil, 

Figure 3.13A) did not belong to any of the four structural subclasses defined in this work. These 

spectra contained fewer fragment ions and at lower intensities than observed in the fentanyl 

analogs used to develop and optimize the models. Each spectrum only contained 20 m/z values, 

indicating an instrument parameter set to retain only the 20 most intense ions. The LDA model 

used 25 variables to classify each sample; however, due to the small number of ions in the full 

spectrum, only 15-50% of the variables were accounted for in the LDA model, depending upon 

the analog. These data indicated that the lower intensity ions may be responsible for separation 

and classification of analogs. Additionally, the case samples were likely in much lower 

concentrations than the samples that were prepared at MSU, which could cause variations in ion 

intensity not accounted for in the model.  
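The effect of storing only the most intense ions can be illustrated with a short sketch that counts how many of the 25 model variables (Table 3.2) remain in a truncated spectrum. The spectrum and the cutoff of six ions are made up for brevity, and the function names are hypothetical; the case sample spectra retained 20 ions.

```python
# Sketch: how many model variables survive when only the N most intense
# ions of a spectrum are stored. The spectrum below is made up.
MODEL_VARIABLES = [41, 43, 57, 71, 77, 105, 119, 132, 146, 147, 160, 164, 189,
                   190, 202, 203, 207, 216, 243, 245, 246, 259, 260, 277, 278]

def keep_most_intense(spectrum, n_ions):
    """Keep only the n_ions most intense m/z values of a spectrum."""
    ranked = sorted(spectrum.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n_ions])

def variable_coverage(spectrum, variables):
    """Fraction of model variables with a nonzero intensity in the spectrum."""
    present = [mz for mz in variables if spectrum.get(mz, 0.0) > 0.0]
    return len(present) / len(variables)

full = {42: 0.30, 91: 0.25, 105: 0.40, 146: 0.60, 189: 0.55, 231: 1.00,
        77: 0.02, 132: 0.01, 190: 0.015, 203: 0.005}
truncated = keep_most_intense(full, n_ions=6)

full_coverage = variable_coverage(full, MODEL_VARIABLES)
truncated_coverage = variable_coverage(truncated, MODEL_VARIABLES)
print(full_coverage, truncated_coverage)
```

In this example the low intensity ions lost by truncation are all model variables, which mirrors the observation that such ions may contribute to separation and classification.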

When the case samples were applied to LDA, valeryl fentanyl was the only sample that 

was correctly classified. Five of the six case samples were misclassified in LDA, likely because 

the case samples had very few variables. Valeryl fentanyl likely only classified correctly due to a 

high intensity of m/z 132 and m/z 190, without the presence of other ions contributing highly to 

the other LDs. As stated previously, the low number of variables was likely due to an instrument 

parameter and the concentration of the controlled substances in case samples being very low. 

This highlights the need for concentration to be incorporated in multivariate statistical models in 

order to correctly classify a wide variety of samples. 


3.4 APPLICATION OF NEUTRAL LOSS SPECTRA TO REFINE THE LINEAR 

DISCRIMINANT ANALYSIS (LDA) MODEL 

The use of neutral loss spectra in multivariate methods and for obtaining structural 

information about unknowns has been demonstrated.13,14 This work explored the use of neutral 

losses, rather than fragment ions, for fentanyl analog classification according to structural 

subclass. To obtain representative neutral loss spectra, high-resolution mass spectrometry must 

be used to determine elemental formulae of fragment ions. As high-resolution mass spectrometry 

was not used here, this work represents a very preliminary investigation into the potential to 

classify analogs according to subclass based on neutral losses. The neutral loss spectra used here 

were generated by subtracting every m/z value from the base peak value for each analog. It was 

recognized that not all neutral losses derive from the base peak, but this approach was used as a 

preliminary attempt to explore the potential for classifying fentanyl analogs according to 

structural subclass based on common neutral losses. The intensity of each neutral loss was 

assumed to be consistent with the intensity of the resultant m/z fragment in the mass spectrum. 

Because the neutral loss spectra were not obtained using high-resolution mass spectrometry, all 

predicted neutral loss spectra and fragments are hypothetical.  

3.4.1 Neutral Loss Spectra of Fentanyl Analogs 

The first step was to identify common neutral losses within each subclass. As an 

example, consider ortho-methylfentanyl and para-methoxy fentanyl. These two analogs are 

members of the AR subclass and representative mass spectra are shown in Figure 3.14. Because 

of the different substitutions, the mass spectra of these two analogs were different: for example, 

the base peak in ortho-methylfentanyl was at m/z 259 with other dominant ions at m/z 160, 203, 

and 216. However, in para-methoxy fentanyl, the base peak was at m/z 275 with other dominant 


ions at m/z 176, 219, and 232. Despite the different masses of the fragment ions, both compounds 

had the same neutral losses, which are highlighted in red in Figure 3.14. An important note is 

that this work assumes all neutral losses were derived from the base peak, which is not 

necessarily true. A neutral loss of 99 could consist of two sequential neutral losses of 43 and 56; however, 

without knowledge of the elemental formulae for fragment ions, this assumption was used for the 

preliminary investigation. A neutral loss spectrum for one compound from each of the four 

subclasses is shown in Figure 3.15. Neutral loss spectra for all other analogs are shown in the 

appendix (Figure A3.7).  
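The arithmetic behind this example can be checked directly. The short sketch below uses only the m/z values quoted above for the two analogs; the variable and function names are illustrative.

```python
# m/z values quoted in the text for the two AR-subclass analogs.
ortho_methyl = {"base": 259, "ions": [160, 203, 216]}
para_methoxy = {"base": 275, "ions": [176, 219, 232]}


def base_peak_losses(entry):
    """Nominal neutral losses relative to the base peak, in ascending order."""
    return sorted(entry["base"] - mz for mz in entry["ions"])


losses_a = base_peak_losses(ortho_methyl)
losses_b = base_peak_losses(para_methoxy)
shared = sorted(set(losses_a) & set(losses_b))
print(shared)

# A nominal loss of 99 Da is also the sum of the 43 Da and 56 Da losses,
# so it cannot be assigned to a single fragmentation step without
# elemental formulae.
ambiguous = (43 + 56 == 99)
```

Although the fragment ions differ by 16 Da between the two analogs, the base-peak-referenced losses are identical, which is the behaviour the neutral loss model is intended to exploit.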

 

 

Figure 3.14 Mass spectrum with common neutral losses highlighted for A) ortho-methylfentanyl

and B) para-methoxy fentanyl

Figure 3.15 Neutral loss spectra and chemical structures of selected fentanyls A) thiofentanyl

representing the AN subclass, B) ortho-methylfentanyl representing the AR subclass, C)

cyclopropyl fentanyl representing the AG subclass, and D) para-fluorobutyrylfentanyl

representing the AA subclass.

[Axes for all panels: relative intensity vs. neutral loss (Da)]

3.4.2 Application of Linear Discriminant Analysis (LDA) to Neutral Loss Spectra for

Classification of Fentanyl Analogs

The 28 fentanyl analogs and replicates were divided into the same training and test sets 

that were used in the refined model (i.e., accounting for variations in spectra as a function of 

time, Section 3.3). The full neutral loss spectra for all training set analogs were subjected to PCA 

to determine which variables would be retained for LDA. The PCA data were examined and the 

four subclasses were differentiated across the first four PCs, which accounted for 72% of the 

total variation in the data set, 4% more variation than the refined LDA model (Figure 3.16). 
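As an illustration of how the cumulative variation across the first four PCs is obtained, the sketch below computes the explained-variance fractions from a singular value decomposition. It is a hedged example rather than the software used in this work: the data matrix is random placeholder data, mean-centring is assumed as the only preprocessing step, and the function name is hypothetical. The per-component percentages are those shown on the axes of Figure 3.16.

```python
import numpy as np


def pca_explained_variance(X):
    """Fraction of total variance carried by each PC of the mean-centred data."""
    Xc = X - X.mean(axis=0)                  # assumption: mean-centring only
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    variance = s ** 2
    return variance / variance.sum()


# Placeholder data: 20 spectra x 8 neutral-loss variables (not thesis data).
rng = np.random.default_rng(0)
X = rng.random((20, 8))
ratios = pca_explained_variance(X)
cumulative_first_four = ratios[:4].sum()

# Percentages read from the axes of Figure 3.16 (PC1 to PC4).
reported = [37.5, 16.8, 10.3, 7.6]
reported_total = round(sum(reported), 1)
print(reported_total)
```

The reported percentages sum to 72.2%, consistent with the 72% quoted above.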

 

 

 

Figure 3.16 PCA scores plot for neutral loss LDA model A) PC1 vs PC2, B) PC1 vs PC3, C)

PC1 vs PC4

[Axes: PC1 (37.5%), PC2 (16.8%), PC3 (10.3%), PC4 (7.6%)]

The AA subclass was differentiated on PC1 by positioning negatively due to higher

intensities of neutral losses 70, 71, 113, and 234, which were all weighted negatively on PC1 

(Figure 3.17A). Replicates (n = 4) of isobutyryl fentanyl (AG subclass) were also positioned 

negatively on PC1, close to the AA subclass, specifically the fluoro methoxyacetyl fentanyl 

analogs. Isobutyryl fentanyl positioned negatively on PC1 due to high intensities of neutral 

losses 70 and 113. The AG subclass was differentiated on PC2 by positioning positively due to 

higher intensities of neutral losses 153, 188, and 216 (Figure 3.17B). When PC3 was examined, 

the isobutyryl fentanyl replicates were differentiated from the fluoro methoxyacetyl fentanyl 

analogs (Figure 3.17C). The fluoro methoxyacetyl fentanyl analogs positioned positively on 

PC3 due to high intensities of neutral losses 43, 174, 188, 234, and 237, and the isobutyryl 

fentanyl replicates positioned negatively due to high intensities of neutral losses 70, 113, and 

216. The AR and AN subclasses were differentiated on PC4 (Figure 3.17D). The AR subclass 

positioned positively on PC4 due to high intensities of neutral losses 56 and 141, while the AN 

subclass positioned negatively due to high intensities of neutral losses 43, 113, 188 and 201.  

It was predicted that the neutral loss data would provide more separation among 

subclasses. However, when the PCA scores plots were examined, they did not show enhanced 

separation among the subclasses as predicted. The refined PCA model using mass spectral data 

showed less spread among the four subclasses (Figure 3.9).  

Figure 3.17 Neutral loss PCA loadings plots for A) PC1, B) PC2, C) PC3, and D) PC4

[Axes: loadings for each PC vs. m/z]

The relative loadings were calculated across the first four PCs. A threshold of 3.5% was 

determined to be optimal for LDA classification by resulting in 100% successful leave-one-out 

cross validation, which retained 17 variables that were used in LDA (Table 3.4).  
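A threshold-based variable selection of this kind could be sketched as follows. The exact definition of the relative loading is not restated in this section, so the sketch assumes that it is the absolute loading of a variable expressed as a percentage of the summed absolute loadings on that PC, and that a variable is retained if it meets the threshold on any of the PCs considered. The function name and the loading values are hypothetical.

```python
def select_variables(loadings, variables, threshold_pct=3.5):
    """Retain variables whose relative loading meets the threshold on any PC.

    loadings: list of PCs, each a list with one loading per variable.
    variables: variable labels (here, neutral losses in Da).
    """
    keep = set()
    for pc in loadings:
        total = sum(abs(value) for value in pc)
        for name, value in zip(variables, pc):
            # Assumed definition: |loading| as a percentage of the PC total.
            if 100.0 * abs(value) / total >= threshold_pct:
                keep.add(name)
    return sorted(keep)


# Hypothetical loadings for two PCs over five neutral-loss variables.
variables = [43, 56, 70, 99, 113]
loadings = [
    [0.60, -0.02, 0.30, 0.01, -0.07],  # PC1
    [0.01, 0.02, 0.50, 0.45, 0.02],    # PC2
]
retained = select_variables(loadings, variables)
print(retained)
```

In this toy case the variable at 56 Da falls below 3.5% on both PCs and is dropped, while the remaining four variables are retained.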

 

Table 3.4 Variables retained for neutral loss LDA model, as determined by the 3.5% threshold of

the PCA data

Neutral Loss (Da)
56     149    206
69     152    215
70     154    233
97     168    234
99     188    235
113    203

 

 

The variables retained were representative of neutral losses in all four subclasses. When 

the neutral loss spectra were examined, neutral losses 69, 70, 206, 233, 234, and 235 were 

observed in the AA subclass. The AG subclass contained neutral losses 188 and 215 in high 

intensity. The AR subclass contained neutral losses 56, 97, 99, and 113. The AN subclass 

contained neutral losses 56, 97, 99, 113, and 203. The other variables retained were observed in 

multiple analogs and subclasses. Also, due to the way in which the neutral loss spectra were 

generated, some of the retained variables were isotope peaks of a high intensity ion in the mass 

spectra, for example neutral loss 234 was the high intensity ion and neutral losses 233 and 235 

were likely isotope peaks. The 17 variables were used to develop the LDA model and resulted in 

100% successful leave-one-out cross validation. The LDA scores plots demonstrated the 

separation among fentanyl subclasses (Figure 3.18).  
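Leave-one-out cross validation itself is independent of the classifier. The sketch below shows the procedure with a nearest-centroid classifier standing in for LDA, purely to keep the example self-contained; it is not the LDA implementation used in this work, and the two-variable data set is hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (not LDA): assign x to the class with the closest mean."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        dist = sum((a / counts[label] - v) ** 2 for a, v in zip(acc, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_success(X, y, predict):
    """Percentage of samples classified correctly when each is held out in turn."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]   # every sample except the i-th
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return 100.0 * correct / len(X)


# Hypothetical two-variable training set with two subclasses.
X = [[0.90, 0.10], [0.80, 0.20], [0.85, 0.15],
     [0.10, 0.90], [0.20, 0.80], [0.15, 0.85]]
y = ["AR", "AR", "AR", "AN", "AN", "AN"]
rate = leave_one_out_success(X, y, nearest_centroid_predict)
print(rate)
```

Each sample is classified by a model that has never seen it, so the success rate gives a less optimistic estimate than re-classifying the training set.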

Figure 3.18 Scores plot for neutral loss LDA model A) LD1 vs LD2 and B) LD1 vs LD3

[Axes: LD1 (63%), LD2 (21%), LD3 (16%)]

The AG and AA subclasses were differentiated from the AN and AR subclasses on LD1. 

The AG and AA subclasses positioned negatively due to neutral losses 70, 188, 206, and 215 and 

the AN and AR subclasses positioned positively due to neutral losses 56, 149, 154, and 168 

(Figure 3.19A). The AR and AN subclasses were differentiated on LD2. The AR subclass 

positioned positively due to neutral losses 56 and 168, while the AN subclass positioned 

negatively due to neutral losses 149, 154, and 203 (Figure 3.19B). The AG and AA subclasses 

were differentiated on LD3. The AG subclass positioned positively on LD3 due to neutral losses 

97, 206, and 215, while the AA subclass positioned negatively due to neutral losses 69, 113, and 

234 (Figure 3.19C).  

Once again, the LDA model developed with neutral losses showed more spread in 

subclass grouping than the refined LDA model (Figure 3.10). The refined LDA model with mass 

spectral data showed tight grouping of all subclasses and test set samples. The neutral loss

data were predicted to provide more separation between groups by reducing within-class

variation, allowing for between-group variation to be maximized. These results do not 

support the theory that neutral loss data would enhance separation between groups; however, this 

is likely due to the manner in which the neutral losses were generated. With further investigation 

and identification of neutral losses, this model could be refined to be more specific, potentially 

providing the predicted tighter grouping of subclasses. 

Although there was more spread in the subclass groupings in the neutral loss model, 

when the test set was applied there was 100% successful classification. These classification 

results are comparable to the refined LDA model in Section 3.3, which supports the idea that 

these analogs could be classified according to structural subclass using neutral loss spectra. With 

high-resolution mass spectrometry, elemental formulae of fragment ions could be determined; 

and along with the mass difference, the data could be used to identify the chemical composition 

of neutral loss fragments. The LDA model could be further refined to contain specific neutral 

loss fragments characteristic of each of the four subclasses. 

 

 

 

Figure 3.19 Coefficients for neutral loss LDA model A) LD1, B) LD2, and C) LD3

[Axes: coefficients of each LD vs. m/z for the 17 retained variables]

3.5 SUMMARY OF LINEAR DISCRIMINANT ANALYSIS (LDA) MODELS 

The first factor explored in this work was the effect of spectral variation within a peak on 

LDA classification of fentanyl analogs. There was minimal difference in the rate of successful 

classification for LDA models developed based on apex and average spectra across the full width 

at half maximum. Both models resulted in 98% correct classification, with one of the para-fluoro 

methoxyacetyl fentanyl replicates misclassified. These results showed that using the average 

spectra provided no benefit to classification and therefore supported using the mass spectra 

collected at the apex of the peak in model development and application. 

When the training set was refined to incorporate instrument variation, the LDA model 

was improved. The test set resulted in 100% correct classification. This model indicates that 

LDA was a suitable method to differentiate fentanyl analogs into four structural subclasses; 

however, the models were only tested with a very limited test set that contained only six new 

compounds. A much larger test set would need to be applied to validate the model for the wide 

range of potential analogs that a forensic laboratory may encounter. Additionally, three analogs 

in the test set were positional isomers of compounds in the training set. This may indicate a 

higher confidence in unknown classification than would be true for analogs without positional 

isomers represented in the training set.  

To further investigate the applicability of the refined LDA model, two external test sets 

were applied. When Test Set 1 (non-fentanyl samples) was applied, all samples were classified 

incorrectly because LDA is a hard classification method that assigns every sample to one of the

modelled subclasses. When the scores plots were examined, the FMA and EMC samples were

positioned close to the AR subclass. This could potentially

mislead analysts into classifying unknowns as fentanyl analogs when they are not. When Test Set 2 

(fentanyl samples) was applied, only one of the six case samples was correctly classified. The 


case samples, likely in much lower concentration than the fentanyl standards used in this work, 

highlighted the need to account for concentration in model development. Changes in 

concentration result in mass spectral variation and can result in incorrect classification. In order 

to make robust models applicable to forensic laboratories, concentration should be a factor in 

developing the training set.  
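The consequence of hard classification noted above can be illustrated with a toy example. The sketch assigns a sample to the nearest class centre in a two-dimensional score space; this nearest-centre rule is a simplification standing in for the LDA decision rule, and the centre coordinates are hypothetical.

```python
import math

# Hypothetical class centres in a two-dimensional score space.
CENTROIDS = {
    "AN": (2.0, -3.0),
    "AR": (4.0, 5.0),
    "AG": (-6.0, 1.0),
    "AA": (-5.0, -4.0),
}


def hard_classify(scores):
    """Always returns one of the modelled subclasses, however distant the sample."""
    return min(CENTROIDS, key=lambda label: math.dist(scores, CENTROIDS[label]))


near_sample = hard_classify((4.5, 4.0))    # lies close to the AR centre
far_sample = hard_classify((60.0, 70.0))   # lies far from every centre
print(near_sample, far_sample)
```

Both samples receive the AR label even though the second lies far from all four centres, which mirrors how non-fentanyl compounds were necessarily assigned to a fentanyl subclass.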

The application of LDA to neutral loss spectra, rather than mass spectra, also showed 

potential for fentanyl analog classification. The neutral loss LDA model resulted in 100% correct 

classification of the test set, comparable to results obtained for the refined LDA model in Section 

3.3. Although the model did not result in less within-class variation as expected, this is likely due

to the manner in which the neutral loss spectra were generated. The neutral loss spectra were 

generated to test the potential of these data to classify structurally similar compounds with 

different substituents. To truly use neutral loss data for classification, the chemical identity of the 

mass fragments and neutral loss fragments must be investigated using alternative mass 

spectrometry techniques. 

 

 

APPENDIX

Figure A3.1 Mass spectra of all fentanyl analogs

[Panels, each plotted as relative intensity vs. m/z: para-fluorobutyryl fentanyl, ortho-fluorobutyryl fentanyl, meta-fluorobutyryl fentanyl, para-fluoro methoxyacetyl fentanyl, ortho-fluoro methoxyacetyl fentanyl, meta-fluoro methoxyacetyl fentanyl, para-fluoroisobutyryl fentanyl, ortho-fluoroisobutyryl fentanyl, meta-fluoroisobutyryl fentanyl, cyclohexyl fentanyl, cyclopropyl fentanyl, cyclopentyl fentanyl, butyryl fentanyl, isobutyryl fentanyl, acrylfentanyl, tetrahydrofuran fentanyl, para-methylfentanyl, ortho-methylfentanyl, meta-methylfentanyl, para-methoxyfentanyl, para-chlorofentanyl, para-fluorofentanyl, furanylethyl fentanyl, α-methyl acetyl fentanyl, α-methyl thiofentanyl, α-methylfentanyl, 4’-methylfentanyl, thiofentanyl]

Figure A3.2 Initial average model PCA scores plots for A) PC1 vs PC2, B) PC1 vs PC3,

and C) PC1 vs PC4

[Axes: PC1 (26.9%), PC2 (17.7%), PC3 (14.6%), PC4 (8.4%)]

Figure A3.3 Initial average model PCA loadings plots for A) PC1, B) PC2, C) PC3, and D) PC4

[Axes: loadings for each PC vs. m/z]

Figure A3.4 Initial average LDA model coefficients of A) LD1 and B) LD3

[Axes: coefficients of each LD vs. m/z]

Figure A3.5 Refined PCA loadings plots for A) PC1, B) PC2, C) PC3, and D) PC4

[Axes: loadings for each PC vs. m/z]

Figure A3.6 Refined LDA model coefficients of A) LD1 and B) LD3

[Axes: coefficients of each LD vs. m/z]

Table A3.1 Chemical names of non-fentanyl NPS compounds

Abbreviation                           Chemical Name
4-APB                                  4-(2-aminopropyl)benzofuran
5-APB                                  5-(2-aminopropyl)benzofuran
6-APB                                  6-(2-aminopropyl)benzofuran
7-APB                                  7-(2-aminopropyl)benzofuran
4E-APB                                 4-(2-ethylaminopropyl)benzofuran
4M-APB                                 4-(2-methylaminopropyl)benzofuran
2CB                                    2,5-dimethoxy-4-bromophenethylamine
2CC                                    2,5-dimethoxy-4-chlorophenethylamine
2CD                                    2,5-dimethoxy-4-methylphenethylamine
2CE                                    2,5-dimethoxy-4-ethylphenethylamine
2CG                                    3,4-dimethyl-2,5-dimethoxyphenethylamine
2CH                                    2,5-dimethoxyphenethylamine
2CI                                    2,5-dimethoxy-4-iodophenethylamine
2CN                                    2,5-dimethoxy-4-nitrophenethylamine
2CP                                    2,5-dimethoxy-4-propylphenethylamine
2CT                                    2,5-dimethoxy-4-methylthiophenethylamine
25-B                                   4-bromo-2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-benzeneethanamine
25-C                                   2-(4-chloro-2,5-dimethoxyphenyl)-N-(2-methoxybenzyl)ethanamine
25-D                                   2-(2,5-dimethoxy-4-methylphenyl)-N-(2-methoxybenzyl)ethanamine
25-E                                   2-(4-ethyl-2,5-dimethoxyphenyl)-N-(2-methoxybenzyl)ethanamine
25-G                                   2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-3,4-dimethyl-benzeneethanamine
25-H                                   2-(2,5-dimethoxyphenyl)-N-(2-methoxybenzyl)ethanamine
25-N                                   2-(2,5-dimethoxy-4-nitrophenyl)-N-(2-methoxybenzyl)ethanamine
25-P                                   2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-4-propyl-benzeneethanamine
25-T                                   2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-4-(methylthio)-benzeneethanamine
3,4-DMA                                3,4-dimethoxy-N-[(2-methoxyphenyl)methyl]-α-methyl-benzeneethanamine
2-FMA                                  2-fluoromethamphetamine
3-FMA                                  3-fluoromethamphetamine
4-FMA                                  4-fluoromethamphetamine
2-EMC                                  2-ethylmethcathinone
3-EMC                                  3-ethylmethcathinone
4-EMC                                  4-ethylmethcathinone
4-hydroxy-N,N-Dimethyltryptamine       3-[2-(dimethylamino)ethyl]-1H-indol-4-ol
5-methoxy-N,N-Diisopropyltryptamine    5-methoxy-N,N-bis(1-methylethyl)-1H-indole-3-ethanamine
5-methoxy-N,N-Dimethyltryptamine       5-methoxy-N,N-dimethyl-1H-indole-3-ethanamine
N,N-Dipropyltryptamine                 N,N-dipropyl-1H-indole-3-ethanamine
N,N-Dimethyltryptamine                 N,N-dimethyl-1H-indole-3-ethanamine
4-hydroxy Diethyltryptamine            3-[2-(diethylamino)ethyl]-1H-indol-4-ol
4-methyl-α-Ethyltryptamine             α-ethyl-4-methyl-1H-indole-3-ethanamine
5,7-Dichlorotryptamine                 5,7-dichloro-1H-indole-3-ethanamine
α-Ethyltryptamine                      α-ethyl-1H-indole-3-ethanamine
α-methyl Tryptamine                    α-methyl-1H-indole-3-ethanamine

Figure A3.7 Neutral loss spectra of all fentanyl analogs

[Panels, each plotted as relative intensity vs. neutral loss (Da): para-fluorobutyryl fentanyl, ortho-fluorobutyryl fentanyl, meta-fluorobutyryl fentanyl, para-fluoro methoxyacetyl fentanyl, ortho-fluoro methoxyacetyl fentanyl, meta-fluoro methoxyacetyl fentanyl, para-fluoroisobutyryl fentanyl, ortho-fluoroisobutyryl fentanyl, meta-fluoroisobutyryl fentanyl, cyclohexyl fentanyl, cyclopropyl fentanyl, cyclopentyl fentanyl, butyryl fentanyl, isobutyryl fentanyl, acrylfentanyl, tetrahydrofuran fentanyl, para-methylfentanyl, ortho-methylfentanyl, meta-methylfentanyl, para-methoxyfentanyl, para-chlorofentanyl, para-fluorofentanyl, furanylethyl fentanyl, α-methyl acetyl fentanyl, α-methyl thiofentanyl, α-methylfentanyl, 4’-methylfentanyl, thiofentanyl]

REFERENCES 

(1) Bonetti, J. Mass Spectral Differentiation of Positional Isomers using Multivariate 

Statistics. Forensic Chemistry. 2018, 9, 50–61. 
 

(2) Quinn, M.; Brettell, T.; Joshi, M.; Bonetti, J.; Quarino, L. Identifying PCP and Four PCP 

Analogs Using the Gold Chloride Microcrystalline Test Followed by Raman 
Microspectroscopy and Chemometrics. Forensic Science International. 2020, 307, 
110135. 

(3) Setser, A. L.; Waddell Smith, R. Comparison of Variable Selection Methods Prior to 

Linear Discriminant Analysis Classification of Synthetic Phenethylamines and 
Tryptamines. Forensic Chemistry. 2018, 11, 77–86. 

(4) Kranenburg, R. F.; Peroni, D.; Affourtit, S.; Westerhuis, J. A.; Smilde, A. K.; Asten, A. 

C. V. Revealing Hidden Information in GC–MS Spectra from Isomeric Drugs: 
Chemometrics Based Identification from 15 eV and 70 eV EI Mass Spectra. Forensic 
Chemistry. 2020, 18, 100225. 

(5) Roberson, Z. R.; Goodpaster, J. V. Differentiation of Structurally Similar 

Phenethylamines via Gas Chromatography–Vacuum Ultraviolet Spectroscopy (GC–
VUV). Forensic Chemistry. 2019, 15, 100172. 

(6) Davidson, J. T.; Jackson, G. P. The differentiation of 2,5-dimethoxy-N-(N-

methoxybenzyl)phenethylamine (NBOMe) isomers using GC retention indices and 
multivariate analysis of ion abundances in electron ionization mass spectra. Forensic 
Chemistry. 2019, 14, 100160. 

(7) Stuhmer, E.L.; McGuffin, V.L.; Waddell Smith, R. Discrimination of seized drug 

positional isomers based on statistical comparison of electron-ionization mass spectra. 
Forensic Chemistry. 2020, 20, 100261. 

(8) Cayman Chemical. Fentanyl Identification Cayman Currents. 28, Ann Arbor (2017). 

(9) Watson, J. T.; Sparkman, O. D. Introduction to mass spectrometry: instrumentation, 

applications and strategies for data interpretation; Wiley: Chichester, 2011. 

(10) Franki, R. Fentanyl Analogues an Increasing Factor in Opioid Deaths. 

https://www.mdedge.com/psychiatry/article/150493/addiction-medicine/fentanyl-analogues-increasing-factor-opioid-deaths (accessed Jun 9, 2020).

 

 

 

 

(11) Setser, A. L. Classification of Synthetic Phenethylamines and Tryptamines using 

Multivariate Statistical Procedures [Master’s Thesis]; Michigan State University, East 
Lansing, 2019. 

(12) Stuhmer, E.L. Statistical Comparison of Mass Spectral Data for Positional Isomer 
Differentiation [Master’s Thesis]; Michigan State University, East Lansing, 2019. 

(13) Fowble, K. L.; Shepard, J. R.; Musah, R. A. Identification and Classification of 

Cathinone Unknowns by Statistical Analysis Processing of Direct Analysis in Real Time-
High Resolution Mass Spectrometry-Derived “Neutral Loss” 
Spectra. Talanta. 2018, 179, 546–553. 

(14) Moorthy, A. S.; Wallace, W. E.; Kearsley, A. J.; Tchekhovskoi, D. V.; Stein, S. E. 
Combining Fragment-Ion and Neutral-Loss Matching during Mass Spectral Library 
Searching: A New General Purpose Algorithm Applicable to Illicit Drug 
Identification. Analytical Chemistry. 2017, 89 (24), 13261–13268.


4. SOFT-INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) FOR 

CLASSIFICATION OF FENTANYL ANALOGS ACCORDING TO STRUCTURAL 

SUBCLASS 

Soft independent modelling of class analogies (SIMCA) is a multivariate statistical 

classification method that has been applied to various forensic disciplines.1-4 It is a soft 

classification method that develops PCA models for each class individually. This work explored 

the application of SIMCA to classify fentanyl analogs according to structural subclass. As in 

Chapter 3, SIMCA models were developed to investigate the effect of three factors on the 

classification success rate: spectral variation within a peak, the effect of spectral variation over 

time, and the use of neutral loss data. The same analogs and subclasses were used in this work as 

in Chapter 3: n-alkyl chain substituted (AN) subclass, aniline ring substituted (AR) subclass, the 

amide group substituted (AG) subclass, and the amide and aniline ring substituted (AA) subclass. 

All mass spectra and the corresponding discussion are in Chapter 3, Section 3.1. All models were 

optimized using leave-one-out cross validation, and the applicability of the models was tested 

using external test sets.  

4.1 INITIAL SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) 

MODELS TO ASSESS VARIATION WITHIN A CHROMATOGRAPHIC PEAK 

The same training set and test set were used as in Section 3.2. Unlike linear discriminant 

analysis (LDA), the number of variables used to develop SIMCA models was not limited by the 

number of samples. As such, the full mass spectrum of each analog was used to develop the 

initial SIMCA model. The apex spectrum and average spectrum were used to develop two 

separate models to assess the ability to successfully classify the analogs. As discussed in Chapter 

1, SIMCA is based on PCA models for each subclass independently. The number of principal 


components (PCs) and the optimum significance (α) value were determined by the user for each 

subclass. The optimal PCA models were then compared to one another using the squared 

residual distances (Q) to each subclass. The α values were optimized once more to increase 

separation between structural subclasses while ensuring no critical boundary was large enough to 

include training set analogs of other subclasses. The conditions for each of the subclasses for the 

final optimized apex and average models are listed in Table 4.1.  

 

Table 4.1 Conditions for each subclass in SIMCA for both apex and average models

        Apex Model                                           Average Model
Class   # of PCs   α      Cumulative     Cross Validation    # of PCs   α      Cumulative     Cross Validation
                          Variance (%)   Success (%)                           Variance (%)   Success (%)
AA      2          0.01   99             100                 2          0.01   99             100
AR      3          0.01   99             80                  3          0.01   99             80
AG      3          0.01   78             100                 3          0.01   78             100
AN      2          0.05   97             100                 2          0.05   97             100

Conditions for both the apex model and the average model were the same for all 

subclasses. The α value was larger for the AN subclass than any of the other subclasses, which 

indicated more similarity among analogs in the AN subclass. The larger α value indicated a 

smaller critical limit boundary, which meant less variability among analogs in the AN subclass. 

The AR subclass was the only subclass without 100% successful cross validation. This indicated 

that the analogs within this subclass had more variability. When they were removed one by one 

and applied to the model, as was done for leave-one-out cross validation, the model was 

not sufficiently representative to classify all analogs correctly. 
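The relationship between the α value and the size of the critical limit can be illustrated numerically. The sketch below uses an empirical quantile of a set of distances as a stand-in for the critical limit; this is an assumption made for illustration, since the limits in SIMCA software are typically derived from a statistical distribution rather than from an empirical quantile. The distances and the function name are hypothetical.

```python
def empirical_critical_limit(distances, alpha):
    """Approximate (1 - alpha) quantile of the distances (illustrative stand-in)."""
    ordered = sorted(distances)
    # Rank of the (1 - alpha) quantile, kept within the valid index range.
    rank = max(1, min(len(ordered), round((1.0 - alpha) * len(ordered))))
    return ordered[rank - 1]


# Hypothetical residual distances: 0.01, 0.02, ..., 1.00.
distances = [i / 100 for i in range(1, 101)]
limit_alpha_001 = empirical_critical_limit(distances, 0.01)
limit_alpha_005 = empirical_critical_limit(distances, 0.05)
print(limit_alpha_001, limit_alpha_005)
```

Raising α from 0.01 to 0.05 lowers the limit, so a subclass can only tolerate the larger α without excluding its own members if those members lie close together.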

The overall cross validation success for both models was 95% with two spectra of para-

chlorofentanyl misclassified as members of no subclass, rather than the AR subclass (Figure 

4.1). The para-chlorofentanyl replicates were misclassified because their squared residual 

distance (Q) was higher than the critical limit for the subclass, even though their Hotelling’s T² 

distance was within the defined critical limit for this subclass. This highlights the importance of 

utilizing both of these parameters for optimal classification with a minimal number of false 

positives. The para-chlorofentanyl replicates were not misclassified in LDA; this was likely due 

to the differences in modelling between LDA and SIMCA. Subclasses were modelled against 

each other in LDA, whereas in SIMCA they were modelled individually. This difference in 

modelling caused differences in cross validation success.
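The two distances discussed above can be computed from a single-class PCA model as sketched below. This is a minimal illustration under stated assumptions, not the software used in this work: the training data are hypothetical, mean-centring is the only preprocessing, the critical limits are arbitrary placeholder values rather than statistically derived limits, and the function names are illustrative.

```python
import numpy as np


def fit_class_model(X, n_pcs):
    """PCA model for one class: mean, loadings, and variance of each score."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pcs].T                                 # variables x PCs
    score_var = (s[:n_pcs] ** 2) / (X.shape[0] - 1)
    return mean, P, score_var


def t2_and_q(x, model):
    """Hotelling's T2 (within the model plane) and Q (squared residual distance)."""
    mean, P, score_var = model
    xc = x - mean
    t = xc @ P                       # scores of the sample
    residual = xc - t @ P.T          # part of the sample the model cannot explain
    return float(np.sum(t ** 2 / score_var)), float(residual @ residual)


def is_member(x, model, t2_limit, q_limit):
    """A sample is accepted only if both distances are within their limits."""
    t2, q = t2_and_q(x, model)
    return t2 <= t2_limit and q <= q_limit


# Hypothetical training spectra (5 samples x 3 variables) lying near one line.
X_train = np.array([[1.0, 1.0, 0.00], [2.0, 2.0, 0.10], [3.0, 3.0, -0.10],
                    [4.0, 4.0, 0.05], [5.0, 5.0, -0.05]])
model = fit_class_model(X_train, n_pcs=1)
T2_LIMIT, Q_LIMIT = 5.0, 0.1         # placeholder limits, not derived from alpha

inside = np.array([3.5, 3.5, 0.0])   # follows the class pattern
outside = np.array([3.5, 2.5, 0.0])  # near the class centre but off the pattern
print(t2_and_q(inside, model), t2_and_q(outside, model))
```

The second sample has a small T² because it projects close to the class centre, yet its Q is large because most of it lies outside the model plane; a sample of this kind is assigned to no class, as observed for the para-chlorofentanyl replicates.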

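Figure 4.1 Residuals plot for the AR subclass from the apex model 

[Axes: squared residual distance (Q) vs. Hotelling’s T²; legend: training set, cross validation, critical limit; the para-chlorofentanyl replicates are annotated]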

 

 

The Cooman’s plots show the optimized separation among the four subclasses for the 

apex and average models (Figure 4.2). As discussed in Chapter 1, the Cooman’s plots show the 

distance to two of the subclasses plotted against one another. Each subclass has an optimal 

critical limit determined in the development of the model, in which all analogs belonging to that 

subclass fall under the critical limit and all analogs not belonging to that class fall outside the 

limit. When the Cooman’s plots were examined, all the training set analogs fell below the critical 

limit for their respective subclasses, with no samples falling below the critical limit for another 

subclass or outside the critical limit for their correct subclass. When the apex model was examined 

(Figure 4.2A-C), separation among the four subclasses was achieved. When the Cooman’s plots 

for the average model (Figure 4.2D-F) were examined, the separation among subclasses showed 

similar results and used the same parameters. Once again, the comparison of the apex and 

average models showed many similarities between the training sets and development of the 

models, so further discussion is only in reference to the apex model. 
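The way a Cooman’s plot is read can be expressed as a simple decision rule: the two critical limits divide the plot into four regions. The sketch below is illustrative only; the function name, the region labels, and the numerical values are hypothetical.

```python
def coomans_region(dist_a, dist_b, limit_a, limit_b):
    """Region of a Cooman's plot occupied by a sample.

    dist_a, dist_b: distances (Q) of the sample to class models A and B.
    limit_a, limit_b: critical limits of the two class models.
    """
    in_a = dist_a <= limit_a
    in_b = dist_b <= limit_b
    if in_a and in_b:
        return "both"       # overlap region: the two models are not separated
    if in_a:
        return "A only"
    if in_b:
        return "B only"
    return "neither"        # a soft method may assign a sample to no class


# Hypothetical distances and limits.
regions = [
    coomans_region(0.4, 3.2, limit_a=1.0, limit_b=1.0),
    coomans_region(2.8, 0.6, limit_a=1.0, limit_b=1.0),
    coomans_region(4.1, 3.9, limit_a=1.0, limit_b=1.0),
]
print(regions)
```

Well-separated class models leave the "both" region empty for the training set, which is the condition the α values were tuned to achieve.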

 

 

Figure 4.2 Cooman’s plots for the apex model A) amide and aniline ring (AA) subclass vs amide 

group (AG) subclass, B) AA vs aniline ring (AR) subclass, C) AA vs n-alkyl chain (AN) 

subclass, and Cooman’s plots for the average model D) AA vs AG, E) AA vs AR, F) AA vs AN 

[Axes: distance (Q) to the AG, AR, or AN subclass vs. distance (Q) to the AA subclass]

When the optimized Cooman’s plots were examined, there was clear separation among 

all subclasses. During model development, it was important to evaluate the modelling power 

plots that showed which variables contributed to each subclass. As an example, Figure 4.3 

shows the modelling power plot for the AG subclass based on apex data and Table 4.2 shows the 

variables contributing most to this subclass. The modelling power of each variable is specific to 

each subclass and shows how that variable will contribute to classification of new samples in that 

subclass. The limits for modelling power are 0 to +1, with +1 contributing the most to modelling. 

Variables contributing over 0.3 are considered necessary variables for modelling.5 The modelling 

power plots are only for a specific subclass, as all subclasses are modelled individually. 

Therefore, the variables that contribute to the modelling do not necessarily contribute to 

discrimination among subclasses.
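
As a hedged illustration of how a modelling power value can arise, the sketch below uses one common definition: for each variable, one minus the ratio of its residual standard deviation (after fitting the PCA class model) to its raw standard deviation. The toy data, the single-component model, the power-iteration helper, and the degrees-of-freedom convention (n - A - 1) are all assumptions made for illustration; the software used in this work may compute the quantity differently.

```python
import math

def center_columns(rows):
    """Mean-center each variable (column) of a samples x variables table."""
    n = len(rows)
    n_vars = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(n_vars)]
    return [[row[j] - means[j] for j in range(n_vars)] for row in rows]

def first_loading(X, iterations=200):
    """First principal-component loading of centered X, by power iteration."""
    n_vars = len(X[0])
    p = [1.0 / math.sqrt(n_vars)] * n_vars
    for _ in range(iterations):
        scores = [sum(x * w for x, w in zip(row, p)) for row in X]
        p = [sum(t * row[j] for t, row in zip(scores, X)) for j in range(n_vars)]
        norm = math.sqrt(sum(v * v for v in p))
        p = [v / norm for v in p]
    return p

def residuals_one_component(X, p):
    """Residual matrix E = X - t p^T for a one-component model."""
    E = []
    for row in X:
        t = sum(x * w for x, w in zip(row, p))
        E.append([x - t * w for x, w in zip(row, p)])
    return E

def modelling_power(X, E, n_components):
    """Assumed definition: 1 - s_residual / s_raw for each variable."""
    n = len(X)
    powers = []
    for j in range(len(X[0])):
        s_raw = math.sqrt(sum(row[j] ** 2 for row in X) / (n - 1))
        s_res = math.sqrt(sum(row[j] ** 2 for row in E) / (n - n_components - 1))
        powers.append(1.0 - s_res / s_raw)
    return powers

# Hypothetical intensities for three variables in five spectra (not measured data).
# Variables 0 and 1 vary together; variable 2 varies independently.
spectra = [
    [1.0, 2.0, 0.50],
    [2.0, 4.1, 0.40],
    [3.0, 5.9, 0.60],
    [4.0, 8.0, 0.50],
    [5.0, 10.0, 0.45],
]
X = center_columns(spectra)
loading = first_loading(X)
E = residuals_one_component(X, loading)
power = modelling_power(X, E, n_components=1)
print([round(v, 3) for v in power])
```

In this toy case the two correlated variables are described well by the single component and have modelling power close to +1, whereas the third variable is left almost entirely in the residuals. Under this particular definition, the degrees-of-freedom correction can also push the value for a poorly modelled variable slightly below zero.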


Figure 4.3 Modelling power plot for the AG subclass from the apex model 

Table 4.2 Variables contributing most to the AG subclass 

m/z
83    158   189   190   200   201   202
204   243   244   245   264   297   299
300   301   332   389   390

Table 4.2 shows which variables contributed most (≥80%) to the modelling of the AG 

subclass. The 80% threshold was used to visualize the variables contributing most to modelling. 

When the spectra of AG analogs were examined, it was apparent that these variables highlight 

the variability within this model. Since each model is a PCA model, it follows that the modelling 

power would explain variability within the model. For example, many of the contributing 

variables were only present at high intensity in one of the analogs in the AG subclass, such as 

m/z 83 in cyclohexyl fentanyl. Additionally, some of the variables were isotope peaks of the 

dominant ion; for example, cyclohexyl fentanyl had a high-intensity peak at m/z 299 with isotope 

peaks at m/z 300 and m/z 301. The variables m/z 332, 389, and 390 were not visible in any 

spectra, so it is unclear why these variables contributed to the AG subclass model. Some variables 

also showed modelling power below zero, outside the range for modelling power. This was 

potentially due to the processing of the data before SIMCA was applied. The modelling power 

plots and tables with the variables contributing most to the other three subclasses are shown in 

the appendix (Figures A4.1-A4.3 and Tables A4.1-A4.3). 

When the test set was applied to the SIMCA models, the apex model resulted in a 55% 

correct classification rate, while the average model resulted in a 58% correct classification rate. 

The apex and average models misclassified the same 27 samples, with the apex model 

misclassifying two additional samples (total of 29). The two additional misclassifications were 

replicates of meta-fluoro methoxyacetyl fentanyl and α-methyl thiofentanyl, neither of which 

was misclassified in the initial LDA model. Spectra used in the training set were collected over 

two months, while many spectra used in the test set were collected over four months. Variation 

in spectral intensities across the four months was observed, leading to the low classification 

success rate. For example, replicates of m/z 69 in cyclopropyl fentanyl in the training set had a 

range in relative intensities of 5%; however, the range in relative intensities across the four 

months for the same m/z value was 18%. The SIMCA models were more affected by variation in 

spectral intensity because all m/z values in the spectra were used in model development, whereas 

only selected variables were used in LDA model development. Overall, the low classification 


success indicated that the SIMCA model was over-trained and, therefore, not capable of 

correctly classifying new samples. This highlighted the importance of including instrument 

variation in the training set when developing SIMCA models. 
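
The month-to-month comparison above relies on the range in relative intensity of a given ion across replicate spectra. A minimal sketch of that calculation is shown below, assuming that the range is taken as the maximum minus the minimum relative intensity after each spectrum is normalized to its base peak. The function names and all abundances are hypothetical and were chosen only to mirror the magnitudes discussed above; they are not measured data.

```python
def relative_intensities(spectrum):
    """Scale a spectrum (dict of m/z -> abundance) so the base peak is 100%."""
    base_peak = max(spectrum.values())
    return {mz: 100.0 * abundance / base_peak for mz, abundance in spectrum.items()}

def intensity_range(replicates, mz):
    """Range (max - min) of the relative intensity at one m/z across replicates."""
    values = [relative_intensities(s).get(mz, 0.0) for s in replicates]
    return max(values) - min(values)

# Hypothetical replicate spectra (m/z -> raw abundance); illustration only.
training_replicates = [
    {69: 400, 105: 250, 189: 1000},
    {69: 420, 105: 240, 189: 1000},
    {69: 450, 105: 260, 189: 1000},
]
# One additional hypothetical replicate collected later on the same instrument.
later_replicates = training_replicates + [{69: 580, 105: 300, 189: 1000}]

range_training = intensity_range(training_replicates, 69)
range_later = intensity_range(later_replicates, 69)
print(range_training, range_later)
```

A class model trained only on the narrower range has no information about the wider spread, which is one way to picture why later spectra could fall outside the critical limits.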

The classification success for both the apex and average models once again showed 

minimal difference between using the apex or average data. As such, the following sections 

discuss only models developed using data collected at the apex.  

4.2 REFINED SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) 

MODEL TO INCORPORATE INSTRUMENT VARIATION 

The training and test sets were re-defined to incorporate instrument variation into the 

model development (the refined training and test sets are described in Chapter 3, Section 3.3). As 

before, the full mass spectra were used to develop, optimize, and test the refined SIMCA model. 

In terms of model development and optimization, the optimal conditions for each subclass are 

shown in Table 4.3.  

Table 4.3 Conditions for each subclass in refined SIMCA model 

Class   # of PCs   α      Cumulative Variance (%)   Cross Validation Success (%)
AA      2          0.01   97                        100
AR      3          0.01   97                        95
AG      3          0.01   78                        100
AN      2          0.01   95                        100

The only difference between the conditions for the refined model, compared to the initial 

model, was the α value for the AN subclass. In the refined model, the α value was 0.01 (rather 

than 0.05 in the initial model). The lower α value meant a larger critical limit, allowing for 

samples with more variation to be classified to the model. With the refined model, some 

differences in the variables used to model the subclasses were observed compared to the initial 

model. Figure 4.4 shows a comparison of the modelling power plots for the AG subclass for the 

refined model (instrument variation), versus the initial model (no instrument variation). Table 

4.4 shows a comparison of the variables contributing most (≥80%) to both models. The 

modelling power plots for the three other subclasses are shown in the appendix (Figure A4.4). 
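
The relationship between α and the size of the critical limit can be illustrated with the Jackson-Mudholkar approximation for the Q (squared residual) limit of a PCA model. This is a sketch under stated assumptions: the residual eigenvalues are hypothetical, and it is not known whether the software used in this work applies this particular approximation.

```python
from statistics import NormalDist

def q_critical_limit(residual_eigenvalues, alpha):
    """Jackson-Mudholkar approximation of the Q critical limit.

    residual_eigenvalues: eigenvalues of the components NOT retained in the model.
    alpha: significance level (e.g. 0.05 or 0.01).
    """
    theta1 = sum(residual_eigenvalues)
    theta2 = sum(lam ** 2 for lam in residual_eigenvalues)
    theta3 = sum(lam ** 3 for lam in residual_eigenvalues)
    h0 = 1.0 - (2.0 * theta1 * theta3) / (3.0 * theta2 ** 2)
    # Standard normal deviate for the upper (1 - alpha) percentile.
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)
    term = (
        c_alpha * (2.0 * theta2 * h0 ** 2) ** 0.5 / theta1
        + 1.0
        + theta2 * h0 * (h0 - 1.0) / theta1 ** 2
    )
    return theta1 * term ** (1.0 / h0)

# Hypothetical eigenvalues of the discarded components (illustration only).
discarded = [0.5, 0.3, 0.2]
limit_05 = q_critical_limit(discarded, alpha=0.05)
limit_01 = q_critical_limit(discarded, alpha=0.01)
print(round(limit_05, 2), round(limit_01, 2))
```

For the same class model, the limit computed at α = 0.01 is larger than the limit at α = 0.05, so samples with larger residuals are still accepted.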

 

 

 


 

Figure 4.4 Modelling power plots for the AG subclass SIMCA model A) with instrument 

variation incorporated, B) without instrument variation incorporated 

Table 4.4 Comparison of variables contributing most to the initial and refined AG subclass SIMCA models

Model           m/z
Initial Model   83, 158, 189, 190, 200, 201, 202, 204, 243, 244, 245, 264, 297, 299, 300, 301, 332, 389, 390
Refined Model   111, 162, 204, 214, 215, 228, 244, 245, 257, 258, 297, 299, 300, 301, 389

 
 

 

The initial model had 19 variables that contributed 80% or greater to the modelling of the 

AG subclass model. When the refined model was used, 15 variables contributed 80% or greater 

to the modelling of the AG subclass SIMCA model. When the variables contributing most were 

compared, there were only eight variables common between the two models (m/z 204, 244, 245, 

297, 299, 300, 301, 389). The refined model had fewer variables contributing over 80% to the 

modelling power. The differences in variables contributing to the variation in the refined model 

were likely due to the variation (range in relative intensities between 0.4 and 19%) introduced by 

month-to-month instrument usage. Many of the new variables contributing to the refined model 

are present at lower intensity in the mass spectra of the AG analogs. 

The refined SIMCA model resulted in a 99% successful leave-one-out cross validation 

with only para-chlorofentanyl misclassified. The misclassification was due to a Q value that was 

higher than the critical limit for the AR subclass (Figure 4.5). The initial model misclassified 

two replicates of para-chlorofentanyl for the same reason. The larger critical limit boundary in 

the refined model, along with the additional variation accounted for by the model, allowed for 


one additional replicate to classify correctly. The one para-chlorofentanyl replicate that 

misclassified had a spectrum with low relative intensities compared to the other three spectra. 

This was potentially due to a change in relative ion intensities as a result of the analog being in 

solution over time, unrefrigerated. 

  

Figure 4.5 Residuals plot for the AR subclass in the refined SIMCA model; the labelled point is the misclassified para-chlorofentanyl replicate 

 

 

When the test set was applied, the refined SIMCA model resulted in a 91% successful 

classification rate. Two of the four replicate spectra of cyclopentyl fentanyl (AG subclass) were misclassified as 

not belonging to any class. These replicates had a low Hotelling’s T² value but a large Q value for 

this class, which caused them to fall outside the critical limit boundary. The two other 

cyclopentyl fentanyl replicates followed the same trend but were just below the critical limit 

boundary, permitting correct classification. The variability from month-to-month caused two of 

the four replicates to misclassify, even though all replicates were concentrated around the critical 

limit (Figure 4.6). When select ions were examined across the four cyclopentyl fentanyl 

spectra, three high-intensity ions (m/z 69, 105, and 146) showed ranges in relative ion 

intensity of 11-35.5%. 
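
The combination of a low Hotelling’s T² value and a large Q value can be pictured with a small sketch of how the two statistics are commonly computed for a new sample against a PCA class model. The class mean, loadings, eigenvalues, sample, and critical limits below are hypothetical, and the function name is an assumption for illustration.

```python
def t2_and_q(sample, class_mean, loadings, eigenvalues):
    """Hotelling's T2 and Q (squared residual) for one sample.

    loadings: list of orthonormal loading vectors, one per retained PC.
    eigenvalues: variance of the training scores on each retained PC.
    """
    centered = [x - m for x, m in zip(sample, class_mean)]
    # Project the sample onto each retained PC.
    scores = [sum(c * w for c, w in zip(centered, loading)) for loading in loadings]
    # T2: distance from the class centre within the model plane.
    t2 = sum(t * t / lam for t, lam in zip(scores, eigenvalues))
    # Q: squared distance between the sample and its reconstruction.
    reconstructed = [
        sum(t * loading[j] for t, loading in zip(scores, loadings))
        for j in range(len(centered))
    ]
    q = sum((c - r) ** 2 for c, r in zip(centered, reconstructed))
    return t2, q

# Hypothetical two-PC class model for three variables (illustration only).
class_mean = [0.0, 0.0, 0.0]
loadings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
eigenvalues = [4.0, 1.0]

# A sample close to the class centre within the model plane but far from the plane.
t2, q = t2_and_q([1.0, 0.5, 2.0], class_mean, loadings, eigenvalues)

t2_limit, q_limit = 9.0, 1.0  # hypothetical critical limits
accepted = t2 <= t2_limit and q <= q_limit
print(t2, q, accepted)
```

Here the sample lies well within the T² limit, yet it is rejected because the variation in the third variable is not described by the retained components and therefore appears entirely in Q.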

  

 

Figure 4.6 Residuals plot for the AG subclass in the refined SIMCA model 

4.2.1 Additional Test Sets to Validate the Classification Models 

In order to test the validity of the models, the two external test sets applied in Chapter 3 

(Section 3.3.2) were also applied to the refined SIMCA model. For Test Set 1 (non-fentanyl NPS 

compounds), all samples were classified as ‘none’.6,7 The samples did not fall within the 

boundaries of any of the fentanyl subclasses. Test Set 1 showed the benefit of SIMCA and other soft 

classification methods, which have the option to classify samples as ‘none’ rather than forcing 

classification. For a complete unknown, SIMCA is a more conservative classification method 

than LDA. Since LDA is a hard classification method, it forced classification to one of the 

subclasses, which would be incorrect for any non-fentanyl sample. 
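
The contrast between soft and hard classification can be sketched as two decision rules. This is a simplification under stated assumptions: the soft rule below uses only the Q distance to each subclass model, the hard rule simply picks the nearest subclass, and all distances and limits are hypothetical.

```python
def soft_classify(q_by_class, q_limits):
    """Soft rule: accept every class whose critical limit is met; else 'none'."""
    accepted = [c for c in q_by_class if q_by_class[c] <= q_limits[c]]
    return accepted if accepted else ["none"]

def hard_classify(distance_by_class):
    """Hard rule: always return the nearest class, however far away it is."""
    return min(distance_by_class, key=distance_by_class.get)

# Hypothetical critical limits for the four subclass models.
limits = {"AA": 1.0, "AG": 1.0, "AR": 1.0, "AN": 1.0}

# Hypothetical distances for a non-fentanyl compound: far from every model.
non_fentanyl = {"AA": 7.2, "AG": 5.9, "AR": 4.8, "AN": 6.5}
# Hypothetical distances for an analog that fits the AG model.
ag_analog = {"AA": 6.0, "AG": 0.4, "AR": 5.1, "AN": 4.7}

print(soft_classify(non_fentanyl, limits), hard_classify(non_fentanyl))
print(soft_classify(ag_analog, limits), hard_classify(ag_analog))
```

Both rules agree for the sample that fits a subclass model. For the sample that fits none, the soft rule returns ‘none’ whereas the hard rule still assigns the nearest subclass. If a sample met the limits of two models, the soft rule would return both.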

When Test Set 2 (case samples) was applied to the refined SIMCA model, all six samples 

were classified as ‘none’, and thus only carfentanil was correctly classified. The case samples 

were not classified correctly by SIMCA, likely because the case samples had very few variables. 

As stated previously, the low number of variables was likely due to instrument parameters and 

the concentration of the controlled substances in case samples being very low. This highlighted 

the need for concentration to be incorporated in multivariate statistical models in order to 

correctly classify a wide variety of samples. 

4.3 APPLICATION OF NEUTRAL LOSS SPECTRA FOR CLASSIFICATION OF 

FENTANYL ANALOGS 

Soft independent modelling of class analogies was applied to the neutral loss data to 

assess classification and was compared to the results obtained when mass spectral data were 

used. The 28 fentanyl analogs and replicates were divided into the same training and test sets that 

were used for the refined model. The neutral loss spectra of the fentanyl analogs are discussed in 

Section 3.4.1. The full neutral loss spectra were input into SIMCA.  


The conditions for each subclass were optimized based on cross-validation success and 

are shown in Table 4.5. The conditions for the neutral loss model differed in several ways from 

the refined model (Table 4.3). First, the optimal PCA model for the AR subclass retained two 

instead of three PCs. Second, with the exception of the AG subclass, the neutral loss model 

accounted for lower cumulative variance than the refined model. Third, the neutral loss model 

resulted in a higher cross-validation classification success rate (100%) for the AR subclass than 

the refined model (95%). The AR subclass resulted in better cross validation but retained fewer 

PCs and explained less cumulative variance. This indicated that when the subclass models 

included more cumulative variance, they also included more within class variation. 

 

 

 

 

Table 4.5 Conditions for each subclass in the neutral loss SIMCA model 

Class   # of PCs   α      Cumulative Variance (%)   Cross Validation Success (%)
AA      2          0.01   95                        100
AR      2          0.01   77                        100
AG      3          0.01   83                        100
AN      2          0.01   89                        100

The Cooman’s plots showed the optimized separation among the four subclasses (Figure 

4.7). There was clear separation among the four subclasses, with all analogs belonging to a 

subclass below the critical limit for that subclass and no analogs overlapping subclasses. This 

separation was to be expected because during the development of a SIMCA model the classes 

are optimized for separation. The modelling of each subclass was independent and the variables 


that contributed to the modelling of each subclass are shown in the modelling power plots in the 

appendix (Figure A4.5). 


 

Figure 4.7 Cooman’s plots for the A) AA subclass vs AG subclass, B) AA subclass vs AR 

subclass, and C) AA subclass vs AN subclass 


The neutral loss SIMCA model had a 100% correct leave-one-out cross validation across 

all four subclasses and resulted in 87% correct classification of the test set. This model resulted 

in better cross validation but had lower correct classification of the test set (one fewer sample) than 

the refined SIMCA model. Three replicates of cyclopentyl fentanyl (AG subclass) were 

misclassified as ‘none’, that is, not belonging to any of the subclasses. The residuals plot for the 

AG subclass showed that the cyclopentyl fentanyl replicates had a Q value outside the critical 

limit, which caused them to be classified as ‘none’ (Figure 4.8). Although three of the four 

cyclopentyl fentanyl replicates were misclassified, the fourth replicate was very close to the Q 

critical limit, indicating this analog varied more than the other analogs in this subclass. This was 

similar to the results observed for the refined SIMCA model (Figure 4.6), in which two of the 

four cyclopentyl fentanyl replicates were outside the critical limit but all were concentrated 

around the critical limit. When the mass spectra of cyclopentyl fentanyl were visually examined, 

the obvious difference was the intensity of m/z 69, which varied from 60% of the base peak to 

95% of the base peak. Due to the manner in which the neutral loss spectra were developed, this 

variation was translated to the neutral loss data as well. This indicated that cyclopentyl fentanyl 

varied in both its mass spectrum and neutral loss spectrum more than the analogs used to train 

and develop these models. This highlighted a potential need for this class to have a more 

representative training set to account for analogs with structural variations. 
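
The carry-over of intensity variation from the mass spectrum to the neutral loss spectrum can be pictured with the sketch below. It rests on an assumption that is not confirmed in this chapter: each neutral loss is taken as the difference between a precursor m/z and the fragment m/z, and it keeps the intensity of that fragment. The precursor value and the spectra are hypothetical, with only the 60% and 95% relative intensities at m/z 69 taken from the discussion above.

```python
def neutral_loss_spectrum(mass_spectrum, precursor_mz):
    """Assumed construction: neutral loss = precursor m/z - fragment m/z.

    Each neutral loss keeps the relative intensity of the fragment ion
    from which it was calculated.
    """
    return {
        precursor_mz - mz: intensity
        for mz, intensity in mass_spectrum.items()
        if mz <= precursor_mz
    }

PRECURSOR_MZ = 300  # hypothetical precursor m/z, for illustration only

# Hypothetical replicates (m/z -> % of base peak) that differ mainly at m/z 69.
replicate_low = {69: 60.0, 105: 40.0, 189: 100.0}
replicate_high = {69: 95.0, 105: 42.0, 189: 100.0}

nl_low = neutral_loss_spectrum(replicate_low, PRECURSOR_MZ)
nl_high = neutral_loss_spectrum(replicate_high, PRECURSOR_MZ)
print(nl_low)
print(nl_high)
```

Under this assumed construction, the 35-point difference at m/z 69 reappears unchanged at the corresponding neutral loss, so a replicate that is atypical in its mass spectrum is equally atypical in its neutral loss spectrum.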


 

Figure 4.8 Residuals plot for the AG subclass in the neutral loss SIMCA model 

 

 

4.4 SUMMARY OF SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES 

(SIMCA) MODELS 

The initial SIMCA models resulted in a 55% correct classification rate for the apex data 

and 58% for the average data. For the initial models, the same 27 samples were misclassified, 

with an additional two samples misclassified using the apex model. As apex spectra are typically 

used and the average spectra provided minimal benefit to classification, the data collected at the 

apex were used for all subsequent model development. The initial SIMCA models demonstrated 

lower classification success than the initial LDA models described in Chapter 3. As the 

subclasses were optimized individually and the full mass spectrum was used in SIMCA, the 

model was likely over-trained, resulting in poorer performance in classifying new samples than 

LDA.  

The refined SIMCA model performed with a 91% correct classification rate. The higher 

correct classification rate in the refined model highlighted the importance of incorporating 

instrument variation in model development. When Test Set 1 (non-fentanyl compounds) was 

applied, all samples were correctly classified as ‘none’, or not belonging to any of the fentanyl 

subclasses. However, when Test Set 2 (case samples) was applied, only one of six samples was 

correctly classified. The variation in the mass spectra for samples with low concentration was 

not accounted for in this model, as all analogs were prepared at a relatively high concentration (1 

mg/mL) compared to the case samples. Further optimization of this model needs to incorporate 

concentration as a factor so samples with unknown concentrations can more accurately be 

classified. 

The neutral loss SIMCA model performed with 87% successful classification. These 

results were comparable to those of the refined SIMCA model, with three replicates of cyclopentyl 

fentanyl misclassified instead of the two cyclopentyl fentanyl replicates in the refined model. As stated in Chapter 3, 

Section 3.5, all neutral loss data are hypothetical and further analysis must be done to obtain 

accurate neutral loss information.  

 

 


APPENDIX 


Figure A4.1 Modelling power plot for the AA subclass from the apex initial SIMCA model 

 

Table A4.1 Variables contributing most to the AA subclass in the apex initial SIMCA model 

 

m/z
43    45    46    58    59    176   177
207   219   220   236   237   238   250
277   278   279   280

 

 

 

 

 


 

Figure A4.2 Modelling power plot for the AR subclass from the apex initial SIMCA model 

 

 

 

 

 

 

 

 

Table A4.2 Variables contributing most to the AR subclass in the apex initial SIMCA model 

m/z
95    109   110   112   122   123   124
126   135   136   140   142   143   144
149   150   151   152   154   155   160
161   162   164   165   166   167   168
172   173   175   176   177   178   180
181   182   183   188   190   192   194
195   196   201   203   204   206   207
208   219   220   223   224   225   226
232   233   234   236   238   239   244
246   250   259   260   261   263   264
265   273   274   275   276   277   278
279   280   281   282   297   298   313
314   315   352   365   368   369   371
440   443

 
 

 

 

 

 


Figure A4.3 Modelling power plot for the AN subclass from the initial apex SIMCA model 

 

 

 

Table A4.3 Variables contributing most to the AN subclass in the apex initial SIMCA model 

m/z
42    57    70    96    139   146   147
158   189   190   202   204   216   217
240   245   246   247   259   260   261
262

 

 

 


 

Figure A4.4 Modelling power plots from the refined SIMCA model for the A) AA subclass, B) 

AR subclass, and C) AN subclass 


 

Figure A4.5 Modelling power plots from the neutral loss SIMCA models for the A) AA 

subclass, B) AR subclass, and C) AN subclass 

REFERENCES 

 
 
(1) Álvarez, Á.; Yáñez, J.; Contreras, D.; Saavedra, R.; Sáez, P.; Amarasiriwardena, D. 

Propellant’s Differentiation Using FTIR-Photoacoustic Detection for Forensic Studies of 
Improvised Explosive Devices. Forensic Science International. 2017, 280, 169–175. 

 

(2) Kaniu, M.; Angeyo, K. Challenges in Rapid Soil Quality Assessment and Opportunities 

Presented by Multivariate Chemometric Energy Dispersive X-Ray Fluorescence and 
Scattering Spectroscopy. Geoderma. 2015, 241-242, 32–40. 

 

(3) Pereira, J. F.; Silva, C. S.; Vieira, M. J. L.; Pimentel, M. F.; Braz, A.; Honorato, R. S. 

Evaluation and Identification of Blood Stains with Handheld NIR 
Spectrometer. Microchemical Journal. 2017, 133, 561–566. 

 

(4) Waddell, E. E.; Williams, M. R.; Sigman, M. E. Progress Toward the Determination of 

Correct Classification Rates in Fire Debris Analysis II: Utilizing Soft Independent Modelling 
of Class Analogy (SIMCA). Journal of Forensic Sciences. 2014, 59 (4), 927–935. 

 

(5) Vogt, N.; Knutsen, H. SIMCA Pattern Recognition Classification of Five Infauna Taxonomic 

Groups Using Non-Polar Compounds Analysed by High Resolution Gas Chromatography. 
Marine Ecology Progress Series. 1985, 26, 145–156. 

 

(6) Setser, A. L. Classification of Synthetic Phenethylamines and Tryptamines using 

Multivariate Statistical Procedures [Master’s Thesis]; Michigan State University, East 
Lansing, 2019. 

 

(7) Stuhmer, E.L. Statistical Comparison of Mass Spectral Data for Positional Isomer 
Differentiation [Master’s Thesis]; Michigan State University, East Lansing, 2019.


5. CONCLUSIONS AND FUTURE WORK 

5.1 CONCLUSIONS 

Overall, this work aimed to explore and compare two multivariate statistical classification 

methods: linear discriminant analysis (LDA) and soft independent modelling of class analogies 

(SIMCA). Three factors were explored to validate and increase robustness of the models: within-

peak variation, instrument variation, and the use of neutral losses rather than fragment ions for 

classification. Both LDA and SIMCA showed successful results for the ability to classify 

fentanyl analogs according to structural subclass.  

When within chromatographic peak variation was investigated, the LDA apex and 

average models performed with a 98% successful classification rate. For SIMCA, the apex 

model performed with a 55% successful classification rate and the average model performed 

with a 58% successful classification rate. The apex spectrum gave classification results consistent 

with those of the average spectrum, supporting the current practice of forensic laboratories collecting 

the mass spectrum at the apex of the peak. Instrument variation was also investigated in this 

work and highlighted the need for its incorporation in statistical classification methods. The 

refined LDA model resulted in a 100% successful classification rate and the refined SIMCA 

model resulted in a 91% successful classification rate. When instrument variation was not 

accounted for (initial models), the models, particularly SIMCA, performed worse than when 

instrument variation was incorporated in the training set (refined models). 

The third factor investigated, neutral loss spectra, showed promise for an alternative way 

to develop LDA and SIMCA models. Instead of using the mass spectra, neutral loss spectra were 

used to develop the models and showed results comparable to the refined models that 


incorporated instrument variation. However, it was acknowledged that the neutral loss data were 

hypothetical and only serve to demonstrate the potential for classification purposes. 

In terms of forensic application, SIMCA may be the better classification method due to 

the lower likelihood of misclassifications or false positives. Linear discriminant analysis forces 

all new samples into one of the available subclasses, while SIMCA has the ability to classify 

samples as ‘none’ or ‘both’, making SIMCA potentially more conservative, which is preferred in 

forensic applications. This advantage was observed in the current work with the SIMCA model 

correctly classifying the external data set of non-fentanyl samples (Test Set 1) as ‘none’, as 

opposed to the LDA model, which classified all samples into a fentanyl subclass. Additionally, 

SIMCA has more ways to optimize the model specific to a set of data, such as changing the 

critical limit and examining the modelling power plots to assess variables within a subclass. 

Linear discriminant analysis could be modified to incorporate an “other” class, or could be applied 

as a tiered system that begins with a general class and becomes progressively more specific, to circumvent the 

challenges of forced classification. 

This work demonstrated the application of LDA and SIMCA to classify fentanyl analogs 

according to structural subclasses. Exploring instrument variation and the use of neutral loss 

data highlighted limitations that arise when statistical classification methods are used. 

Additionally, a limited number of analogs was used to 

represent each subclass in the training set and only one to two analogs were representative of 

each subclass in the test set. Within subclasses there were multiple sets of isomers. It is 

acknowledged that the small data set with limited diversity must be expanded upon for 

validation. 


5.2 FUTURE WORK 

 

This work can be expanded to incorporate other types of instrument variation, including 

collection over longer periods of time, collection on other Agilent gas chromatography-mass 

spectrometry (GC-MS) instruments, and collection on GC-MS instruments from various 

manufacturers. As mentioned previously, a limited data set was used. Future work can expand 

upon this data set by creating a larger training set with more representative compounds of every 

subclass, as well as expanding the test set to assess model development further. For SIMCA 

specifically, many optimization techniques are available; however, they are time-consuming. One 

such optimization uses the modelling power plots in conjunction with discrimination power 

plots. The modelling power plots show how variables contribute to modelling of each class 

individually. Discrimination power plots show how the variables discriminate between two 

classes. For example, a variable that is below 0.3 on the modelling power plot may be deemed 

irrelevant to modelling;1 however, if its discrimination power is 3 or greater, the variable must be 

retained to provide differentiation between classes.2 Discrimination power plots could be used in 

future work to optimize the SIMCA model further and eliminate variables that do not contribute 

highly to modelling or discrimination between classes. 
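
The variable selection rule described above can be written as a short sketch. The thresholds of 0.3 (modelling power) and 3 (discrimination power) are those cited in the text; the function name and the example values for each m/z are hypothetical.

```python
def retain_variable(modelling_power, discrimination_power,
                    mp_threshold=0.3, dp_threshold=3.0):
    """Keep a variable if it models its class or discriminates between classes."""
    return modelling_power > mp_threshold or discrimination_power >= dp_threshold

# Hypothetical (modelling power, discrimination power) pairs per m/z.
candidates = {
    83: (0.85, 1.2),   # models the class well: retained
    146: (0.10, 4.5),  # weak for modelling but discriminating: retained
    332: (0.05, 0.8),  # contributes to neither: candidate for removal
}

retained = [mz for mz, (mp, dp) in candidates.items() if retain_variable(mp, dp)]
removed = [mz for mz in candidates if mz not in retained]
print(retained, removed)
```

Only variables that fail both criteria are flagged, which reflects the point that a variable irrelevant to modelling a class may still be needed to differentiate between classes.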

 

A limitation to this work was highlighted when the case samples from Michigan State 

Police Forensic Science Division were applied to the models. The case samples were suspected 

to be in lower concentration than the reference standards used in the development of the models. 

Minimal classification success of the case samples showed the need for concentration to be 

incorporated as a source of variation in the training set. Future work should include analyzing 

reference standards at various concentrations, including concentrations comparable to those 

commonly observed in casework samples. Beyond refining the LDA and SIMCA models to 

incorporate instrument variation, other multivariate statistical methods could be 

applied that may enhance classification success for samples of varying 

concentration, such as case samples. Partial least squares discriminant analysis (PLS-DA) is a 

classification method that incorporates linear regression into the model and could be applied 

to improve classification of such samples. 

 

As previously mentioned, the potential to develop models using neutral losses rather than 

fragment ions was explored in this work. However, all neutral loss data used were hypothetical 

due to the limitations of the low-resolution mass spectral data used in this work. The fentanyl 

analogs contained varying structural substitutions, even within a subclass, so mass spectral data 

varied depending on the type of substitution. Utilization of neutral loss data requires high-

resolution mass spectral data with high mass accuracy so the chemical identity of the fragment 

ions is known and, therefore, the neutral loss identities can be determined. The analogs used in 

this work could be reanalyzed using gas chromatography-triple quadrupole-tandem mass 

spectrometry (GC-QQQ-MSMS) to obtain the high-resolution mass spectral data, as well as 

an indication of the fragmentation pattern of the analogs. Once the chemical identity of the neutral 

losses is determined, neutral losses characteristic of each subclass could be used in the 

development of new LDA and SIMCA models which may result in improved classification of 

fentanyl analogs according to structural subclass. It is predicted that the neutral losses would be 

more consistent within a subclass, which could result in better classification and separation 

between subclasses. 

 

Although this work can be improved upon, it demonstrates the value of this 

application. Multivariate statistical methods applied to fentanyl analogs have shown the ability to 

classify the analogs according to structural subclass. This work has the potential to be 

utilized in forensic laboratories to obtain further structural information on newly synthesized 

fentanyl analogs as they appear in laboratories. 

 

 

REFERENCES 

 
 
(1) Vogt, N.; Knutsen, H. SIMCA Pattern Recognition Classification of Five Infauna Taxonomic 

Groups Using Non-Polar Compounds Analysed by High Resolution Gas Chromatography. 
Marine Ecology Progress Series. 1985, 26, 145–156. 

 

(2) Kucheryavskiy, S. Discrimination power plot for SIMCAM model. 

http://finzi.psych.upenn.edu/library/mdatools/html/plotDiscriminationPower.simcam.html. 
2020 (accessed July 2020). 
