IMPROVING METHODS FOR THE ANALYSIS OF AMPHETAMINE-TYPE STIMULANTS
 
By
 
Fanny Chu
 
 
A THESIS
 
Submitted to 
 
Michigan State University
 
in partial fulfillment of the requirements 
 
for the degree of
 
Forensic Science - Master of Science
 
2015 
 
 
ABSTRACT
 
IMPROVING METHODS FOR THE ANALYSIS OF AMPHETAMINE-TYPE STIMULANTS
 
By
 
Fanny Chu
 
Forensic analysis for the definitive identification of controlled substances using attenuated
total reflectance-Fourier transform infrared (ATR-FTIR) spectroscopy is challenging for sample
mixtures without extraction techniques. However, the application of principal components
regression (PCR) to FTIR spectra is able to provide identification and quantification of
controlled substances in sample mixtures in a single analysis without separation of components.
In this study, sample binary mixtures were analyzed and used in the development of a PCR
model. After model development, two other sets of sample mixtures were used to evaluate model

accurate quantification were observed, demonstrating the potential of PCR to overcome the
limitations of current analysis with ATR-FTIR.
 
Synthetic designer drugs have recently become an international concern, leading research to
be directed towards alternative methods of analysis for definitive identification of these drugs.
The combination of high-resolution mass spectrometry (HRMS) and mass defect filters is able to
overcome the limitations of current forensic methods and enable prioritization of novel synthetic
drugs for identification by allowing rapid classification to structural class. In this study, three
different types of mass defect filters were developed using phenethylamine and cathinone
reference standards. The potential for mass defect filters to be incorporated into a classification
scheme to discriminate between phenethylamines and cathinones is demonstrated, thus allowing
other resources to be directed towards identification of novel synthetic designer drugs.
 
 
ACKNOWLEDGEMENTS
 
 
I would first like to wholeheartedly thank my advisor and committee member, Dr. Ruth 

career. She has been a wonderful advisor, not only for my research, but also in reading and
rereading all the drafts of my thesis. Without her, none of this would be possible. Next, I would
like to acknowledge Dr. Victoria McGuffin, who has been a great help to me throughout my
research, especially in asking questions that have challenged me into looking at aspects of my
research that I would not have otherwise considered. I would also like to thank Dr. Steven Dow,
for agreeing to serve as a committee member on such short notice. My Ph.D. advisor, Dr. A.
Daniel Jones, has also been a great help to me in my research by asking questions that have
allowed me to direct my research into more interesting avenues and helping me to arrive at
logical interpretations of my results.
 
Finally, I would like to express my thanks to the past and current members of the Forensic
Chemistry Group for their advice and support in these two years. Past members include John
McIlroy, Christine Hay, and Jordyn Geiger, who have taught me so much in my first year.
Current members include Kristen Reese, Trevor Curtis, Rebecca Brehe, Alex Anstett, and
Barbara Fallon, who have all spent countless hours listening to my presentation practices ad
nauseam. I would especially like to thank KLR and TEC for motivating me when my research
took some unexpected turns and throughout my thesis-writing process; without them, this thesis
would not have been possible. Thank you all; this has been a crazy adventure in and of itself.
 
 
 
TABLE OF CONTENTS
 
 
LIST OF TABLES

LIST OF FIGURES

Part I. Principal Components Regression for the Quantification of Controlled
Substances in Sample Mixtures Based on ATR-FTIR Spectra

Chapter 1 Introduction
1.1 Controlled Substance Analysis
1.2 Application of Principal Components Analysis (PCA)
1.3 Application of Principal Components Regression (PCR)
1.4 Non-sample Sources of Variance in Spectroscopic Data
1.5 Research Objective
REFERENCES

Chapter 2 Theory
2.1 Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) Spectroscopy
2.2 Data Pretreatment
2.3 Principal Components Regression (PCR)
2.4 PC Selection for Regression
2.5 Leave-one-out Cross Validation
REFERENCES

Chapter 3 Materials and Method
3.1 Sample Preparation and Collection
3.2 Instrument Parameters
3.3 Data Pretreatment
3.4 Example Spectra of Components in Sample Mixtures
REFERENCES

Chapter 4 Results and Discussion
4.1 Effects of Data Pretreatment
4.1.1 Baseline Correction
4.1.2 Smoothing
4.1.3 Standard Normal Variate Normalization
4.1.4 Multiplicative Scatter Correction
4.1.5 Optimal Sequence of Data Pretreatment
4.2 PCA
4.3 PC Selection for MLR
4.4 Model Performance
4.4.1 Multiple Linear Regression
4.4.2 Test Set 1
4.4.3 Test Set 2
 

4.4.5 Summary
 
APPENDIX
REFERENCES

Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work

Part II. Development of Mass Defect Filters for the Classification of Novel
Synthetic Designer Drugs

Chapter 6 Introduction
REFERENCES

Chapter 7 Theory
7.1 Gas Chromatography-Mass Spectrometry (GC-MS)
7.1.1 Chromatography
7.1.2 Gas Chromatography
7.1.3 Separation Efficiency
7.1.4 Mass Spectrometry
7.1.4.1 Resolution in Mass Spectrometry
7.1.4.2 Low-Resolution Mass Spectrometry
7.1.4.3 High-Resolution Mass Spectrometry
7.2 Mass Defect
7.2.1 Absolute Mass Defect
7.2.2 Kendrick Mass Defect
7.2.3 Relative Mass Defect
REFERENCES

Chapter 8 Materials and Method
8.1 Sample Preparation
8.2 Instrument Parameters
8.3 Data Processing
8.4 Ion Selection for Mass Defect Filters
8.4.1 Absolute Mass Defect Filters
8.4.2 Kendrick Mass Defect Filters
8.4.3 Relative Mass Defect Filters and Profiles

Chapter 9 Results and Discussion
9.1 Comparison of GC-QMS and GC-TOFMS Spectra for Phenethylamines
9.2 Comparison of GC-QMS and GC-TOFMS Spectra for Cathinones
9.3 Absolute Mass Defect
9.4 Kendrick Mass Defect
9.5 Relative Mass Defect
9.6 Classification Scheme
APPENDIX
REFERENCES

Chapter 10 Conclusions and Future Work
10.1 Conclusions
10.2 Future Work
 
 
 
 
LIST OF TABLES
 
 
Table 3.1. Training set sample mixtures containing amphetamine and caffeine. 
 
Table 3.2. Test Set 1 mixtures containing amphetamine and caffeine.
 
Table 3.3. Test Set 2 mixtures containing amphetamine and caffeine. 
 

res. 
 
Table 4.1. PCC values after applying data pretreatments. 
 
 
Table 4.2

amphetamine regression.
 
 
Table 4.3

methamphetamine regression.
 
 
Table 9.1. Molecular ion filter for phenethylamines using absolute mass defect.
 
 
Table 9.2. Fragment ion filter at m/z 77 for phenethylamines using absolute mass defect.
 
 
Table 9.3. Molecular ion filter for phenethylamines using Kendrick mass defect.
 
 
Table 9.4. List of the ions included in each fragment ion filter for the 
phenethylamine class using Kendrick mass defect. 
 
 
Table 9.5. List of the ions included in each fragment ion filter for the cathinone class 
using Kendrick mass defect.
 
 
 
 
LIST OF FIGURES
 
 
Figure 2.1. Diagram of a Michelson interferometer in an FTIR instrument.

Figure 2.2. Diagram of an ATR accessory set-up depicting the IR light path generated via
reflection from the mirrors.

Figure 3.1. Average spectrum of caffeine from the first collection of amphetamine-caffeine
mixtures after baseline correction, smoothing, and standard normal variate normalization.

Figure 3.2. Average spectrum of amphetamine from the first collection of amphetamine-caffeine
mixtures after baseline correction, smoothing, and standard normal variate normalization.

Figure 3.3. Average spectrum of methamphetamine from the first collection of
methamphetamine-caffeine samples after baseline correction, smoothing, and standard normal
variate normalization.

Figure 4.1. The effect of baseline correction in spectra as compared to raw spectra.

Figure 4.2. The effect of baseline correction observed in the PCA scores plot.

Figure 4.3. PC 1 and PC 2 loadings plot corresponding to the PCA scores plot of raw spectra.

Figure 4.4. Mean-centered spectra of replicates of 60% amphetamine samples after baseline
correction.

Figure 4.5. The effect of applying an automated smoothing function that incorporates a
Savitzky-Golay smooth as well as block averaging on different regions of the spectrum.

Figure 4.6. The effects of SNV normalization in conjunction with baseline correction and
smoothing.

Figure 4.7. Loadings plots for (a) PC 1 and (b) PC 2 associated with PCA scores plot of spectra
after applying baseline correction, smoothing, and SNV normalization.

Figure 4.8. The effect of multiplicative scatter correction on spectra and PCA scores plot.
 
 
 
 
Figure 4.9. (a) PC 1 and (b) PC 2 loadings plot associated with scores plot of spectra after
applying baseline correction, smoothing, and multiplicative scatter correction.

Figure 4.10. PCA applied to training set.

Figure 4.11. The standard error of validation plotted as a function of the number of PCs
included in the validation for amphetamine (black) and methamphetamine (red) mixtures.

Figure 4.12. Plot of total variance as a function of the number of PCs.

Figure 4.13. PC 3 loadings plot associated with pretreated sample mixtures in the training set.

Figure 4.14. Calibration curves generated in (a) amphetamine regression and (b)
methamphetamine regression using the training set.

Figure 4.15. Regression vectors.

Figure 4.16. PCA scores plot of training set (filled in circles) with Test Set 1 plotted.

Figure 4.17. Calibration curves of training set with Test Set 1 plotted for the (a) amphetamine
regression and (b) methamphetamine regression.

Figure 4.18. Scores plots of training set (filled in circles) with Test Set 2 plotted.

Figure 4.19. Calibration curve of training set with Test Set 2 for the (a) amphetamine
regression and (b) methamphetamine regression.

Figure A.1. Loadings plots for (a) PC 1 and (b) PC 2 corresponding to the baseline-corrected
spectra of the first collection of amphetamine mixtures.

Figure A.2. Loadings plots for (a) PC 8 and (b) PC 9 corresponding to the pretreated spectra of
the training set mixtures.

Figure A.3. Average spectra of 80% methamphetamine mixtures in training set and in Test Set 1.

Figure A.4. Average spectra of 20% amphetamine and 40% amphetamine from the training set as
well as the average spectrum of 30% amphetamine from Test Set 2.

Figure 6.1. Core structure of (a) phenethylamine and (b) cathinone with possible substitution
sites designated with Rn.
 
 
 
 
Figure 6.2. Mass spectrum obtained using GC-QMS.

Figure 7.1. Schematic of a gas chromatograph attached to a detector.

Figure 7.2. Example chromatogram of a 4-component mixture, displaying ideal separation.

Figure 7.3. Example of a peak showing fronting.

Figure 7.4. Schematic of an ion source for electron ionization.

Figure 7.5. Example mass spectrum illustrating mass resolution.

Figure 7.6. Schematic of a quadrupole with blue and red lines indicating two possible
trajectories of ions at the same moment in time.

Figure 7.7. Diagram of an orthogonal acceleration-time-of-flight (oa-TOF) mass analyzer in a
high-resolution mass spectrometer.

Figure 7.8. Structures of (a) 4-APB and (b) 5-APDI which have elemental formulae C11H13NO and
C12H17N, respectively.

Figure 8.1. Structures of phenethylamines used in this research.

Figure 8.2. Structures of cathinones used in this research.

Figure 9.1. Average mass spectrum of 2C-H acquired via (a) GC-QMS and (b) GC-TOFMS.

Figure 9.2. Proposed fragmentation pathway for 2C-H.

Figure 9.3. Average mass spectrum of 4-APB.

Figure 9.4. Proposed fragmentation pathway for 4-APB.

Figure 9.5. Average mass spectrum of 5-MAPB.

Figure 9.6. Proposed fragmentation pathway for 5-MAPB.

Figure 9.7. Average mass spectrum of 2-methoxy MC obtained via (a) GC-QMS and (b) GC-TOFMS.

Figure 9.8. Proposed fragmentation pathway for 2-methoxy MC.

Figure 9.9. Average mass spectrum of α-PPP obtained via (a) GC-QMS and (b) GC-TOFMS.
 
 
 
Figure 9.10. Proposed fragmentation pathway for α-PPP.

Figure 9.11. Molecular ion filter for the phenethylamine class using absolute mass defect
(82% CL, n = 3).

Figure 9.12. Fragment ion filter for phenethylamines at m/z 77 using absolute mass defect
(99.9% CL for n = 5).

Figure 9.13. Fragment ion filter at m/z 56 using absolute mass defect for cathinones
(99.99% CL for n = 5).

Figure 9.14. Fragment ion filter at m/z 91 developed for the cathinones (99.998% CL, n = 5).

Figure 9.15. Absolute mass defect values of synthetic designer drugs plotted as a function of
their exact mass.

Figure 9.16. Molecular ion filter for 2C-phenethylamines using KMD (78% CL for n = 2).

Figure 9.17. Fragment ion Filter 2 for phenethylamines using KMD (99.9999998% CL, n = 16),
with test sets from both classes plotted.

Figure 9.18. Fragment ion Filter 4 for phenethylamines using KMD (99.995% CL, n = 11), with
test sets from both classes plotted.

Figure 9.19. Fragment ion Filter 6 for phenethylamines (98% CL, n = 5) using KMD.

Figure 9.20. Fragment ion Filter 2 for cathinones (99.99% CL, n = 11) using KMD.

Figure 9.21. Fragment ion Filter 7 for cathinones (95% CL, n = 4) using KMD.

Figure 9.22. Fragment ion Filter 9 for cathinones (98% CL, n = 5) using KMD.

Figure 9.23. Molecular ion filter for phenethylamines using RMD (82% CL, n = 3), with
phenethylamine test set plotted.

Figure 9.24. Fragment ion profile for phenethylamines using RMD, with fragment ions from the
phenethylamine test set plotted.

Figure 9.25. Fragment ion profile for cathinones using RMD, with fragment ions from cathinone
test set plotted.
 
 
 
 
Figure 9.26. Diagram of a proposed classification scheme using the three types of mass defects
for classification of novel synthetic designer drugs to the cathinone or phenethylamine class.

Figure B.1. Mass spectrum of 2C-P obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed
fragmentation pathway.

Figure B.2. Mass spectrum of 2C-D obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed
fragmentation pathway.

Figure B.3. Mass spectrum of 6-APB obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed
fragmentation pathway.

Figure B.4. Mass spectrum of 5-MAPDB obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed
fragmentation pathway.

Figure B.5. Mass spectrum of 3,4-MDPA obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed
fragmentation pathway.

Figure B.6. Mass spectrum of 3-methyl PPP obtained by (a) GC-QMS and (b) GC-TOFMS with (c)
proposed fragmentation pathway.

Figure B.7. Mass spectrum of methcathinone obtained by (a) GC-QMS and (b) GC-TOFMS with (c)
proposed fragmentation pathway.

Figure B.8. Mass spectrum of 2-methyl MC obtained by (a) GC-QMS and (b) GC-TOFMS with (c)
proposed fragmentation pathway.

Figure B.9. Mass spectrum of 3-MEC obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed
fragmentation pathway.

Figure B.10. Mass spectrum of pyrovalerone obtained by (a) GC-QMS and (b) GC-TOFMS with (c)
proposed fragmentation pathway.

Figure B.11. Mass spectrum of mephedrone obtained by (a) GC-QMS and (b) GC-TOFMS with (c)
proposed fragmentation pathway.

Figure B.12. Structure of cocaine.

Figure B.13. Structures of the (a) m/z 56, (b) m/z 77, and (c) m/z 91 ions common to cathinone
training set standards.

Figure B.14. Fragment ion Filter 1 developed for the phenethylamine class using KMD.
 
 
 
 
Figure B.15. Fragment ion Filter 3 developed for the phenethylamine class using KMD.

Figure B.16. Fragment ion Filter 5 developed for the phenethylamine class using KMD.

Figure B.17. Fragment ion Filter 7 developed for the phenethylamine class using KMD.

Figure B.18. Fragment ion Filter 1 developed for the cathinone class using KMD.

Figure B.19. Fragment ion Filter 3 developed for the cathinone class using KMD.

Figure B.20. Fragment ion Filter 4 developed for the cathinone class using KMD.

Figure B.21. Fragment ion Filter 5 developed for the cathinone class using KMD.

Figure B.22. Fragment ion Filter 6 developed for the cathinone class using KMD.

Figure B.23. Fragment ion Filter 8 developed for the cathinone class using KMD.
 
 
 
 
Part I. Principal Components Regression for the Quantification of Controlled Substances in
Sample Mixtures Based on ATR-FTIR Spectra

Chapter 1 Introduction
 
1.1 Controlled Substance Analysis
 
Controlled substances, such as methamphetamine and marijuana, are commonly seen in
forensic laboratories. Drug seizures by the Drug Enforcement Administration (DEA) alone,
including methamphetamine, marijuana, cocaine, and heroin, range from thousands of kilograms
to tens of thousands of kilograms annually (1), and the seized material is then sent to
laboratories for analysis and identification. Furthermore, as recently as 2013, drug-related
deaths were among the top 10 causes of death in the United States (2), and arrests for drug
abuse violations were upwards of 1.5 million, with over 80% of those arrests for possession of
controlled substances (3). From these statistics, it is apparent that the use and abuse of
controlled substances is a national concern.
 
As such, the analysis and identification of controlled substances in forensic laboratories is
crucial. The Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) has
established recommended guidelines for the identification of controlled substances (4). The
analytical techniques detailed in the guidelines are split into three categories: Categories A, B,
and C, which correspond to confirmatory, selective, and presumptive tests, respectively. Of the
three categories, Category A methods are of primary interest, as these are the techniques that
allow forensic scientists to definitively identify the controlled substances in submitted samples
(4). In particular, mass spectrometry and infrared spectroscopy are two common methods used in
controlled substance analyses.
 
Mass spectrometry is often used in conjunction with gas chromatography, which is a
SWGDRUG Category B technique. The combination of the two methods fulfills SWGDRUG
guidelines for controlled substance analysis, as the recommendations call for a Category A
technique combined with a technique from Category B or C or, in the absence of a Category A
technique, any two uncorrelated Category B techniques and one from Category C (4). The
combination of the two techniques is known as GC-MS, and it is able to confirm the identity of
controlled substances by separating the different components in mixtures and identifying the
components based upon mass and chemical information. A complementary technique to mass
spectrometry is infrared spectroscopy, and more specifically, Fourier transform infrared
spectroscopy. For ease of analysis, the majority of forensic laboratories utilize attenuated total
reflectance-Fourier transform infrared spectroscopy (ATR-FTIR).
 
ATR-FTIR is a rapid and non-destructive method that is commonly used for preliminary
analysis and screening of controlled substances in solid samples. It requires minimal sample
preparation and is a high-throughput technique with spectroscopic output that is rich in chemical
information. However, due to the nature of the analysis, definitive identification of the controlled
substance(s) in street samples can be challenging. Street samples most often contain a multitude
of adulterants and diluents that complicate the spectra, so subsequent extraction and purification
of the samples are necessary.
 
1.2 Application of Principal Components Analysis (PCA)
 
Despite these challenges, the complex spectra provide not only qualitative but also quantitative
information that can be extracted using multivariate statistical procedures. One of the most
common procedures is principal components analysis (PCA), which is an exploratory approach
that displays patterns and trends among complex samples using only a few dimensions.
 
PCA has been widely used in forensic science to successfully discriminate ballpoint pen inks
of similar color (5), paint samples of similar color (6), tablets containing illicit drugs (7), and
cocaine mixtures (8). While PCA is strictly a qualitative approach to extracting information from
complex samples, there exist other forensically relevant multivariate statistical procedures, such
as principal components regression (PCR), that are able to extract quantitative information.
 
1.3 Application of Principal Components Regression (PCR)
 
PCR is a two-pronged approach that combines PCA and multiple linear regression. The
relationship between two matrices, typically designated as X and Y, can be determined and
displayed as a calibration curve. The X matrix consists of independent or predictor variables, such
as spectroscopic data across a given frequency range for many different samples, and the Y matrix
contains a set of dependent variables, such as concentration values for the given samples. Once
the relationship between X and Y is determined via multiple linear regression, the calibration
curve can then be used to predict the concentration of a questioned sample based on the spectrum
of that sample (9).
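The two steps described above can be illustrated with a minimal sketch, written here in Python with NumPy. It is not the model developed in this research: the spectra are synthetic two-band mixtures, and the function names (pcr_fit, pcr_predict), the band positions, and the noise level are all assumptions made for illustration only.

```python
import numpy as np

def pcr_fit(X, y, n_pcs):
    """Fit PCR: PCA on mean-centered X, then least squares of y on the PC scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are the PC loadings, ordered by decreasing explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_pcs].T                      # shape (n_variables, n_pcs)
    scores = Xc @ loadings                       # shape (n_samples, n_pcs)
    coef, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return x_mean, y_mean, loadings, coef

def pcr_predict(model, X_new):
    """Project new spectra onto the retained PCs and apply the regression."""
    x_mean, y_mean, loadings, coef = model
    return (X_new - x_mean) @ loadings @ coef + y_mean

# Hypothetical binary mixtures: two synthetic "pure component" bands.
rng = np.random.default_rng(0)
axis = np.linspace(0.0, 1.0, 200)
pure_a = np.exp(-((axis - 0.3) / 0.05) ** 2)
pure_b = np.exp(-((axis - 0.7) / 0.05) ** 2)

fractions = np.linspace(0.1, 0.9, 9)             # Y: fraction of component A
X_train = np.outer(fractions, pure_a) + np.outer(1 - fractions, pure_b)
X_train += rng.normal(scale=1e-3, size=X_train.shape)

model = pcr_fit(X_train, fractions, n_pcs=1)

# A "questioned" sample at a fraction that is absent from the training set.
x_test = 0.35 * pure_a + 0.65 * pure_b
predicted = pcr_predict(model, x_test[np.newaxis, :])[0]
print(f"predicted fraction of A: {predicted:.3f}")
```

In this sketch a single PC is sufficient because the synthetic mixtures vary along only one direction; real spectra would generally require more PCs, chosen by a selection procedure.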
 
PCR is widely used in the food and environmental industries for quality control purposes.
Models were developed to quantify analytes of interest, such as methanol and water content in
biodiesel (10), protein content in wheat samples (11), and various soil properties (12); the
models were able to accurately quantify the analytes of interest with minimal prediction error
(< 5%). These studies have shown that the quantification of multiple components in a single
analysis with minimal error is possible. However, these studies used near-infrared reflectance
spectroscopic data in conjunction with PCR, and near-infrared reflectance spectroscopy is not as
widely used in forensic laboratories.
 
A few studies have shown the potential of PCR in quantifying controlled substances in
sample mixtures for forensic purposes based on the ATR-FTIR technique. For example, Goh et
al. developed a PCR model to predict the concentrations of methamphetamine in simulated
mixtures analyzed by ATR-FTIR (13). The PCR model was successful in accurately predicting
methamphetamine concentrations in mixtures that also contained glucose and caffeine, with
prediction error ranging from 3% to 6%. However, the scope of the study was limited to one
controlled substance, meaning that the identity of the controlled substance had to be known a
priori (13). In the event of a complex mixture containing multiple controlled substances, or if the
identity of the controlled substance is not known, quantification with the model developed by
Goh et al. may become challenging.
 
Penido et al. also developed a PCR model for the quantification of controlled substances in
simulated samples (14). The focus of the study was on cocaine mixtures analyzed by two
different methods: Raman spectroscopy and FTIR spectroscopy. Despite the success of the
model, the study concluded that spectroscopic data obtained by Raman spectroscopy resulted in a
better model with higher prediction ability (14). The limitation of this study lies in its
applicability to forensic laboratories, as most forensic laboratories do not utilize Raman
spectroscopy as a technique for definitive identification. Instead, FTIR spectroscopy is more
widely used. Also, similar to the study by Goh et al., the scope of the study was limited to one
controlled substance.
 
Thus, it is advantageous to develop a PCR model that is able to identify and quantify
multiple controlled substances in sample mixtures in a single analysis with minimal error, and to
investigate the inclusion of a wide range of controlled substances in the model.
 
1.4 Non-sample Sources of Variance in Spectroscopic Data
 
Prior to performing any multivariate statistical procedure, it is imperative that only chemical
information is considered and that the data are not dominated by non-sample sources of variance.
This is because small differences among samples can be extracted using multivariate statistics
even though these differences may not be chemically meaningful. To that end, data pretreatments
are used as preprocessing tools in order to minimize artifacts that may arise from instrument
variation and ambient conditions. Those artifacts that are relevant to spectroscopic data include
sloping baselines, instrument noise, differences in total signal due to variation in detector
response, and signal reduction due to scattering.
 
Sloping baselines in spectroscopic data may indicate the presence of water or water vapor.
The O-H stretch in water molecules absorbs in the 3600-3400 cm⁻¹ range and manifests itself as
a broad peak. As ATR-FTIR is performed under ambient conditions, the sample may contain
some water vapor that is absorbed from the atmosphere during sample collection. But because
exposure time of the sample to the atmosphere is fairly short, water vapor absorption is minimal,
resulting in a shapeless peak that creates a sloping baseline in the higher wavenumber region of
the spectrum. While this phenomenon indicates the presence of water, it is still a non-sample
source of variance since it is not inherent to the sample itself (i.e., the sample is not
hygroscopic) and thus does not provide chemical information about the sample. This non-sample
variance can be reduced using baseline correction methods.
 
Instrument noise, which manifests itself as small irregular spikes that resemble minute peaks
in a spectrum, is another example of a source of non-sample variance. Typically, this
phenomenon is due to the inability to fully reduce background noise and display only signals.
Background noise is more apparent in the baseline, where minimal signal is present, than in
peaks, since the signal-to-noise ratio is substantially higher in peaks. Smoothing the data is
generally used to reduce background noise with the goal of improving the signal-to-noise ratio.
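As a rough illustration of how smoothing suppresses baseline noise, the sketch below applies a simple moving-average filter to a synthetic noisy spectrum in Python with NumPy. The moving average is a deliberately simple stand-in chosen for illustration, not the smoothing function used in this research; the function name, window width, and synthetic data are assumptions.

```python
import numpy as np

def moving_average(spectrum, window=5):
    """Smooth a 1-D spectrum with a centered moving average of odd width."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the filter is centered")
    half = window // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(spectrum, half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")

# Hypothetical spectrum: one band on a flat baseline plus random noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 400)
peak = np.exp(-((x - 0.7) / 0.03) ** 2)
noisy = peak + rng.normal(scale=0.02, size=x.size)

smoothed = moving_average(noisy, window=7)

# Compare the noise in a region that contains baseline only.
baseline = slice(0, 100)
print("baseline noise before:", noisy[baseline].std())
print("baseline noise after: ", smoothed[baseline].std())
```

A wider window removes more noise but also broadens and lowers narrow peaks, so the window width has to be balanced against peak distortion.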
 
 
Total signal across the entire frequency range can vary, as the detector response can differ
from sample to sample. In these cases, the total signal in each spectrum is different. While this
may be expected for samples that are chemically different, replicates of the same sample
theoretically should have the same total area, as they express the same chemical information. To
correct for differences in detector response, normalization is performed, which re-scales the data
so that the spectra are more comparable. The re-scaling does not affect the peak pattern; instead,
the magnitude of the re-scaling is similar, if not equivalent, across the frequency range.
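One common normalization of this kind is standard normal variate (SNV) normalization, in which each spectrum is centered on its own mean and divided by its own standard deviation. The following Python sketch with NumPy illustrates the idea on a synthetic example; the function name and the data are assumptions for illustration only.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) separately."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

# Hypothetical replicate pair: the same spectrum recorded at two response levels.
x = np.linspace(0.0, 1.0, 300)
replicate = np.exp(-((x - 0.4) / 0.05) ** 2) + 0.5 * np.exp(-((x - 0.8) / 0.04) ** 2)
scaled = 1.5 * replicate          # same peak pattern, larger total signal

normalized = snv(np.vstack([replicate, scaled]))

# After SNV the two replicates coincide, although their raw totals differ.
print(np.allclose(normalized[0], normalized[1]))
```

Because every point in a spectrum is shifted and divided by the same two numbers, the relative peak pattern within that spectrum is unchanged.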
 
Particle size and differences in light scatter between samples are a major source of
non-sample variance, particularly in solid samples. The amount of light scatter may vary from
sample to sample, and this variation may be compounded by the differences in particle size in a
solid sample. Light scatter reduces the efficiency of light transmission to the detector, resulting
in signal reduction. To correct for these two phenomena, scatter correction is often used. After
scatter correction, improvement in signal intensity is typically observed.
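A widely used form of scatter correction is multiplicative scatter correction (MSC), in which each spectrum is regressed against a reference spectrum and the fitted offset and slope are then removed. The Python sketch below with NumPy is an illustration under stated assumptions: the function name is hypothetical, the data are synthetic, and the mean spectrum is assumed as the reference when none is supplied.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum is modeled as intercept + slope * reference; the correction
    subtracts the fitted intercept and divides by the fitted slope.
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)   # assumed default: the mean spectrum
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        # np.polyfit returns coefficients from highest degree to lowest.
        slope, intercept = np.polyfit(reference, spectrum, deg=1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected

# Hypothetical replicates that differ only by a scatter-like gain and offset.
x = np.linspace(0.0, 1.0, 300)
base = np.exp(-((x - 0.5) / 0.06) ** 2)
spectra = np.vstack([0.8 * base + 0.10, 1.2 * base - 0.05])

corrected = msc(spectra)

# After correction the two replicates overlay one another.
print(np.allclose(corrected[0], corrected[1]))
```

The design choice here is the reference: using the mean of the data set keeps the corrected spectra on a common scale, but the result then depends on which spectra are included in that mean.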
 
1.5 Research Objective
 
The objective in this research was to develop a PCR model to identify and quantify the
controlled substances present in simulated sample mixtures analyzed by ATR-FTIR. These
sample mixtures contained amphetamine and caffeine or methamphetamine and caffeine at
different concentrations. The first goal was to investigate a series of data pretreatment procedures
and determine the optimal set of procedures prior to performing PCR. This is necessary to ensure
that non-sample sources of variance are substantially reduced and that non-sample variance will
not dominate the model. The second goal was to determine the optimal number of PCs for
regression by investigating different methods of PC selection. Once the regression was
performed, the third goal was to evaluate performance of the PCR model in quantifying the
controlled substance present in a test set of sample mixtures. With this PCR model, both
identification and quantification are performed in a single analysis.
 
As ATR-FTIR is a rapid technique that provides definitive identification of controlled
substances with minimal sample preparation, it is advantageous to apply this method for both
identification and quantification of controlled substances using multivariate statistics. By
applying the PCR model to FTIR spectra, both identification and quantification of controlled
substances in sample mixtures can be achieved in a single analysis without the need to perform a
separation. The use of PCR also increases the confidence and objectivity of the analysis, as a
known error rate derived from the model can be assigned to the resulting output.
 
 
 
 
REFERENCES
 
 
 
 
1. DEA. Statistics and Facts. DEA.gov; 2015 [updated 2015; cited]; Available from:
http://www.dea.gov/resource-center/statistics.shtml#seizures.

2. Statistics CNCfH. Detailed Tables for the National Vital Statistics Report (NVSR)

3. FBI. Crime in the U.S. 2013: Estimated Number of Arrests. FBI; 2015 [updated 2015;
cited]; Available from:
https://www.fbi.gov/about-us/cjis/ucr/crime-in-the-u.s/2013/crime-in-the-u.s.-2013/tables/table-29/table_29_estimated_number_of_arrests_united_states_2013.xls.

4. SWGDRUG. SWGDRUG Recommendations Version 7-0.

5. Senior S, Hamed E, Masoud M, Shehata E. Characterization and Dating of Blue
Ballpoint Pen Inks Using Principal Component Analysis of UV-Vis Absorption Spectra, IR
Spectroscopy, and HPTLC. Journal of Forensic Sciences. 2012;57(4):1087-93.

6. Muehlethaler C, Massonnet G, Esseiva P. The application of chemometrics on Infrared
and Raman spectra as a tool for the forensic analysis of paints. Forensic Science International.
2011;209(1-3):173-82.

7. Romão W, Lalli P, Franco M, Sanvido G, Schwab N, Lanaro R, et al. Chemical profile of
meta-chlorophenylpiperazine (m-CPP) in ecstasy tablets by easy ambient sonic-spray ionization,
X-ray fluorescence, ion mobility mass spectrometry and NMR. Anal Bioanal Chem.
2011;400(9):3053-64.

8. Marcelo MCA, Mariotti KC, Ferrão MF, Ortiz RS. Profiling cocaine by ATR-FTIR.
Forensic Science International. 2015;246:65-71.

9. Massy WF. Principal Components Regression in Exploratory Statistical Research.
Journal of the American Statistical Association. 1965;60(309):234-56.

10. Felizardo P, Baptista P, Menezes JC, Correia MJN. Multivariate near infrared
spectroscopy models for predicting methanol and water content in biodiesel. Analytica Chimica
Acta. 2007;595(1-2):107-13.

11. Mahesh S, Jayas DS, Paliwal J, White NDG. Comparison of Partial Least Squares
Regression (PLSR) and Principal Components Regression (PCR) Methods for Protein and
Hardness Predictions using the Near-Infrared (NIR) Hyperspectral Images of Bulk Samples of
Canadian Wheat. Food Bioprocess Technol. 2015;8(1):31-40.
 
 
10
 
 
12.
 
Chang C
-
W, Laird DA, Mausbach MJ, Hurburgh CR. Near
-
Infrared Reflectance 
Spectroscopy

Principal Components Regression Analyses of Soil Properties. S
oil Science 
Society of America Journal. 2001;65(2):480
-
90.
 
 
13.
 
Goh CY, van Bronswijk W, Priddis C. Rapid nondestructive on
-
site screening of 
methylamphetamine seizures by attenuated total reflection Fourier transform infrared 
spectroscopy. Applied spectro
scopy. 2008;62(6):640
-
8.
 
 
14.
 
Penido CAFdO, Silveira L, Pacheco MTT. Quantification of Binary Mixtures of Cocaine 
and Adulterants Using Dispersive Raman and Ft
-
Ir Spectroscopy and Principal Component 
Regression. Instrumentation Science & Technology. 2012;4
0(5):441
-
56.
 
 
11
 
 
Chapter 2 Theory
 
2.1 Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) Spectroscopy
 
 
Attenuated total reflectance-Fourier transform infrared (ATR-FTIR) spectroscopy is a rapid and non-destructive technique that enables identification of molecules based on the vibrational motions elicited by these molecules under irradiation by infrared (IR) light. The ATR technique combines total internal reflection within a crystal and conventional FTIR spectroscopy in order to generate a spectrum containing peaks of various intensities that are characteristic of the vibrational motions of different functional groups. Through the identification of functional groups in a spectrum, the identity of a molecule can be determined.
 
Molecules undergo irradiation via a source lamp that emits a broad spectrum of light, especially in the infrared region (10–13,333 cm⁻¹). The light then enters into the Michelson interferometer (Figure 2.1) (1).
 
Figure 2.1. Diagram of a Michelson interferometer in an FTIR instrument.
 
The broad light beam travels from the source lamp to a partially reflective mirror known as the beamsplitter, which splits the light beam into two. One beam of light passes through the beamsplitter to a completely reflective mirror while the other is reflected orthogonally to another reflective mirror. Of the two reflective mirrors, one is stationary while the other is a moving mirror that is set to continuously traverse a certain distance at a fixed rate. The two light beams are reflected from the two mirrors to recombine at the beamsplitter (2). Because the distance from the moving mirror to the beamsplitter may not be the same as the distance from the stationary mirror to the beamsplitter, different patterns of constructive and destructive interference occur, generating superposed waves at characteristic, discrete frequencies. The signal that results from the superposed waves is known as the interferogram (2).
 
One of the most noticeable features in an interferogram is the oscillating wave with the highest amplitude. This point is known as the centerburst, and is the result of the superposition of two waves that are completely in phase. At this point, only constructive interference occurs because the distance from the stationary mirror to the beamsplitter is equal to that from the moving mirror to the beamsplitter. The distance at which this occurs is known as the zero-path difference (ZPD) (3). ZPD is an important parameter that is necessary for alignment in the instrument prior to data collection.
 
The recombined IR beam is then introduced to the sample, which is placed on top of the crystal, through a total internal reflection mechanism. Total internal reflection is achieved due to differences in refractive index between the air and the ATR crystal itself, which is illustrated in Figure 2.2. Examples of ATR crystals include diamond and zinc selenide (ZnSe) (4). These materials are dense and have high refractive indices. Different materials in ATR crystals have different properties and are available depending on the type of analysis. Zinc selenide is a relatively soft material and is suited for analysis of liquids and oils while diamond is a more robust material ideal for powders and other solids. For universal application, ATR crystals with a mixture of diamond and zinc selenide are advantageous (4).
 
 
Figure 2.2. Diagram of an ATR accessory set-up depicting the IR light path generated via reflection from the mirrors.
 
 
As the IR beam is reflected from the mirrors towards the crystal, the difference in refractive indices at the first air-crystal boundary results in partial refraction and partial reflection of the incident light (3). However, as the refracted light beam comes into contact with the crystal-air boundary, which is also the point at which the sample contacts the crystal surface, at a certain angle, the light is completely reflected. Two criteria must be met in order for total internal reflection to occur. First, the angle of the incident light must be greater than the critical angle needed to achieve reflection. Second, the medium at the other side of the boundary must have a smaller refractive index than the medium through which incident light passes (5).
 
The generation of total internal reflection also creates an evanescent wave at the crystal-air boundary. This wave sits at the boundary and when a sample is in contact with the crystal, the wave penetrates into the sample. The depth of penetration is generally below 10 µm; in order to ensure that the sample can be irradiated with IR light, good contact must be maintained between the sample and the ATR crystal. This is especially important for solid samples, due to packing and density differences that exist in these samples. A pressure arm can be used to apply a certain amount of force on the sample to ensure good contact (4).
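The two criteria for total internal reflection and the shallow penetration of the evanescent wave can be illustrated numerically. The sketch below uses Snell's law for the critical angle and the standard expression for penetration depth, d_p = λ / (2π n₁ √(sin²θ − (n₂/n₁)²)). The refractive indices (2.4 for a diamond-like crystal, 1.5 for an organic powder) and the 45° angle of incidence are assumed illustrative values, not parameters reported in this work, and the function names are hypothetical.

```python
import math


def critical_angle_deg(n_crystal, n_sample):
    """Critical angle in degrees at the crystal-sample boundary."""
    if n_sample >= n_crystal:
        # Second criterion: the far medium must have the smaller refractive index.
        raise ValueError("total internal reflection requires n_sample < n_crystal")
    return math.degrees(math.asin(n_sample / n_crystal))


def penetration_depth_um(wavenumber_cm, n_crystal, n_sample, angle_deg):
    """Evanescent-wave penetration depth in micrometres."""
    if angle_deg <= critical_angle_deg(n_crystal, n_sample):
        # First criterion: the incidence angle must exceed the critical angle.
        raise ValueError("incidence angle must exceed the critical angle")
    wavelength_um = 1.0e4 / wavenumber_cm  # 1 cm = 10,000 micrometres
    theta = math.radians(angle_deg)
    root = math.sqrt(math.sin(theta) ** 2 - (n_sample / n_crystal) ** 2)
    return wavelength_um / (2.0 * math.pi * n_crystal * root)


# Assumed illustrative values (not taken from the thesis).
N_CRYSTAL, N_SAMPLE, ANGLE_DEG = 2.4, 1.5, 45.0
theta_c = critical_angle_deg(N_CRYSTAL, N_SAMPLE)
depth_1000 = penetration_depth_um(1000.0, N_CRYSTAL, N_SAMPLE, ANGLE_DEG)
print(f"critical angle: {theta_c:.1f} degrees")
print(f"penetration depth at 1000 cm^-1: {depth_1000:.2f} um")
```

Under these assumptions the penetration depth is on the order of a few micrometres and grows toward lower wavenumbers, which is why intimate contact between the sample and the crystal matters.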
 
When the evanescent wave penetrates into the sample, the molecules in the sample absorb some of the light at various wavelengths and are then excited due to the excess energy gained. The excitation manifests itself as molecular vibrations, such as bond stretching and bending. The IR beam not absorbed will then pass through the sample, and the photons in the beam can be detected and counted with a detector, which is usually a photomultiplier tube (3). At this point, the interferogram of the transmitted spectrum is generated. The interferogram is then sent to the computer for processing, where the Fourier transform is applied in order to reduce the complexity associated with an interferogram. Fourier transform is the conversion from the time domain back to the frequency domain. As the interferogram records signal intensity in the time domain, it can quickly be converted into the frequency domain so that the independent variable is frequency rather than time. This simplifies the spectrum drastically, and the output is a spectrum where the percent of transmitted light, or transmittance, is plotted as a function of frequency, or wavenumber.
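The conversion from interferogram to spectrum can be sketched with a discrete Fourier transform. The example below is a simplified illustration rather than the processing performed by the instrument software: it builds a synthetic interferogram from two cosine components as a function of optical path difference (which scales with time for a mirror moving at a fixed rate) and recovers their wavenumbers. The sampling step, number of points, and band positions are assumed values chosen so that the bands fall exactly on the frequency grid.

```python
import numpy as np

# Assumed sampling: 4096 points with a step of 1/8000 cm in optical path
# difference, which gives a maximum (Nyquist) wavenumber of 4000 cm^-1.
n_points = 4096
step_cm = 1.0 / 8000.0
opd = np.arange(n_points) * step_cm

# Two hypothetical spectral components: wavenumber (cm^-1) -> amplitude.
bands = {1000.0: 1.0, 2500.0: 0.5}
interferogram = sum(amp * np.cos(2.0 * np.pi * nu * opd) for nu, amp in bands.items())

# The real-input FFT converts path difference (cm) into wavenumber (cm^-1).
spectrum = np.abs(np.fft.rfft(interferogram))
wavenumbers = np.fft.rfftfreq(n_points, d=step_cm)

# The two largest spectral values sit at the wavenumbers used to build the signal.
strongest = np.sort(wavenumbers[np.argsort(spectrum)[-2:]])
print(strongest)
```

A real interferogram additionally requires apodization and phase correction before transformation; those steps are omitted here.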
 
While the transmittance is directly measured by the detector, the measurement that is most useful is absorbance, which quantifies the amount of light absorbed. The two are related through the following relationship (Eq. 2.1) (2):
 
$$A = -\log_{10} T \tag{2.1}$$
 
where A is the absorbance and T is the transmittance, so the spectrum obtained can easily be converted to display the absorbance intensity at each wavenumber. It is assumed that all light that is not transmitted is absorbed by the analyte. Absorbance information is more useful due to Beer's law, given in Eq. 2.2:
 
$$A = \varepsilon b c \tag{2.2}$$
 
where absorbance is dependent on ε, the molar absorptivity, b, the pathlength, and c, the concentration of the analyte. This equation illustrates the linear relationship between concentration and the amount of light absorbed for a given pathlength and known absorptivity of the analyte (3).
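As a brief worked illustration of Eqs. 2.1 and 2.2, the sketch below converts percent transmittance to absorbance and then rearranges Beer's law for concentration. The function names and the numerical values of molar absorptivity and pathlength are hypothetical and serve only to show the arithmetic.

```python
import numpy as np


def percent_t_to_absorbance(percent_t):
    """Eq. 2.1 with transmittance supplied in percent: A = -log10(%T / 100)."""
    t = np.asarray(percent_t, dtype=float) / 100.0
    return -np.log10(t)


def concentration_from_absorbance(absorbance, molar_absorptivity, pathlength_cm):
    """Eq. 2.2 rearranged for concentration: c = A / (epsilon * b)."""
    return absorbance / (molar_absorptivity * pathlength_cm)


# 100 %T, 10 %T and 1 %T correspond to absorbances of 0, 1 and 2.
absorbances = percent_t_to_absorbance([100.0, 10.0, 1.0])

# Hypothetical values: epsilon = 100 L mol^-1 cm^-1, b = 0.1 cm, A = 0.5.
conc = concentration_from_absorbance(0.5, 100.0, 0.1)
```

Because absorbance, unlike transmittance, is linear in concentration, absorbance spectra are the natural input for the regression described in Section 2.3.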
 
Some limitations of FTIR spectroscopy include small molar absorptivity values and variations in pathlengths, which subsequently affect absorbance measurements. Since molar absorptivity is inherent to an analyte for a given wavelength, this property cannot be optimized to yield larger absorbance values (2). Therefore, the only parameter that can be optimized is the pathlength. Due to the direct proportional relationship between pathlength and absorbance, larger pathlengths are desired. This can be achieved using ATR-FTIR spectroscopy, in which multiple internal reflections increase the pathlength from that of traditional FTIR spectroscopy. However, this operates under the assumption that all incident light is completely reflected at each internal reflection and that no light is lost in this process. Depending on the physical properties of the sample, light scattering can occur at each internal reflection point. In this case, scattering may be compounded by having multiple internal reflections, and an ATR crystal that only utilizes a one-bounce reflection may be advantageous. The compromise with using a crystal with fewer internal reflections is the loss in sensitivity and the reduced pathlength, which results in lower absorbance measurements. Despite these limitations, the rapid analysis and minimal sample preparation associated with ATR-FTIR spectroscopy make it a useful technique for the analysis of solid samples.
 
2.2 Data Pretreatment 
 
Data pretreatment procedures are performed on various types of data post-acquisition to minimize non-sample sources of variance while still retaining all of the chemical information. These procedures include baseline correction, smoothing, normalization, and scatter correction for spectroscopic data.
 
Baseline correction as applied in spectroscopic data is useful in reducing sloping baselines, which is a common phenomenon. A typical baseline correction involves the subtraction of a mathematical function from the original data. The mathematical function chosen is data-dependent, since different baseline patterns exist for different types of data. For sloping baselines, a polynomial function, such as a quadratic function, is fitted to the original spectrum and is then subtracted (6). The function can be designed to target certain portions of the spectrum so that regions containing complex chemical information are not as heavily affected by the correction. This requires a function that can be adapted to target different regions given the original spectrum. With this method, knowledge of the baseline region for each spectrum is necessary. The first derivative of the original spectrum can convey this information. In first-derivative spectra, slopes with magnitudes close to zero indicate the baseline region whereas slopes with large magnitudes in the positive or negative directions indicate the presence of peaks that convey chemical information. Thus, a two-step process involving a first derivative with a quadratic function fitting is appropriate and advantageous for spectroscopic data.
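The two-step baseline correction described above can be sketched as follows. This is a minimal illustration under stated assumptions and not the algorithm implemented in the instrument software: baseline points are taken to be those where the first derivative is below a threshold (here an assumed 5% of the largest slope), and a quadratic is fitted to those points only. The function name and the threshold are hypothetical.

```python
import numpy as np


def baseline_correct(wavenumbers, absorbance, slope_tol=None):
    """Subtract a quadratic fitted to low-slope (baseline) regions.

    Returns the corrected spectrum and the fitted baseline.
    """
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(absorbance, dtype=float)

    # Step 1: first derivative; near-zero slopes mark the baseline region.
    d1 = np.gradient(y, x)
    if slope_tol is None:
        slope_tol = 0.05 * np.max(np.abs(d1))  # assumed threshold
    is_baseline = np.abs(d1) <= slope_tol

    # Step 2: fit a quadratic to the baseline points and subtract it everywhere.
    coeffs = np.polyfit(x[is_baseline], y[is_baseline], deg=2)
    baseline = np.polyval(coeffs, x)
    return y - baseline, baseline
```

One caveat of a pure derivative threshold is that the apex of a peak also has a slope near zero and is therefore flagged as baseline; with a few such points among thousands the effect on the fit is small, but a practical implementation would likely exclude them.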
 
Smoothing is normally performed in order to reduce noise in the data, either in the peaks or in the baseline. Different algorithms can be applied to achieve various levels of smoothing, depending on the extent of the noise observed in the data. A common smoothing algorithm is the Savitzky-Golay filter (6). This algorithm utilizes moving windows across a spectrum. A window contains a set number of data points in the spectrum, and the window is moved along the spectrum so that the smoothing occurs in each window (7). Using the least-squares method, a polynomial function is fit to the center point in the moving window so that the error between the function and the data points in that window is minimized, and this process continues across the spectrum with the moving windows until the entire spectrum is fit. These fitted functions then replace the original spectrum as the smoothed spectrum (7). The block averaging, or moving average, smoothing algorithm is used as well. This type of smooth is the result of fitting the center point of a moving window of data points with a straight line rather than with different polynomial functions (8). A combination of the two smoothing algorithms is advantageous as the smooth can be adapted to different regions of the spectrum.
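Both smoothing approaches can be written as a weighted sum over a moving window. The sketch below, which assumes evenly spaced data and an odd window length, derives the Savitzky-Golay weights from a least-squares polynomial fit evaluated at the window center and compares it with a plain moving average. The function names, window sizes, and edge handling (repeating the end values) are assumptions for illustration and are not the settings of the instrument software.

```python
import numpy as np


def savgol_weights(window, polyorder):
    """Weights that give the least-squares polynomial value at the window center."""
    half = window // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse yields the constant term of the fitted
    # polynomial, which is its value at offset 0 (the center point).
    return np.linalg.pinv(design)[0]


def savgol_smooth(y, window=9, polyorder=2):
    """Savitzky-Golay smoothing; window must be odd."""
    weights = savgol_weights(window, polyorder)
    padded = np.pad(np.asarray(y, dtype=float), window // 2, mode="edge")
    return np.convolve(padded, weights[::-1], mode="valid")


def moving_average(y, window=9):
    """Block (moving) average: every point in the window has equal weight."""
    padded = np.pad(np.asarray(y, dtype=float), window // 2, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")
```

Because the polynomial fit follows curvature within the window, the Savitzky-Golay filter tends to preserve peak shape better than the moving average, which smooths more aggressively and is therefore better suited to flat baseline regions.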
 
Normalization is a data pretreatment procedure that focuses on removing between-sample variance arising from signal differences between replicates. For example, two replicates of a sample can vary in total signal. While the peak pattern is maintained, instrument variation and differences in detector response may lead to spectra exhibiting this phenomenon. Although different normalization procedures can be applied to correct for this, standard normal variate (SNV) normalization is commonly applied to spectroscopic data (9). This normalization procedure scales the spectrum in a manner that is similar to a z-score calculation at each independent variable. Each response at an independent variable in the X matrix is scaled using the following function:
 
$$x_{n,i} = \frac{x_i - \bar{x}}{s} \tag{2.3}$$
 
where $x_{n,i}$ is the scaled response at the ith variable as a result of SNV normalization, $x_i$ is the original response at the ith variable, $\bar{x}$ is the average response across the independent variables, and s is the standard deviation of the responses across the independent variables. Thus, the scaling is data set-independent, since the scaling factor varies from sample to sample (9).
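Eq. 2.3 translates directly into a row-wise operation on the X matrix. The sketch below assumes that each row is one spectrum and uses the sample standard deviation (n − 1 in the denominator); whether the spreadsheet calculation in this work used the sample or population form is not stated, so that choice is an assumption.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) on its own."""
    X = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std


# Two replicates that differ only in overall signal (offset and scale)
# become identical after SNV.
base = np.array([0.1, 0.4, 0.9, 0.3, 0.2])
normalized = snv(np.vstack([base, 2.0 * base + 3.0]))
```

Since each spectrum is scaled using only its own mean and standard deviation, adding or removing other spectra from the data set does not change the result.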
 
Scatter correction is used to remove variation in samples due to differences in light scatter, which changes the efficiency of light transmission. A common algorithm to minimize such differences is multiplicative scatter correction (MSC) (10). In this procedure, an average spectrum, also known as the reference spectrum, is determined across all spectra in a data set (i.e. the X matrix). Each spectrum is then regressed against the reference spectrum to determine the multiplicative factor between the two spectra so that the product of the reference spectrum and the factor is equivalent to the measured spectrum. The measured spectrum is then normalized to the multiplicative factor, which is a vector. An average value for the reference spectrum is calculated and the sum of the average and the normalized spectrum results in the corrected spectrum (10). As a result, MSC is a data set-dependent method as the degree of correction performed is heavily influenced by other spectra in the data set.
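The dependence of MSC on the whole data set is easiest to see in code. The sketch below implements the common intercept-and-slope form of MSC, in which each spectrum is regressed on the mean spectrum and then corrected as (spectrum − intercept) / slope. This form may differ in detail from the procedure described above and from any particular software implementation; the function name is hypothetical.

```python
import numpy as np


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    If no reference is supplied, the mean spectrum of the data set is used,
    which is what makes the method data set-dependent.
    """
    X = np.asarray(spectra, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, row in enumerate(X):
        # Least-squares fit: row ~ intercept + slope * ref
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected, ref
```

When new spectra are to be corrected consistently with an existing model, the reference spectrum from the original data set would be passed in through `reference` rather than recomputed.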
 
2.3 Principal Components Regression (PCR)
 
Principal components regression (PCR) is a multivariate statistical procedure that enables the quantification of analytes of interest in complex samples through a combination of principal components analysis and multiple linear regression (11).
 
Principal components analysis (PCA) is an exploratory multivariate statistical procedure that can associate and/or discriminate complex samples based on variance (12). PCA is the first step that is performed in PCR, and it is performed primarily to reduce the dimensions and the variables in the input data, or X matrix. For example, a spectrum generated via Fourier transform infrared spectroscopy in the mid-IR range will typically contain around 3000 independent variables since there is a signal associated with each wavenumber (or independent variable). However, not all 3000 variables contribute to or influence a sample equally, and therefore, it is necessary to reduce the number of variables to a substantially smaller number that can still provide chemically meaningful information. PCA does this by grouping independent variables that covary linearly into a principal component (PC) (12). PCs are vectors that describe a set of independent variables and the main constraint is that each PC is orthogonal to the preceding one in matrix space. Therefore, all PCs are uncorrelated with one another.
 
The eigenvalue of each PC, or in broader terms, the variance accounted for by each PC, is determined by decomposition of the X matrix, which is detailed below (12). Variance can be thought of as the magnitude component of a vector. PCs are vectors that have a magnitude and direction; each PC extends out in multidimensional space by the magnitude specified in the variance. Variance among the samples in the data set exists due to different responses of the samples to the independent variables. By decomposing the independent variables into PCs, the variance corresponding to a set of variables described by a PC is quantified in that PC. In PCA, a common algorithm for the decomposition is singular value decomposition (SVD). This algorithm determines singular values, from which the variance associated with each PC is obtained (13). Singular values are determined with this relationship:
 
$$X = U S V^{T} \tag{2.4}$$
 
where U is the matrix that contains eigenvectors in the rows, S is the diagonal matrix that contains the singular values, and V is the matrix that contains the eigenvectors in the columns.
 
As observed in the above equation, eigenvectors are also found during the decomposition, and these vectors are the PCs themselves. Each vector describes a set of variables that covary linearly and all vectors are positioned orthogonally in multidimensional space (12).
 
In the simplest sense, the application of PCA to an X matrix is described by Eq. 2.5 (12):
 
$$X = T L^{T} \tag{2.5}$$
 
where T is the scores matrix and $L^{T}$ is the transposed matrix of the loadings. The loadings matrix contains loadings, or the weights and direction of each independent variable, for each PC. The scores matrix contains the scores of each sample for each eigenvector, or PC; a score is determined as the summation of the multiplicative effects of the loadings and the vector of the mean-centered X matrix at each independent variable for a PC (12).
 
SVD can be used as the decomposition method in PCA with relative ease since the following relationships hold:
 
$$L = V \tag{2.6}$$
 
and
 
$$T = U S \tag{2.7}$$
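 
The relationships in Eqs. 2.4 to 2.7 can be checked numerically. The sketch below is an illustration using NumPy rather than the Pirouette implementation used in this work: it mean-centers an X matrix, performs SVD, and forms the scores and loadings. The fraction of variance per PC is computed from the squared singular values; the function name is hypothetical.

```python
import numpy as np


def pca_svd(X):
    """PCA of a samples-by-variables matrix via singular value decomposition."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                               # T = US  (Eq. 2.7)
    loadings = Vt.T                              # L = V   (Eq. 2.6)
    explained = s ** 2 / np.sum(s ** 2)          # fraction of variance per PC
    return scores, loadings, explained
```

Multiplying the scores by the transposed loadings reproduces the mean-centered X matrix (Eq. 2.5), and retaining only the first few columns of each gives the reduced representation used for regression.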
 
Once the X matrix is decomposed into its components, multiple linear regression can be performed on the data set in PCR. The goal of multiple linear regression (MLR) is to accurately predict Y, the dependent variable, given any X matrix. The regression takes the general form of the following equation (14):
 
$$Y = X\beta + E \tag{2.8}$$
 
where Y is the matrix containing the dependent variables, such as concentration values, β is the regression vector which contains the regression coefficients, X is the matrix containing the independent variables, such as a set of spectra, and E is the error associated with the regression in the Y direction. The relationship between X and Y is determined in this procedure, and β is typically found by using the least-squares method of minimizing the error associated with each predicted value of Y. This is a calibration curve similar to ordinary linear regression, but the regression is performed with multiple variables, such as an absorbance spectrum that contains over 3000 independent variables.
 
Replacing X with the decomposition from PCA yields
 
$$Y = (U S V^{T})\beta + E \tag{2.9}$$
 
which can be rewritten as
 
$$\beta = (V S^{-1} U^{T}) Y \tag{2.10}$$
 
assuming that E is minimal and not significant. Given a new X, or a new set of spectra, a new Y, or a matrix containing dependent variables such as concentration values, can be predicted as shown in Eq. 2.11:
 
$$Y_{new} = X_{new}\beta \tag{2.11}$$
 
It is important to note that the decomposition in PCA does not depend on the Y matrix, and that the X matrix input for the regression is solely dependent on the scores and loadings determined in PCA.
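 
A compact sketch of Eqs. 2.9 to 2.11 is given below. It is an illustration under stated assumptions and not the Pirouette implementation: X and y are mean-centered, only the first `n_pcs` components of the SVD are retained, y is a one-dimensional vector of concentrations, and the function names are hypothetical.

```python
import numpy as np


def pcr_fit(X, y, n_pcs):
    """Fit a principal components regression using the first n_pcs PCs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)          # 1-D vector, e.g. % analyte
    x_mean, y_mean = X.mean(axis=0), y.mean()
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    U_k, s_k, V_k = U[:, :n_pcs], s[:n_pcs], Vt[:n_pcs].T
    # Truncated form of Eq. 2.10: beta = V_k S_k^-1 U_k^T y
    beta = V_k @ ((U_k.T @ (y - y_mean)) / s_k)
    return {"beta": beta, "x_mean": x_mean, "y_mean": y_mean}


def pcr_predict(model, X_new):
    """Eq. 2.11 applied to new spectra, restoring the means removed in fitting."""
    X_new = np.asarray(X_new, dtype=float)
    return (X_new - model["x_mean"]) @ model["beta"] + model["y_mean"]
```

For noise-free binary mixtures whose spectra are linear combinations of two pure-component spectra, the mean-centered X matrix has a single direction of variation, so one PC is sufficient to predict concentrations that were not in the calibration set. Real spectra contain noise and non-sample variance, which is why the number of PCs has to be selected as described in Section 2.4.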
 
2.4 PC Selection for Regression
 
A main concern prior to establishing a calibration curve is the number of PCs to include in the regression. Intuitively, prediction ability is positively correlated with the number of PCs included in the model: more PCs allow for the calibration curve to better represent the X matrix, from which the PCs are determined. The incorporation of more PCs in the regression means that a greater proportion of the variance in the X matrix is accounted for in the calibration curve. However, overfitting the calibration curve can occur with the inclusion of too many PCs, as subsequent PCs begin to describe non-sample sources of variance (15). Therefore, it is imperative to determine the optimal number of PCs to include in the regression.
 

PCs are included in order of greatest contribution to variance, and subsequent PCs are selected until a criterion is met (16). The most often used criterion is the minimum predicted residual sum of squares (PRESS) value for a certain number of PCs included in the regression, which can be determined via internal validation. The PRESS value is then converted into an error value associated with the validated model (15). More information regarding PRESS values and internal validation is detailed in Section 2.5. In order to develop an optimal regression model, the error must be at a minimum, and this point is determined in PC selection.
 
2.5 Leave-one-out Cross Validation
 
Leave-one-out cross validation (LOO CV) is an example of an internal validation method (15). It is commonly used in validating a regression model; in PCR, it is primarily used to determine the optimal number of PCs to include in a regression. Using this method, a model is created with one sample in the data set (or X matrix) left out and the prediction error in the regression is calculated for that sample. This process is then repeated until all samples have been left out once, and the squared errors are then summed to give the predicted residual sum of squares (PRESS) value (15). Through leave-one-out cross validation, a PRESS value is determined for a regression that only includes the first PC. A second PRESS value is determined for a regression that includes the first two PCs, and a third PRESS value is determined for a regression that includes the first three PCs, and so on until all PCs are included or until a certain criterion is reached. These PRESS values are compared to determine the minimum value; the number of PCs for which a minimum PRESS exists is considered to be the optimal number of PCs to be used in the calibration (15). This is because the minimum PRESS also represents the point at which the error in the regression model is minimal. The PRESS values are compared through an F test at a user-defined confidence level to determine whether the values are significantly different from one another.
 
While PRESS values are useful, a more objective measure of error is standard error; in the case of validation, this is known as the standard error of validation (SEV) (15). The two are related using the following equation:
 
$$SEV = \sqrt{\frac{PRESS}{n}} \tag{2.12}$$
 
where n is the number of samples in the data set used in the validation. By normalizing to the number of samples in the data set, the error associated with the model is not biased by including too few or too many samples. In the cross-validation of a PCR model, an SEV value is determined for each PC. The comparison of SEV values results from the comparison of PRESS values described above, as SEV comparisons are typically within-sample comparisons between the validation errors associated with different PCs.
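 
The leave-one-out procedure and the conversion from PRESS to SEV can be sketched as follows. This is a simplified illustration, not the Pirouette routine: the PCR model is refitted with each sample left out in turn, PRESS is accumulated for each number of PCs, and SEV is taken as the square root of PRESS divided by n. The function names are hypothetical, and the F test used to compare PRESS values is not implemented here.

```python
import numpy as np


def _pcr_predict_one(X_train, y_train, x_test, n_pcs):
    """Fit PCR on the training samples and predict a single left-out sample."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    U, s, Vt = np.linalg.svd(X_train - x_mean, full_matrices=False)
    beta = Vt[:n_pcs].T @ ((U[:, :n_pcs].T @ (y_train - y_mean)) / s[:n_pcs])
    return (x_test - x_mean) @ beta + y_mean


def loo_press_sev(X, y, max_pcs):
    """PRESS and SEV for models built with 1, 2, ..., max_pcs PCs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = np.zeros(max_pcs)
    for k in range(1, max_pcs + 1):
        for i in range(n):
            keep = np.arange(n) != i           # leave sample i out
            pred = _pcr_predict_one(X[keep], y[keep], X[i], k)
            press[k - 1] += (y[i] - pred) ** 2
    sev = np.sqrt(press / n)
    return press, sev
```

The candidate number of PCs would then be `np.argmin(press) + 1`, subject to the significance comparison described above. Note that `max_pcs` must be small enough that every retained singular value is nonzero for each reduced training set.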
 
 
 
 
REFERENCES
 
 
 
 
1. Smith LM, Dobson CC. Absolute displacement measurements using modulation of the spectrum of white light in a Michelson interferometer. Applied Optics. 1989;28(16):3339-42.
 
2. Smith BC. Fundamentals of Fourier Transform Infrared Spectroscopy, Second Edition: CRC Press, 2011.
 
3. Skoog DA, Holler FJ, Crouch SR. Principles of Instrumental Analysis: Thomson Brooks/Cole, 2007.
 
4. PerkinElmer. FT-IR Spectroscopy Attenuated Total Reflectance (ATR). 2005.
 
5. Axelrod D. Cell-substrate contacts illuminated by total internal reflection fluorescence. The Journal of Cell Biology. 1981;89(1):141-5.
 
6. PerkinElmer. Spectrum One FT-IR Spectroscopy. 2004.
 
7. Savitzky A, Golay MJ. Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry. 1964;36(8):1627-39.
 
8. Tong H. Fitting a smooth moving average to noisy data (Corresp.). IEEE Transactions on Information Theory. 1976;22(4):493-6.
 
9. Barnes R, Dhanoa M, Lister SJ. Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Applied Spectroscopy. 1989;43(5):772-7.
 
10. Isaksson T, Næs T. The effect of multiplicative scatter correction (MSC) and linearity improvement in NIR spectroscopy. Applied Spectroscopy. 1988;42(7):1273-84.
 
11. Massy WF. Principal Components Regression in Exploratory Statistical Research. Journal of the American Statistical Association. 1965;60(309):234-56.
 
12. Wold S, Esbensen K, Geladi P. Principal component analysis. Chemometrics and Intelligent Laboratory Systems. 1987;2(1-3):37-52.
 
13. Lathauwer LD, Moor BD, Vandewalle J. A Multilinear Singular Value Decomposition. SIAM Journal on Matrix Analysis and Applications. 2000;21(4):1253-78.
 
14. Sutter JM, Kalivas JH, Lang PM. Which principal components to utilize for principal component regression. Journal of Chemometrics. 1992;6(4):217-25.
 
15. Varmuza K, Filzmoser P. Introduction to Multivariate Statistical Analysis in Chemometrics: CRC Press, 2009.
 
16. Xie Y-L, Kalivas JH. Local prediction models by principal component regression. Analytica Chimica Acta. 1997;348(1-3):29-38.
 
Chapter 3 Materials and Method
 
 
3.1 Sample Preparation and Collection
 
 
Amphetamine sulfate and methamphetamine hydrochloride were obtained from Sigma-Aldrich (St. Louis, MO). Caffeine was purchased from Eastman Chemical (Kingsport, TN). Training set mixtures containing amphetamine and caffeine at different concentrations were prepared by first weighing out the appropriate mass of both compounds (Table 3.1) and homogenizing with a mortar and pestle. The total mass of each mixture was 10 mg. The samples were stored in capped vials at room temperature. Additional sample mixtures (denoted as Test Set 1 and Test Set 2) were prepared and used to validate the model. Test Set 1 mixtures were prepared in the same manner and contained the same concentrations by mass that were used in the training set (Table 3.2). Test Set 2 mixtures were prepared in the same manner at concentrations not included in the training set (Table 3.3). Training and test set mixtures containing methamphetamine and caffeine were prepared in a similar manner, except replacing amphetamine with methamphetamine.
 
Each training set sample mixture was analyzed by ATR-FTIR five times in triplicate over a five-month period to account for variation in the instrument and ambient conditions over a period of time. The samples were not re-mixed and homogenized again after preparation. A small amount of each mixture (approximately 1 mg) was taken out of the vial in each sampling and then replaced after analysis. Replicate measurements were performed by sampling from the same vial. Test set mixtures were sampled and analyzed in a similar manner six times in triplicate over a two-month period.
 

methamphetamine and mixing with an amount of caffeine (Table 3.4). The total mass varied from sample to sample. The mixtures were then homogenized with a mortar and pestle prior to

 
Table 3.1. Training set sample mixtures containing amphetamine and caffeine.
 
Concentration (% w-% w)    Amphetamine (mg)    Caffeine (mg)
00-100                     0                   10
20-80                      2                   8
40-60                      4                   6
60-40                      6                   4
80-20                      8                   2
100-00                     10                  0
 
Table 3.2. Test Set 1 mixtures containing amphetamine and caffeine.
 
Concentration (% w-% w)    Amphetamine (mg)    Caffeine (mg)
20-80                      2                   8
40-60                      4                   6
60-40                      6                   4
80-20                      8                   2
 
Table 3.3. Test Set 2 mixtures containing amphetamine and caffeine.
 
Concentration (% w-% w)    Amphetamine (mg)    Caffeine (mg)
10-90                      1                   9
30-70                      3                   7
50-50                      5                   5
70-30                      7                   3
90-10                      9                   1
 
Amphetamine (mg)    Caffeine (mg)    Methamphetamine (mg)
0.8                 5                0
2.3                 5.8              0
0                   4.7              3.0
 
3.2 Instrument Parameters 
 
Spectra were collected using a Spectrum One FTIR with a Universal ATR Accessory (Perkin Elmer, Waltham, MA). The ATR crystal was a diamond/ZnSe one-bounce crystal. The scan range was 4000–650 cm⁻¹, and raw spectra were obtained by averaging four scans in transmittance mode. The pressure of the anvil against the sample was set to 80 units of pressure. Prior to data collection, a system suitability check was performed on the instrument in order to assess noise, throughput, and contamination levels. Minimal levels of each below the threshold values were observed, indicating that the instrument was in good condition. A background scan was also performed prior to data collection. The ATR crystal was cleaned with acetone after each replicate. A background scan was performed after every two samples were analyzed in triplicate.
 
3.3 Data Pretreatment
 
Raw spectra were first converted to absorbance mode through the instrument software (Spectrum v.5.0.1, Perkin Elmer, Waltham, MA). The data pretreatment investigation was only performed using spectra of the first collection of amphetamine-caffeine mixtures.
 
Baseline correction and smoothing were performed using the appropriate functions available in the instrument software. The baseline region was first identified through a first-derivative plot, and a quadratic function was then fit to the baseline region. This function was subsequently subtracted from the raw spectrum. Baseline-corrected spectra were then smoothed using a combination of the Savitzky-Golay and Block Averaging algorithms to ensure that the baseline regions were more heavily smoothed than the peaks. Standard normal variate normalization was performed in Microsoft Excel (Microsoft, Redmond, WA). Principal components analysis and principal components regression were performed using Pirouette v.4.0 (Infometrix, Bothell, WA). All output data were exported to Microsoft Excel for further processing.
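The baseline step was carried out in the instrument software, whose exact algorithm is not described here. As a rough illustration only, the sketch below fits a quadratic to points flagged as baseline and subtracts it from the spectrum. The function name `subtract_quadratic_baseline`, the synthetic spectrum, and the hand-chosen `baseline_mask` are all assumptions made for this example; in the thesis the baseline region was identified from a first-derivative plot instead.

```python
import numpy as np

def subtract_quadratic_baseline(wavenumbers, absorbance, baseline_mask):
    """Fit a quadratic to the points flagged as baseline and subtract it."""
    x = np.asarray(wavenumbers, dtype=float)
    y = np.asarray(absorbance, dtype=float)
    mask = np.asarray(baseline_mask, dtype=bool)
    # Centre and scale x so the polynomial fit is numerically well conditioned.
    x_scaled = (x - x.mean()) / x.std()
    coeffs = np.polyfit(x_scaled[mask], y[mask], deg=2)
    baseline = np.polyval(coeffs, x_scaled)
    return y - baseline

# Synthetic example (not thesis data): one Gaussian peak on a curved baseline.
wn = np.linspace(4000.0, 650.0, 500)
true_baseline = 0.05 + 1e-5 * (wn - 650.0) + 2e-9 * (wn - 650.0) ** 2
peak = 0.3 * np.exp(-0.5 * ((wn - 1700.0) / 15.0) ** 2)
spectrum = true_baseline + peak

# Hypothetical baseline region: everything more than 150 cm^-1 from the peak.
mask = np.abs(wn - 1700.0) > 150.0
corrected = subtract_quadratic_baseline(wn, spectrum, mask)
```

In this synthetic case the corrected spectrum is flat outside the peak and the peak height is preserved. Real spectra would also need the smoothing step, which is not reproduced here.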
 
The effect of the data pretreatment procedures was visually assessed in the spectra and quantified in the PCA scores plot. Changes in the PCA scores plot as a result of the data pretreatment procedures were quantified by determining the average percent change in the clustering (PCC) of replicates between the PCA scores of the raw spectra of the first collection of samples (i.e. 6 samples in triplicate) and those of the spectra after each pretreatment (1). For each sample, the average scores on PC 1 and PC 2 were calculated along with the variance. The variances on PC 1 and PC 2 were then summed, as variance is additive for independent and normally distributed data, and the standard deviation accounting for both PCs was calculated from this sum. This procedure was repeated for all six samples in the untreated data and again following each data pretreatment. The percent change in the standard deviation of the scores for each sample was calculated and averaged across all samples. Improvement in clustering due to pretreatment is indicated by a negative PCC value; the optimal set of data pretreatments is therefore the one with the largest negative PCC value.
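The PCC calculation described above can be sketched as follows. This is a minimal illustration, not the original spreadsheet workflow: the function names and the toy scores are made up, and the use of the sample variance (`ddof=1`) is an assumption, since the text does not say which variance estimator was used.

```python
import numpy as np

def replicate_spread(scores):
    """Combined standard deviation of replicate scores on PC 1 and PC 2.

    scores: array-like of shape (n_replicates, 2), one row per replicate.
    """
    scores = np.asarray(scores, dtype=float)
    variances = scores.var(axis=0, ddof=1)  # assumed: sample variance per PC
    return float(np.sqrt(variances.sum()))  # variances summed, then square root

def average_pcc(raw_scores, treated_scores):
    """Average percent change in clustering across samples.

    Both arguments map a sample name to its replicate scores.
    Negative values mean tighter replicate clusters after pretreatment.
    """
    changes = []
    for name in raw_scores:
        s_raw = replicate_spread(raw_scores[name])
        s_new = replicate_spread(treated_scores[name])
        changes.append(100.0 * (s_new - s_raw) / s_raw)
    return float(np.mean(changes))

# Toy scores (not thesis data): the pretreatment halves the replicate spread.
raw = {
    "sample_a": [[0, 0], [2, 0], [4, 0]],
    "sample_b": [[0, 0], [0, 2], [0, 4]],
}
treated = {
    "sample_a": [[0, 0], [1, 0], [2, 0]],
    "sample_b": [[0, 0], [0, 1], [0, 2]],
}
pcc = average_pcc(raw, treated)  # halved spread corresponds to a PCC of -50%
```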
 
3.4 Example Spectra of Components in Sample Mixtures
 
The pretreated spectra of caffeine, amphetamine, and methamphetamine from the first collection of sample mixtures are shown in Figures 3.1–3.3, respectively. The spectrum of caffeine (Figure 3.1) displays two small peaks at 3115 and 2950 cm⁻¹ that correspond to C-H stretches of the methyl groups in caffeine. The two intense peaks at 1700 and 1600 cm⁻¹ correspond to the C=O stretches from the two amide groups in the molecule. The peaks ranging from approximately 1600–1400 cm⁻¹ correspond to ring stretches of both the pyrimidine and imidazole components of the molecule. The sharp peaks at 1250 cm⁻¹ and between 1100 and 900 cm⁻¹ correspond to various C-N stretches, and finally, the peak at 700 cm⁻¹ results from C-H out-of-plane bends.
 
 
Figure 3.1. Average spectrum of caffeine from the first collection of amphetamine-caffeine mixtures after baseline correction, smoothing, and standard normal variate normalization.
 
 
The spectrum of amphetamine is shown in Figure 3.2 and displays intense peaks that range from 3150–2600 cm⁻¹ and correspond to C-H stretches. Other characteristic peaks include those ranging from 1650–1300 cm⁻¹ that correspond to C-C stretches in the benzene ring, as well as the intense peaks between 1200 and 900 cm⁻¹, which correspond to N-H bends. The two peaks between 750 and 650 cm⁻¹ correspond to C-H out-of-plane bends.
 
 
Figure 3.2. Average spectrum of amphetamine from the first collection of amphetamine-caffeine mixtures after baseline correction, smoothing, and standard normal variate normalization.
 
 
The spectrum of methamphetamine is displayed in Figure 3.3. Characteristic peaks ranging from 3150–2300 cm⁻¹ correspond to C-H stretches. Similar to amphetamine, the peaks at 1650–1300 cm⁻¹ correspond to C-C stretches in the benzene ring. The peaks ranging from 1200–900 cm⁻¹ are less intense in the spectrum of methamphetamine compared to amphetamine due to the weaker dipole moment in the N-H bend for methamphetamine, as the amine is a secondary amine rather than a primary amine. The two peaks between 750 and 650 cm⁻¹ correspond to C-H out-of-plane bends. While similarities between the spectra of amphetamine and methamphetamine are observed, absorbance in the higher wavenumber region (i.e. 3150–2300 cm⁻¹) is more intense for methamphetamine.
 
 
Figure 3.3. Average spectrum of methamphetamine from the first collection of methamphetamine-caffeine samples after baseline correction, smoothing, and standard normal variate normalization.
 
 
REFERENCES

1. McIlroy JW. Effects of data pretreatment on the multivariate statistical analysis of chemically complex samples [M.S.]. Ann Arbor: Michigan State University, 2014.
 
 
Chapter 4 Results and Discussion
 
 
4.1 Effects of Data Pretreatment
 
4.1.1 Baseline Correction
 
The effect of each data pretreatment procedure was visually inspected in spectra and quantified in the PCA scores plot. Figure 4.1 shows the effect of baseline correction in the replicate spectra of caffeine. In the raw spectra, a sloping baseline is observed at the higher wavenumber region (4000–3150 cm⁻¹) (Figure 4.1a). The rise in baseline does not originate from chemical sources; instead, it is the result of water vapor absorption from the atmosphere during sample collection. After baseline correction, the sloping baseline is substantially reduced in all replicates (Figure 4.1b). However, other non-sample sources of variance are present in the spectra even after baseline correction, indicating that further pretreatment is necessary.
 
 
Figure 4.1. The effect of baseline correction in spectra as compared to raw spectra. (a) Raw absorbance spectra of three replicates of caffeine. (b) Baseline-corrected spectra of the caffeine sample.
 
 
Figure 4.2 displays the effect of baseline correction in the PC 1 vs. PC 2 scores plots. The total variance accounted for by PCs 1 and 2 in the scores plot of the raw spectra (Figure 4.2a) is 98.2%. PC 1 separates samples based upon amphetamine content. Samples with higher caffeine content and lower amphetamine content are positioned more positively on PC 1; caffeine samples are positioned most positively on the axis. Samples with higher amphetamine content and lower caffeine content are positioned more negatively on PC 1; replicates of amphetamine are positioned most negatively on PC 1. The other sample mixtures are positioned along PC 1 according to their amphetamine and caffeine content. PC 2 separates replicates within each of the six samples. In this PCA scores plot, the replicates for each mixture span a large space in both dimensions, which is not expected for replicates that contain the same chemical information. Ideally, replicates of the same mixture should overlap since they are chemically the same; however, the large spread among replicates indicates that non-sample sources of variance, especially differences in peak heights, dominate PC 2.
 
 
Figure 4.2. The effect of baseline correction observed in the PCA scores plot. (a) PCA scores plot of PC 1 vs. PC 2 of the raw spectra consisting of all replicates of the six samples. (b) PCA scores plot for the baseline-corrected spectra.
 
 
Associated loadings plots for PC 1 and PC 2 corresponding to the raw spectra are shown in Figure 4.3. From the loadings plot for PC 1 (Figure 4.3a), the dominant source of variance is the difference in amphetamine and caffeine content among the samples. In the loadings plot, all peaks that are present in amphetamine (e.g. peaks between 3150–2600 cm⁻¹) have a negative weighting while peaks that correspond to caffeine (e.g. two peaks in the 1700–1600 cm⁻¹ region) are weighted positively. Therefore, samples are separated on PC 1 based on concentration, where samples with a higher amphetamine content are positioned more negatively on PC 1 while those samples with a higher caffeine content are positioned more positively on this PC. This is somewhat reflected in the scores plot; however, the replicates in each sample are spread over a wide range along the PC 1 axis and some samples even overlap on this axis (i.e. 60% amphetamine and 80% amphetamine).
 
Positioning of the samples on PC 2 can be explained in a similar manner; however, all of the loadings in PC 2 (Figure 4.3b) are in the positive direction, indicating that the sample separation is based upon peak heights and intensities. The PC 2 loadings plot looks quite similar to an absorbance spectrum of the mixtures, with peaks corresponding to both amphetamine and caffeine (Figures 3.2 and 3.1, respectively). Spectra of samples that contain the majority of these peaks at higher intensities are positioned more positively on PC 2 in the scores plot (e.g. all replicates of 40% amphetamine). Those samples with spectra displaying these peaks at lower intensities or slightly different peak patterns are positioned more negatively on PC 2 (e.g. caffeine and amphetamine samples). The positive positioning of the 40% amphetamine mixtures is due to all replicates of the mixture containing the pattern shown in the PC 2 loadings. Other mixtures with a higher amphetamine content display less intense caffeine peaks and thus deviate from the pattern, which accounts for their more negative positioning on PC 2 (Figure 4.2b). Similarly, those samples with higher caffeine content display less intense amphetamine peaks and also deviate from this pattern, resulting in their negative positioning on PC 2.
 
 
Figure 4.3. PC 1 and PC 2 loadings plots corresponding to the PCA scores plot of raw spectra. (a) PC 1 loadings plot associated with PCA scores plot of raw spectra. (b) PC 2 loadings plot associated with PCA scores plot of raw spectra.
 
 
Some mixtures also have replicates that are positioned both negatively and positively on PC 2 (e.g. 60% amphetamine). Based on the PC 2 loadings plot, the replicates in these mixtures vary in peak intensities, where spectra containing the characteristic amphetamine and caffeine peaks at higher intensities are positioned more positively on PC 2 and spectra with these peaks at lower intensities are positioned more negatively. This is apparent in the mean-centered spectra of the replicates of 60% amphetamine samples shown in Figure 4.4. Of the three replicate spectra, one spectrum displays less intense peaks than the other two replicate spectra, resulting in a negatively oriented mean-centered spectrum. The PC 2 score of a sample is determined by multiplying the PC 2 loading by the mean-centered absorbance at each wavenumber and summing these products. Because the mean-centered spectrum of the first replicate is negatively oriented and the PC 2 loadings are positively oriented, the score for this replicate on PC 2 is negative. From the scores plots, it can be observed that the separation of samples on PC 2 is due to differences in peak heights within replicates as well as peak patterns among samples. While the differentiation of samples based upon peak patterns and peak heights arises from chemical differences, the separation of sample replicates on PC 2 due to peak heights is attributed to non-sample sources of variance since replicates are chemically the same.
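The relationship between loadings, mean-centered spectra, and scores described above is a dot product, which the short sketch below illustrates. All numbers are invented for the example: a four-point "spectrum", an all-positive loading vector, and three replicates that differ only in overall intensity. For brevity the mean is taken over these three rows only, whereas in PCA the mean is taken over the whole data set.

```python
import numpy as np

# Hypothetical all-positive PC 2 loading vector (unit length).
loadings_pc2 = np.array([0.5, 0.5, 0.5, 0.5])

# Three made-up replicate spectra with the same peak pattern
# but different overall intensity.
replicates = np.array([
    [0.8, 1.6, 0.8, 0.4],   # weaker overall intensity
    [1.0, 2.0, 1.0, 0.5],
    [1.2, 2.4, 1.2, 0.6],   # stronger overall intensity
])

# Mean-center each variable (column), then project onto the loadings.
mean_centered = replicates - replicates.mean(axis=0)
scores_pc2 = mean_centered @ loadings_pc2
```

With positive loadings, the weakest replicate receives a negative score and the strongest a positive score, which is the behavior described for the 60% amphetamine replicates.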
 
 
 
Figure 4.4. Mean-centered spectra of replicates of 60% amphetamine samples after baseline correction.
 
 
After the application of baseline correction, the PCA scores plot (Figure 4.2b) does not display substantial change from the scores plot based on the raw spectra (Figure 4.2a). The associated PC 1 and PC 2 loadings plots (Figure A.1) are also similar to the loadings plots observed for the raw spectra. The average percent change in the clustering (PCC) of replicates determined for baseline correction was -3.7%. The negative value indicates improved clustering of the replicates after baseline correction. Despite the slight variance reduction with baseline correction, the small magnitude of the percent change supports the conclusion that other sources of non-sample variance are present and that baseline correction alone is not effective at removing all non-sample variance.
 
4.1.2 Smoothing
 
After baseline correction, noise in the baseline region was still observed in the spectra (4000–3150 cm⁻¹ region in Figure 4.1b). A smoothing function was applied in conjunction with baseline correction to assess its efficacy in non-sample variance reduction. Figure 4.5 displays spectra where the smooth was applied to the baseline-corrected spectra, with the associated PCA scores plot. Baseline-corrected and smoothed spectra (example shown in Figure 4.5a) display noise reduction in the baseline regions and minimal signal reduction, indicating the removal of non-sample variance while maintaining the chemical information.
 
Slightly closer clustering of the replicates was observed in the PCA scores plot (Figure 4.5b) after baseline correction and smoothing. However, the PCC value of -5.1% relative to the raw spectra indicates that sloping baselines and noise are not the major sources of non-sample variance. It can be concluded from the small PCC value and the large spread in the clustering of replicates that other non-sample sources of variance of a greater magnitude are present and that further pretreatment beyond baseline correction and smoothing is necessary.
 
 
Figure 4.5. The effect of applying an automated smoothing function that incorporates a Savitzky-Golay smooth as well as block averaging on different regions of the spectrum. (a) Baseline-corrected and smoothed spectra of three replicates of the caffeine sample. (b) PCA scores plot displaying the samples after baseline correction and smoothing.
 
 
4.1.3 Standard Normal Variate Normalization
 
A noticeable feature across all spectra was the difference in intensity among the replicates. For example, the first replicate of the caffeine sample had a peak intensity of 0.19 arbitrary units (AU) at 1650 cm⁻¹ while the second replicate of the same sample had a peak intensity of 0.28 AU at the same wavenumber. The peak pattern and the peak ratios within the spectrum remained the same, but the overall peak intensities differed; essentially, the area under the curve varied between replicates of the same sample. Standard normal variate (SNV) normalization was applied to each spectrum in order to reduce the differences observed in spectra and the PCA scores plots due to peak area variation. Figure 4.6 displays the effects of SNV normalization in combination with baseline correction and smoothing. Replicate overlay is greatly improved in spectra after SNV normalization (Figure 4.6a) as compared to the overlay observed in raw spectra (Figure 4.1a), indicating that SNV normalization is effective at minimizing this type of non-sample variance.
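SNV normalization treats each spectrum on its own: the spectrum's mean is subtracted and the result is divided by the spectrum's standard deviation. The sketch below is a minimal version of that calculation, not the Excel workbook used in this work. The five-point spectrum is invented, and the second replicate is simply the first scaled by 0.28/0.19 to mimic the intensity difference quoted above; the use of the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row) individually."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)  # assumed: sample std
    return (spectra - mean) / std

# Made-up replicate pair: identical peak pattern, different overall intensity.
replicate_1 = np.array([0.02, 0.10, 0.19, 0.07, 0.03])
replicate_2 = replicate_1 * (0.28 / 0.19)

normalized = snv(np.vstack([replicate_1, replicate_2]))
```

Because the two replicates differ only by a multiplicative factor, their SNV-normalized spectra coincide, which is why replicate overlay improves so markedly when intensity differences are the main source of non-sample variance.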
 
 
Figure 4.6. The effects of SNV normalization in conjunction with baseline correction and smoothing. (a) Spectra of replicates of the caffeine sample after baseline correction, smoothing, and SNV normalization. (b) PCA scores plot for spectra after baseline correction, smoothing, and SNV normalization.
 
 
The effect of SNV normalization can also be observed in the PCA scores plot, where the clustering of replicates is substantially improved (Figure 4.6b) after the series of baseline correction, smoothing, and normalization, especially on PC 2. This indicates that the dominant source of non-sample variance is instrument response as a result of differences in ATR crystal coverage by the sample, which leads to differences in peak intensities and areas. After reducing non-sample variance, the variance among the samples accounted for by PC 1 is 92.3%, while PC 2 accounts for 6.6%. Of the pretreatments investigated, this series of pretreatment procedures yielded the greatest improvement in the clustering of replicates (PCC of -91.5%).
 
The differentiation of samples on both PCs is clearly displayed in the scores plot (Figure 4.6b). Based on the PC 1 loadings plot (Figure 4.7a), the samples are separated by amphetamine and caffeine content, similar to the separation on PC 1 observed for the scores plot of raw spectra (Figure 4.2a). However, PC 1 accounts for more of this variation after SNV normalization (92.3% compared to 69.7%), meaning that the variables associated with PC 1 are more dominant with the reduction in non-sample variance by SNV normalization. The PC 2 loadings plot (Figure 4.7b) appears to highlight differences in peak heights among the mixtures. Thus, those samples with more intense peaks across the entire wavenumber region are positioned more positively (e.g. 40% amphetamine or 60% amphetamine), whereas those samples that do not display this peak pattern are positioned more negatively on PC 2 (e.g. caffeine and amphetamine). The overall peak patterns of these loadings plots and those of the raw spectra (Figure 4.3) are similar; the major differences are the weightings on the more dominant peaks (e.g. amphetamine peaks between 3150–2600 cm⁻¹ and the caffeine peaks between 1700–1600 cm⁻¹). The weightings are greater for the peaks in the loadings plots corresponding to the normalized spectra for both PCs, indicating that the contributions of these variables to the positioning of samples on the scores plot are greater than for raw spectra. Also, the replicates of the samples in the scores plot cluster well, and it is apparent that the spread along the PC 2 axis for replicates is substantially reduced. While the distinction of samples on PC 2 is not necessarily meaningful, the variance highlighted by PC 2 still originates from chemical differences among the samples.
 
 
Figure 4.7. Loadings plots for (a) PC 1 and (b) PC 2 associated with PCA scores plot of spectra after applying baseline correction, smoothing, and SNV normalization.
 
 
It is apparent that sequential application of pretreatments is more effective than applying any single data pretreatment, as the PCC after applying a sequence of data pretreatments is greater in magnitude than the PCC after applying any pretreatment individually (Table 4.1).
 
 
Table 4.1. PCC values after applying data pretreatments.

Data Pretreatment                        PCC Value (%)
Baseline Correction                      -3.7
Smoothing                                -1.3
Standard Normal Variate Normalization    -89.6
All                                      -91.5
 
 
4.1.4 Multiplicative Scatter Correction
 
Scatter correction was applied to spectra in conjunction with baseline correction and smoothing in order to reduce any effects due to light scatter and particle size differences, which are not inherent to the sample. Multiplicative scatter correction was not applied to SNV normalized spectra as the two pretreatments are complementary (1). The effects of applying a scatter correction were observed in the spectra and PCA scores plots (Figure 4.8).
 
An example of substantial improvement in replicate overlay is shown in Figure 4.6a, where the overlay appears to span both the baseline and peak regions. Taking into account other pretreatment procedures in conjunction with multiplicative scatter correction, visual assessment of spectra indicates comparable performance to SNV normalization (Figure 4.8a compared with Figure 4.6a).
 
However, from the scores plot, it is evident that the clustering of replicates is not as close as that observed in the scores plot where the spectra were pretreated using SNV normalization (PCC of -86.6% compared to PCC of -91.5% for SNV normalization) (Figure 4.8b compared to Figure 4.6b). The non-sample variance is not minimized as well in PC 2 using scatter correction as compared to SNV normalization. This is observed in the PC 2 loadings plot (Figure 4.9b), where the spectrum characteristics that are highlighted are those that describe replicate variation and differences in noise among replicates. While it is evident that non-sample sources of variance are minimized with other pretreatment procedures, scatter correction is not as efficient as SNV normalization. This may be due to the way in which scatter correction is performed on the data as compared to SNV normalization. Scatter correction is data set-dependent, whereas SNV normalization is data set-independent. As a result, each spectrum is treated individually by normalization, whereas the degree of scatter correction applied to each spectrum is dependent on the data set. Based upon the inherent difference in the way these two procedures are applied, the replicates are more likely to overlay after normalization as opposed to scatter correction.
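The data set dependence of scatter correction can be made concrete with a small sketch. A common formulation of multiplicative scatter correction regresses each spectrum on a reference spectrum, usually the mean of the data set, and then removes the fitted offset and slope; whether the software used in this work implements exactly this formulation is an assumption. The function name `msc` and all spectra below are invented for illustration.

```python
import numpy as np

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    If no reference is given, the mean spectrum of the supplied data set
    is used, which is what makes the result data set-dependent.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, deg=1)  # row ~ intercept + slope * ref
        corrected[i] = (row - intercept) / slope
    return corrected

# Made-up spectra sharing one shape but differing in scale and offset.
shape = np.array([0.0, 1.0, 3.0, 1.0, 0.5])
spectra = np.vstack([0.8 * shape + 0.10, shape, 1.3 * shape - 0.05])
corrected = msc(spectra)

# Adding one more spectrum changes the mean reference, and with it the
# corrected values of the original three spectra.
bigger_set = np.vstack([spectra, 2.0 * shape + 0.5])
corrected_bigger_set = msc(bigger_set)
```

In this toy case every corrected spectrum collapses onto the data set mean, and the corrected values shift as soon as the data set changes. SNV, by contrast, would return the same result for a given spectrum regardless of which other spectra are present.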
 
 
Figure 4.8. The effect of multiplicative scatter correction on spectra and PCA scores plot. (a) Spectra of replicates of caffeine after applying baseline correction, smoothing, and multiplicative scatter correction. (b) PCA scores plot of spectra after pretreatment with baseline correction, smoothing, and multiplicative scatter correction.
 
 
Figure 4.9. (a) PC 1 and (b) PC 2 loadings plots associated with scores plot of spectra after applying baseline correction, smoothing, and multiplicative scatter correction.
 
 
4.1.5 Optimal Sequence of Data Pretreatment 
 
The optimal set of data pretreatments was determined to be baseline correction, smoothing, and SNV normalization. This sequence gave a PCC of -91.5%, the largest improvement in clustering of all pretreatment sequences investigated. The application of this set of pretreatment procedures results in PCs 1 and 2 accounting for a total variance of 98.9%, with both PCs representing variables that are chemically meaningful.
 
Sloping baselines and noise in the baseline regions are substantially reduced, and replicate spectra overlay more closely. Thus, all spectral data were baseline-corrected, smoothed, and SNV normalized before being used to develop and test the PCR model.
 
4.2 PCA
 
PCA using the singular value decomposition (SVD) algorithm was performed on pretreated spectra of the amphetamine-caffeine and methamphetamine-caffeine mixtures designated as the training set. The first two PCs account for 94.0% of the total variance (Figure 4.10a). Sample replicates clustered closely and distinctly from samples of different concentrations. Samples containing only caffeine are positioned most positively on PC 1, and slightly negatively on PC 2. Samples containing 100% amphetamine and methamphetamine are positioned negatively on PC 1, but differ on PC 2. Amphetamine samples are positioned negatively on PC 2, whereas methamphetamine samples are positioned positively on PC 2.
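For readers unfamiliar with PCA by SVD, the following sketch shows the general idea on synthetic data. It is not a reproduction of the Pirouette implementation: the function `pca_svd`, the Gaussian "bands", and the mixture fractions are all invented for this example.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA of the rows of X via SVD of the mean-centered data matrix."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components]                      # variable weightings
    explained = s ** 2 / np.sum(s ** 2)               # fraction of total variance
    return scores, loadings, explained[:n_components]

# Synthetic binary mixtures (not thesis data): each "pure spectrum" is one band.
axis = np.linspace(0.0, 1.0, 40)

def band(centre):
    return np.exp(-0.5 * ((axis - centre) / 0.05) ** 2)

analyte, diluent = band(0.3), band(0.7)
fractions = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
X = np.outer(fractions, analyte) + np.outer(1.0 - fractions, diluent)

scores, loadings, explained = pca_svd(X, n_components=2)
```

For these noise-free binary mixtures, PC 1 captures essentially all of the variance and orders the samples by analyte fraction. The sign of a PC is arbitrary, so the ordering may run in either direction along the axis.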
 
The positioning of all samples on this scores plot can be explained with the PC 1 and PC 2 loadings plots (Figures 4.10b and 4.10c). The PC 1 loadings plot (Figure 4.10b) displays characteristic caffeine peaks (i.e. 1700–1600 cm⁻¹ in Figure 3.1) that are weighted positively while those that correspond to both amphetamine and methamphetamine (i.e. 3150–2600 cm⁻¹ in Figure 3.2 and 3150–2300 cm⁻¹ in Figure 3.3) are weighted in the negative direction. The peaks in the higher wavenumber region corresponding to methamphetamine (3150–2300 cm⁻¹) are not as prominent as those for amphetamine, but overtones of the two controlled substances can be observed in this region. Based upon the loadings for PC 1, samples with higher amphetamine and methamphetamine content are positioned more negatively on PC 1, while those with higher caffeine content are positioned more positively. PC 1 separates samples based upon controlled substance and caffeine concentration.
 
The loadings plot for PC 2 (Figure 4.10c) shows distinctive methamphetamine peaks in the 3150–2300 cm⁻¹ region (due to C-H stretches) weighted positively, while the peaks in the 1200–900 cm⁻¹ region (a result of N-H bends), which are attributed to amphetamine, are weighted negatively. Accordingly, all methamphetamine samples are positioned positively on PC 2, with samples containing higher methamphetamine content positioned more positively. In contrast, amphetamine mixtures are positioned negatively on PC 2, with samples containing higher amphetamine concentrations positioned more negatively. PC 2 separates samples based on controlled substance content, that is, on whether the mixture contains amphetamine or methamphetamine.
 
 
Figure 4.10. PCA applied to training set. (a) PCA scores plot of training set with both sets of mixtures. (b) PC 1 loadings plot. (c) PC 2 loadings plot.
 
 
From the scores plot, a larger spread in the amphetamine mixtures as opposed to the methamphetamine samples can be observed on PC 1, especially for the 40% amphetamine samples. This may be due to slight differences in amphetamine peak intensities among the replicates, where replicates with slightly lower peak intensities for amphetamine are positioned more positively and replicates with lower peak intensities corresponding to caffeine are positioned more negatively. As more amphetamine than methamphetamine characteristics are highlighted in PC 1 (Figure 4.10b), the spread among replicates is more apparent in the amphetamine mixtures. Normalization should have addressed replicate variation as a result of differences in peak intensities; the manifestation of this variation in the scores plot indicates that the data pretreatment procedures applied to the spectra may not have fully removed this non-sample source of variance. However, the focus of this research was not on data pretreatments for spectral data; as close clustering of the replicates is observed in the scores plot for the majority of the samples, and samples of different concentrations are distinguished, the set of data pretreatments applied to the spectra was deemed sufficient for the PCR model.
 
4.3 PC Selection for MLR
 
An internal validation, specifically leave-one-out cross validation (LOO CV), was performed on the training set to determine the optimal number of PCs for multiple linear regression. The validation was performed on the entire set of samples, but with separate results for the amphetamine and methamphetamine regressions. A total of 9 PCs was selected with which to perform the cross validation; the total variance accounted for by this number of PCs was 99.8%. The optimal number of PCs for the regression model is determined as the number of PCs for which the standard error of validation (SEV) is not significantly different from the SEV corresponding to the maximum number of PCs selected. Thus, in this cross validation, the optimal number of PCs to retain in the model is that for which the corresponding SEV is not significantly different from the SEV for 9 PCs. This method of selecting the appropriate number [...]
 
 
The SEV was plotted as a function of the number of PCs included in the model for both amphetamine and methamphetamine (Figure 4.11). Methamphetamine initially shows greater error than amphetamine (34.8% compared to 24.4%) when only PC 1 is included in the regression. The SEV is quickly reduced with the inclusion of PC 2, to 4.2% and 5.4% for methamphetamine and amphetamine, respectively. With the addition of subsequent PCs up to 9 PCs, slightly greater error is found with amphetamine mixtures. For example, with 3 PCs in the model, the SEV for amphetamine is 5.2% compared to 3.0% for methamphetamine. Also, the addition of subsequent PCs to the regression does not substantially reduce the SEV for either set of samples. Eight PCs were determined to be optimal for amphetamine regression because the SEV for including 8 PCs was not significantly different from the SEV at 9 PCs (i.e. 2.9% at both points) at the 95% confidence level. For methamphetamine samples, 9 PCs were determined to be optimal since the inclusion of fewer than 9 PCs yielded SEV values that were significantly greater than the SEV at 9 PCs (1.9%) at the 95% confidence level. When a total of 10 PCs was selected for cross validation, the optimal number of PCs to include in the methamphetamine regression was still 9 PCs; the same result was observed when increasing the maximum number of PCs to 11 PCs.
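The cross-validation loop behind an SEV curve can be sketched as below. This is an illustration under stated assumptions, not the Pirouette procedure: the SEV is taken here to be the root mean squared leave-one-out prediction error, since the exact formula used by the software is not given in the text, and no significance test between SEV values is attempted. The helper names and the noise-free synthetic mixtures are invented.

```python
import numpy as np

def pcr_predict(X_train, y_train, X_test, n_components):
    """Fit PCR on the training rows and predict the test rows."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    _, _, Vt = np.linalg.svd(X_train - x_mean, full_matrices=False)
    V = Vt[:n_components].T                          # loadings (variables x PCs)
    T = (X_train - x_mean) @ V                       # training scores
    b, *_ = np.linalg.lstsq(T, y_train - y_mean, rcond=None)
    return (X_test - x_mean) @ V @ b + y_mean

def loo_sev(X, y, max_components):
    """Leave-one-out error for models with 1..max_components PCs."""
    n = len(y)
    sev = []
    for k in range(1, max_components + 1):
        errors = []
        for i in range(n):
            keep = np.arange(n) != i                 # hold out sample i
            pred = pcr_predict(X[keep], y[keep], X[i:i + 1], k)[0]
            errors.append(pred - y[i])
        sev.append(np.sqrt(np.mean(np.square(errors))))  # assumed SEV definition
    return np.array(sev)

# Synthetic training set (not thesis data): two drugs, each diluted separately.
axis = np.linspace(0.0, 1.0, 60)

def band(centre):
    return np.exp(-0.5 * ((axis - centre) / 0.04) ** 2)

drug_a, drug_b, diluent = band(0.2), band(0.5), band(0.8)
levels = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
frac_a = np.concatenate([levels, np.zeros(5), [0.0]])
frac_b = np.concatenate([np.zeros(5), levels, [0.0]])
X = (np.outer(frac_a, drug_a) + np.outer(frac_b, drug_b)
     + np.outer(1.0 - frac_a - frac_b, diluent))
y = 100.0 * frac_a                                   # percent of drug A

sev = loo_sev(X, y, max_components=2)
```

In this idealized case the error is large with one PC and collapses once the second PC is included, mirroring the sharp drop in SEV reported when PC 2 was added. Real spectra contain noise, so the error would level off rather than vanish.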
 
 
Figure 4.11. The standard error of validation plotted as a function of the number of PCs included in the validation for amphetamine (black) and methamphetamine (red) mixtures.
 
 
Even though the SEV may reach a local minimum or be statistically equivalent at the 95% confidence level [...] PC selection may lead to the regression model being overfit. An overfit model is one that is specific to the training set used in its development and cannot be used to accurately predict other datasets. This is because of the inherent nature of PCA, where the PCs attempt to account for all variance observed among the samples. As higher PCs often account for very little variance, the information that is highlighted by these PCs may not be chemically meaningful. This is shown in Figure A.2, where the loadings plots for PC 8 (Figure A.2a) and PC 9 (Figure A.2b) highlight replicate and inter-sample variance rather than chemically meaningful variance.

[...] of PC selection is to include variance as a criterion. That is, the optimal number of PCs to retain in the model is the number that describes a user-defined percent of the total variance. In this work, the criterion was the number of PCs that accounted for at least 95% of the total variance, as this percentage of total variance accounts for the majority of variance described in PCA. Based upon the total variance plotted as a function of PCs (Figure 4.12), 2 PCs account for 94.0% and 3 PCs account for 98.6% of the total variance. Thus, 3 PCs were selected as the maximum number of PCs to include in the LOO cross validation.
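The variance criterion amounts to finding the first point at which the cumulative variance reaches the chosen threshold. The sketch below shows one way to do this. The function name is hypothetical, and the per-PC values are illustrative: only the cumulative totals at 2 PCs (94.0%) and 3 PCs (98.6%) come from the text, while the split between PC 1 and PC 2 and the values beyond PC 3 are invented.

```python
import numpy as np

def components_for_variance(explained_ratio, threshold=0.95):
    """Smallest number of PCs whose cumulative variance reaches the threshold."""
    cumulative = np.cumsum(np.asarray(explained_ratio, dtype=float))
    if cumulative[-1] < threshold:
        raise ValueError("threshold is not reached by the supplied PCs")
    # searchsorted returns the first index where cumulative >= threshold.
    return int(np.searchsorted(cumulative, threshold)) + 1

# Illustrative per-PC variance fractions; cumulative sums are 0.70, 0.94, 0.986, ...
explained_ratio = [0.70, 0.24, 0.046, 0.010, 0.004]
n_pcs = components_for_variance(explained_ratio, threshold=0.95)
```

With these values two PCs fall just short of 95%, so three PCs are returned, matching the maximum number carried into the cross validation.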
 
 
Figure 4.12. Plot of total variance as a function of the number of PCs. Red dashed line indicates 95% variance.
 
 
Using this method, 2 PCs were optimal for amphetamine (SEV = 5.4%). This is a logical result as there exist only two variables in the training set. The first variable is the difference in controlled substance in the mixtures; methamphetamine mixtures are different from amphetamine mixtures. The second variable is the constraint of varying the diluent caffeine in the amphetamine and methamphetamine mixtures so that the total amount in each mixture was 10 mg. Also, as previously discussed, PC 2 accounts for samples differing in controlled substance content; thus, PC 2 is chemically meaningful and should be included in the regression.
 
 
With the methamphetamine regression, 3 PCs were determined to be optimal.
 
PC 3 accounts 
for 4.6% of the total variance. 
However, in examining the PC 3 loadings plot (Figure 4.1
3
), the 
loadings are all weighted negatively. As previously described, this peak pattern indicates that the 
differentiation of samples on PC 3 is based upon differences in peak heights. It is apparent
 
that 
these differences affect methamphetamine mixtures more than the amphetamine mixtures due to 
the characteristic peaks in the 3150 

 
2300 cm
-
1
 
region
 
corresponding to methamphetamine
. 
Similar differentiation of mixtures based on peak heights was observ
ed for amphetamine samples 
on PC 1 (Figure 4.
10
b)
. H
owever, 
this variance was greater in the amphetamine samples and 
thus, affected the samples on PC 1
.
 
Because t
his same variance affected methamphetamine to a 
lesser degree
, it is 
only present in PC 3.
 
To 
avoid multicollinearity, where subsequent PCs 
describe variation in the samples that is observed in the earlier PCs, PC 3 was not included in the 
methamphetamine regression. 
Thus, regression for both amphetamine and methamphetamine 
was performed with 2 PCs
. 
 
 
Figure 4.13. PC 3 loadings plot associated with pretreated sample mixtures in the training set.
 
 
4.4 Model Performance
 
4.4.1 Multiple Linear Regression
 
Multiple linear regression was performed in a single analysis using only 2 PCs for both of the regressions; two calibration curves were generated (Figure 4.14). The predicted concentrations of the training set are plotted as a function of the known concentrations. Ideally, all samples in the training sets should be localized in 20% increments along the y = x line, with a slope of 1 and a y-intercept value of 0; however, some deviations from the line do exist.
 
Figure 4.14. Calibration curves generated in (a) amphetamine regression and (b) methamphetamine regression using the training set.
 
 
The most noticeable deviations of the data points from the calibration curves are those on the y-axis. For the amphetamine regression, the caffeine samples and the methamphetamine mixtures range in predicted concentration from -10% to 10%. As these samples do not contain amphetamine, their concentrations should be 0% amphetamine. However, this range in concentration is observed due to the differences in the spectral pattern among the samples with respect to the pattern observed in the amphetamine regression vector (Figure 4.15a).
 
The regression vectors determined in the calibrations are shown in Figure 4.15 and indicate the magnitude of variable contributions to the calibration curve similar to a loadings plot. For the amphetamine regression, the vector (Figure 4.15a) displays characteristic methamphetamine peaks (3150 – 2300 cm⁻¹) weighted negatively, caffeine peaks between 1700 and 1600 cm⁻¹ also weighted negatively, and peaks characteristic of amphetamine (1200 – 900 cm⁻¹) weighted positively. As this regression was performed to quantify the amount of amphetamine in mixtures, it is logical that the peaks characteristic of amphetamine would be weighted positively. A positive weighting in this vector indicates a positive contribution to the overall calibration curve; variables that are weighted negatively influence the calibration curve so that the samples with these peaks have lower predicted concentrations for that regression.
 
As none of the methamphetamine mixtures contain amphetamine as a component, the known concentration for these samples is 0% amphetamine. Characteristic caffeine peaks are also weighted negatively since the binary mixtures with higher caffeine content have lower amphetamine concentrations. The caffeine and methamphetamine mixtures that have concentration values ranging from -10% to 10% display different intensities of these peaks in their spectra, resulting in the different concentration values along the y-axis. For example, the peak intensities at 3150 – 2300 cm⁻¹ are higher for 100% methamphetamine than for 60% methamphetamine or 80% methamphetamine, which results in the 100% methamphetamine samples having more negative predicted concentrations. As negative concentrations cannot exist, the predicted amphetamine concentrations of the caffeine and methamphetamine samples should be taken as 0%.
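How negative weights can pull a prediction below zero is sketched below. This is a simplified illustration with a hypothetical three-variable regression vector and hypothetical absorbance values, not the regression vector shown in Figure 4.15a; it assumes the predicted concentration is the dot product of the spectrum with the regression vector plus an intercept.

```python
def predict_concentration(spectrum, regression_vector, intercept=0.0):
    """Predicted concentration = intercept + sum(absorbance * weight)."""
    return intercept + sum(a * w for a, w in zip(spectrum, regression_vector))


def clamp_to_zero(concentration):
    """Report physically impossible negative concentrations as 0%."""
    return max(0.0, concentration)


# Hypothetical weights for three spectral regions (illustrative values only):
# methamphetamine band (negative), caffeine band (negative), amphetamine band (positive).
regression_vector = [-20.0, -10.0, 150.0]

# Hypothetical absorbances for a sample that contains no amphetamine.
meth_only = [0.40, 0.05, 0.00]

raw = predict_concentration(meth_only, regression_vector)
print(raw)                 # negative, driven by the methamphetamine band
print(clamp_to_zero(raw))  # reported as 0.0
```

A more intense methamphetamine band in this sketch gives a more negative raw value, which mirrors the behavior described for the 100% methamphetamine samples.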
 
Amphetamine peaks in the regression vector are weighted positively since more intense amphetamine peaks indicate a higher concentration of amphetamine. It is unusual, however, that the amphetamine peaks in the 3150 – 2600 cm⁻¹ region are not contributing to the amphetamine regression; this may be due to the more intense methamphetamine absorptions in this region that overshadow the peaks characteristic of amphetamine. Nevertheless, characteristic peaks of all components in the mixtures are present in the amphetamine regression vector as expected, indicating substantial contribution to the regression of the samples.
 
 
Figure 4.15. Regression vectors for (a) amphetamine regression and (b) methamphetamine regression.
 
 
The deviations of predicted concentrations from the calibration curve are also present in the methamphetamine regression (Figure 4.14b), where the predicted concentrations of caffeine and the amphetamine mixtures range from -10% to 10% methamphetamine even though the mixtures do not contain methamphetamine. The range in predicted concentrations can be explained in a similar manner to the amphetamine regression using the methamphetamine regression vector (Figure 4.15b).
 
The contributions of the variables in the methamphetamine regression vector are more straightforward; characteristic methamphetamine peaks in the 3150 – 2300 cm⁻¹ region are weighted positively and characteristic amphetamine peaks in the 1200 – 900 cm⁻¹ region are weighted negatively. Peaks characteristic of caffeine are not apparent in this regression vector, indicating that the most influential components of this regression are the intensities of the methamphetamine and amphetamine peaks. This is expected, as the pattern observed in this regression vector resembles that in the PC 2 loadings plot (Figure 4.10c). The methamphetamine regression was performed using only PCs 1 and 2. It is logical that the methamphetamine regression vector would display variable contributions similar to the PC 2 loadings plot, as the variance accounted for by PC 2 allows the sample mixtures to be differentiated based upon controlled substance content. Also, the PC 2 scores for caffeine (Figure 4.10a) are close to zero, and thus, do not contribute heavily to the variance. As such, all caffeine and amphetamine mixtures have predicted concentrations close to 0% methamphetamine.
 
The deviations from the calibration curve are quantified as the root-mean-square error (RMSE). The error associated with the internal validation is described with the root-mean-square error-of-validation (RMSEV), and is equivalent to SEV. A good model is one with a minimized RMSEV. Through LOO CV, the RMSEV for the amphetamine regression using 2 PCs was 5.4%, whereas the methamphetamine regression with 2 PCs yielded a validation error of 4.2%. The lower RMSEV for methamphetamine regression can be explained with reference to the calibration curves in Figure 4.14. It is apparent that the replicates of each sample mixture are clustered more closely for the methamphetamine regression (Figure 4.14b) than for the amphetamine regression (Figure 4.14a). The spread in the replicates of the sample mixtures is more apparent in amphetamine mixtures, and this is due to PC 1 accounting for differences in the peak heights of the replicates in amphetamine mixtures (Figure 4.10a). See Section 4.2 for a detailed explanation.
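The leave-one-out procedure behind RMSEV can be sketched in a few lines. The example below is a simplified stand-in: it cross-validates an ordinary least-squares line on a single hypothetical predictor rather than the full PCR model, and the data values are invented for illustration. It assumes RMSEV is the square root of PRESS divided by the number of samples.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_rmsev(x, y):
    """Leave each sample out once, predict it, and return (RMSEV, PRESS terms)."""
    press_terms = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        predicted = slope * x[i] + intercept
        press_terms.append((y[i] - predicted) ** 2)
    press = sum(press_terms)
    return math.sqrt(press / len(x)), press_terms


# Hypothetical predictor values (e.g. a PC score) and known concentrations (%).
scores = [0.02, 0.21, 0.38, 0.61, 0.79, 1.03]
known = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]

rmsev, press_terms = loo_rmsev(scores, known)
print(round(rmsev, 2))
# Share of PRESS contributed by each left-out sample, as a percentage.
print([round(100 * t / sum(press_terms), 1) for t in press_terms])
```

The per-sample PRESS shares printed at the end correspond to the kind of PRESS contributions discussed for individual mixtures, although the numbers here carry no meaning beyond the illustration.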
 
Despite the deviations from the amphetamine calibration curve, the correlation coefficient was 0.988, which indicates good linearity between the known and predicted concentrations. It can be further concluded that the X and Y matrices, that is, the set of spectra and the known concentration values, display good linearity. The correlation coefficient for methamphetamine regression was 0.993, which is not surprising, given that the majority of the data points are well positioned on the calibration curve.
 
4.4.2 Test Set 1
 
Model performance was first assessed using Test Set 1 (Table 3.2). The model was used to predict the concentrations of the controlled substances in Test Set 1. The PCA scores of the mixtures in Test Set 1 were calculated and projected on the PCA scores plot for the training set (Figure 4.16). Test Set 1 contains mixtures of amphetamine and caffeine and of methamphetamine and caffeine that have the same concentrations by weight as those in the training set. Ideally, the scores of the samples in Test Set 1 should completely overlay with scores of the corresponding samples in the training set. This is observed for the majority of the samples (Figure 4.16) with the exception of all 80% methamphetamine replicates and some of the 40% methamphetamine and 80% amphetamine replicates. The 80% methamphetamine samples in Test Set 1 have PC 1 scores that are more negative than the corresponding scores in the training set. Visual comparison of the average of the 80% methamphetamine spectra in Test Set 1 against that of the training set (Figure A.3) indicates that the test set mixtures have less intense peaks characteristic of caffeine as compared to the corresponding mixtures in the training set, leading to more negative PC 1 scores. This also indicates that the sample preparation process is not readily reproducible; even though the two sets of mixtures have the same composition, differences in spectral profiles are still apparent, especially using multivariate statistics.
 
Figure 4.16. PCA scores plot of training set (filled in circles) with Test Set 1 plotted.
 
 
The concentrations of amphetamine and methamphetamine in Test Set 1 were predicted using the amphetamine and methamphetamine regression equations (Figures 4.17a and 4.17b), and prediction errors, or RMSEPs, were determined. RMSEPs were determined similarly to RMSEV, except using the known and predicted concentrations of the test set. The RMSEP for Test Set 1 in the amphetamine regression was 3.8%. This value is lower than the validation error for the training set (5.4%), which is unusual, given that the model should theoretically be able to predict concentrations of samples used in its development more accurately than those of samples in an external validation. However, the predicted concentrations of the samples in Test Set 1 overlay on the training set data points used to generate the calibration curve, so a low prediction error is not surprising. Further, the PRESS contributions for 100% amphetamine and 100% methamphetamine in the training set are 22.2% and 28.4%, respectively, which are the highest of all the sample mixtures. This indicates that these samples contribute most to the RMSEV, and therefore, the RMSEV is inflated. As Test Set 1 does not contain 100% amphetamine and 100% methamphetamine, it is reasonable that the RMSEP value is lower than the RMSEV for the amphetamine regression. The RMSEP of 3.8% for amphetamine regression indicates good prediction ability of the PCR model.
 
 
Figure 4.17. Calibration curves of training set with Test Set 1 plotted for the (a) amphetamine regression and (b) methamphetamine regression. See Figure 4.14 for slope, intercept, and R² values.
 
 
The prediction ability of the methamphetamine regression was then assessed using Test Set 1. From the calibration curve (Figure 4.17b), a slight curvature is observed in the predicted concentration of the test set mixtures as a function of measured concentration. The predicted concentrations of the test set mixtures with lower methamphetamine content are lower than expected, especially those for the 40% methamphetamine, and the predicted concentration of 80% methamphetamine is higher than expected. Given the scores of these mixtures on the PCA scores plot (Figure 4.16), this trend is not surprising. The close clustering of the replicates on the calibration curve indicates good precision despite the slightly higher prediction error of 6.9%. Unlike the amphetamine regression, this RMSEP value is higher than the RMSEV value for the methamphetamine regression (4.2%), which is expected since the mixtures in Test Set 1 are not the same mixtures used in the training set. As stated previously, this is indicative of the non-reproducible nature of sample preparation.
 
4.4.3 Test Set 2
 
The scores of Test Set 2 mixtures (Table 3.3) were calculated and projected on the PCA scores plot for the training set (Figure 4.18). Since the concentrations of these mixtures are those that were not included in the training set, the scores of these mixtures are expected to lie between the scores of the training set samples. For example, the scores of the replicates of the 30% amphetamine should be positioned ideally between the scores of 20% amphetamine mixtures and those of 40% amphetamine. This is somewhat reflected in the scores plot, where the 30% amphetamine replicates lie between 20% and 40% amphetamine. However, the scores of these replicates are more positive on PC 1 than expected, and lie closer to the 20% amphetamine mixtures with some overlap. A comparison of the average spectrum of the 30% amphetamine replicates and the average spectra of the 20% and 40% amphetamine mixtures (Figure A.4) was performed. Visual assessment of the spectra indicates that the peak intensities for those peaks that heavily influence the positioning on PC 1 (i.e. 3150 – 2300 cm⁻¹ and 1200 – 900 cm⁻¹) are more similar to those of the 20% amphetamine mixtures as opposed to those of the 40% amphetamine replicates. This observation is not surprising as solid sample preparation is neither exact nor reproducible. Thus, the 30% amphetamine replicates overlay with some of the 20% amphetamine mixtures in the training set. With the exception of 50% methamphetamine, 10% amphetamine, and 30% amphetamine, the majority of the scores for Test Set 2 mixtures lie in the expected positions on PCs 1 and 2.
 
Figure 4.18. Scores plots of training set (filled in circles) with Test Set 2 plotted.
 
 
Model performance was then assessed with Test Set 2 (Figures 4.19a and 4.19b). The amphetamine regression for Test Set 2 does not yield predicted concentrations that are ideal. The positioning of the Test Set 2 sample data points in the calibration curve suggests that the calibration data are curved; however, this is not the case (Figure 4.19a). The predicted concentrations of the 10% and 30% amphetamine mixtures in Test Set 2 are lower than expected, while those of the 50% and 70% amphetamine mixtures in the test set are higher than expected. This trend is consistent with that observed on the PCA scores plot (Figure 4.18). Despite the slight curvature and deviations from the calibration curve, the RMSEP for this test set was 4.6%, which is lower than the RMSEV value for the amphetamine regression (5.4%), and indicates that the model has good prediction ability.
 
 
Figure 4.19. Calibration curve of training set with Test Set 2 for the (a) amphetamine regression and (b) methamphetamine regression. See Figure 4.14 for slope, intercept, and R² values.
 
 
The prediction ability of the methamphetamine regression in the PCR model was evaluated. A slight curvature is also observed for the Test Set 2 mixtures plotted on the calibration curve (Figure 4.19b). The samples that deviate from the calibration curve are the 30%, 50%, and 90% methamphetamine mixtures. Using the model, the predicted concentrations for the 30% and 50% methamphetamine mixtures are lower than expected, and the predicted concentration of the 90% methamphetamine samples is higher than expected. This observation is analogous to the trend observed in the PCA scores plot (Figure 4.18) and can be explained by examining the corresponding spectra. The prediction error associated with Test Set 2 is 5.9%, which is only slightly higher than the validation error for methamphetamine regression (4.2%), but is lower than the prediction error in methamphetamine regression for Test Set 1 (6.9%). This is unusual, given that the mixtures in Test Set 2 are those not included in the training set, as compared to the mixtures in Test Set 1. However, the lower RMSEP value can be explained by the deviations of the Test Set 2 sample mixtures from the calibration curve. The predicted concentrations for two of the mixtures (i.e. 10% and 70% methamphetamine) lie on the calibration curve and those of the other mixtures are close to the calibration curve. The higher RMSEP value for methamphetamine regression in Test Set 1 is likely due to the large deviation in predicted concentrations for the 80% methamphetamine. The slightly higher prediction error as compared to validation error is expected, and the low RMSEP for Test Set 2 indicates good prediction ability of the model.
 

After external validation of the model using the two test sets, the model was applied to


samples as determined by the amphetamine regression are shown in Table 4.2.
 
 
Table 4.2

regression.
 
Blind Sample   Replicate   Predicted Concentration (%)   Known Concentration (%)
1              A           20                            14
               B           21
2              A           30                            28
               B           24
3              A           -5                            0
               B           -6
RMSEP (%)                  5.3
 
 
In the amphetamine regression, negative predicted concentration values are again observed. 

methamphetamine mixtures in the training set, the deviation of the predicted concentrations of
predicted concentration. As concentration values below 0% are physically impossible, these 
values should be taken as 0% amphetamine. 
 

the two samples were correctly identified as containing amphetamine based upon the predicted 

this sample is a binary mixture of methamphetamine and caffeine. The known amphetamine

regression for the two replicates are 20% and 21%, respectively. The average percent error determined for this sample is 47%, which is a substantial amount. The large percent error associated with this sample may be due to the difficulty with homogenizing small samples, as the total mass of the sample was only 5.8 mg. On the other hand, the average percent error


measured amphetamine content in both mixtures are not high (i.e. 14% and 28% amphetamine, respectively), and the calculation of percent error has a tendency to bias samples with small expected values. To avoid bias from percent error calculations, RMSEP was used as a measure of accuracy in the regression models. For the amphetamine regression, the prediction error was 5.3%, which is comparable to the validation error of 5.4% associated with this calibration curve, indicating that the model is able to correctly identify and accurately quantify the amount of amphetamine in different samples.
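The two error measures discussed here can be reproduced from the tabulated values. The sketch below is an illustration only; it assumes RMSEP is the square root of the mean squared difference between predicted and known concentrations, with no degrees-of-freedom correction, and the function names are hypothetical. Because Table 4.2 lists rounded concentrations, the percent error computed from it (about 46%) differs slightly from the 47% reported in the text.

```python
import math


def rmsep(predicted, known):
    """Root-mean-square error of prediction over paired values."""
    squared = [(p - k) ** 2 for p, k in zip(predicted, known)]
    return math.sqrt(sum(squared) / len(squared))


def mean_percent_error(predicted, known_value):
    """Average absolute percent error of replicates against one known value."""
    errors = [abs(p - known_value) / known_value * 100 for p in predicted]
    return sum(errors) / len(errors)


# Values transcribed from Table 4.2 (replicates A and B of blind samples 1 to 3).
predicted = [20, 21, 30, 24, -5, -6]
known = [14, 14, 28, 28, 0, 0]

print(round(rmsep(predicted, known), 1))           # 5.3
print(round(mean_percent_error([20, 21], 14), 1))  # 46.4
```

Note that percent error is undefined for the third sample, whose known amphetamine content is 0%, which is another reason an absolute measure such as RMSEP is convenient here.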
 

4.3
Samples 1 and 2 did not contain any methamphetamine, the expected concentration values are 0% methamphetamine; the negative concentration values (e.g. -2% methamphetamine)

quantified, as the percent errors for the two replicates are 3% each. The RMSEP for the set of

associated with the methamphetamine calibration curve. The low prediction error may arise from 
the i


the methamphetamine regression. As the majority of the values used to calculate RMSEP are

regression is small. However, it is apparent that the methamphetamine calibration curve is able to identify the presence of methamphetamine as well as quantify methamphetamine content with high accuracy.
 
 
Table 4.3

methamphetamine regression.
 
Blind Sample   Replicate   Predicted Concentration (%)   Known Concentration (%)
1              A           -2                            0
               B           -2
2              A           -2                            0
               B           -1
3              A           38                            39
               B           41
RMSEP (%)                  1.7
 
4.4.5 Summary
 
An internal validation was performed on the PCR model that was developed for predicting the concentrations of amphetamine and methamphetamine in sample mixtures. Validation errors of 5.4% and 4.2% for amphetamine and methamphetamine, respectively, were obtained. Good linearity was observed for both curves, with correlation coefficients of 0.988 and 0.993, respectively. The performance of the model was evaluated using two test sets; one with mixtures at the same concentrations as those used in the training set, and another with concentrations not used in the training set. Based on the prediction errors generated (between 3.8% and 6.9% for both test sets on both curves), the prediction ability of the model was acceptable. The applicability of the model to identify and quantify controlled substance in a single analysis

assessed using prediction error. The low prediction errors for these samples indicate that the PCR model is able to accurately identify and predict the concentrations of amphetamine and methamphetamine in sample mixtures.
 
 
APPENDIX
 
Figure A.1. Loadings plots for (a) PC 1 and (b) PC 2 corresponding to the baseline-corrected spectra of the first collection of amphetamine mixtures.
 
Figure A.2. Loadings plots for (a) PC 8 and (b) PC 9 corresponding to the pretreated spectra of the training set mixtures.
 
Figure A.3. Average spectra of 80% methamphetamine mixtures in training set and in Test Set 1.
 
Figure A.4. Average spectra of 20% amphetamine and 40% amphetamine from the training set as well as the average spectrum of 30% amphetamine from Test Set 2.
 
REFERENCES
 
 
1. Dhanoa M, Lister S, Sanderson R, Barnes R. The link between multiplicative scatter correction (MSC) and standard normal variate (SNV) transformations of NIR spectra. Journal of Near Infrared Spectroscopy. 1994;2(1):43-7.
 
 
Chapter 5 Conclusions and Future Work
 
 
5.1 Conclusions
 
 
A PCR model was developed to identify and quantify controlled substances in simulated samples with a single analysis. Prior to model development, the optimal set of data pretreatment procedures was determined by visually examining spectra and quantifying the improvement in the cluster of replicates in the PCA scores. After PCA was performed on the training set, the PCs

-

del 
performance was evaluated using an internal validation method as well as an external validation with two test sets that had similar controlled substance and caffeine content by quantifying validation and prediction errors. Not surprisingly, the model performed well and was able to

samples to ascertain its ability to identify and quantify controlled substances in the mixtures. Correct identification and low prediction errors were observed, indicating good model performance.
 
Conventionally, forensic analysts have used ATR-FTIR for preliminary screening of controlled substances in submitted samples despite its ability to provide definitive identification. Instead, analysts have preferred gas chromatography-mass spectrometry for identification of controlled substances, as both separation and identification of components are achieved in a single analysis. However, it is advantageous for forensic practitioners to utilize multivariate statistics in conjunction with ATR-FTIR for identification and quantification of multiple controlled substances in a single analysis without separation.
 
 
5.2 Future Work
 
Despite good model performance, the model was developed using simple binary mixtures; it is necessary to broaden the scope of the model to accurately identify and quantify a wider range of controlled substances in more complex mixtures that are typically observed in a forensic setting. More complex samples, such as ternary mixtures that include other common adulterants or mixtures that include multiple controlled substances, should be investigated. Furthermore, more controlled substance mixtures should be included in the model, such as mixtures that contain cocaine, heroin, morphine, and so on, to develop a more comprehensive model that will allow identification and quantification for a larger sample set.
 
 
Part II. Development of Mass Defect Filters for the Classification of Novel Synthetic Designer Drugs
 
 
Chapter 6 Introduction
 
 
Synthetic designer drugs are psychoactive analogs of traditional controlled substances (e.g. amphetamine, marijuana, etc.). The abuse of these drugs has increased in the U.S. since 2009, with more than 95,000 arrests annually for sale, manufacture, or possession of synthetic drugs (1). Over 200 synthetic designer drugs have been encountered by U.S. law enforcement since their emergence (2), leading forensic laboratories to receive more questioned samples that contain them for analysis and identification. Synthetic drugs are widely available commercially,

(2). The primary concern with these drugs is the frequency with which novel synthetic drugs emerge on the illicit market once more established analogs are regulated under the Controlled Substances Act. A multitude of analogs are able to be synthesized quickly and frequently with only minor differences in compound structure (2). Because of the turnover frequency with the synthesis and emergence of these drugs, the identification of novel compounds in forensic laboratories is crucial.
 
Current forensic analysis of controlled substances utilizes gas chromatography-mass spectrometry with a single quadrupole mass analyzer (GC-QMS) and the technique is considered the gold standard for definitive identification (3). The chemical information of the compound is obtained in the form of a mass spectrum, which plots the ion abundances as a function of mass-to-charge (m/z) ratio, and the patterns of the ions provide unique information from which the structure of the compound can be determined. The ions observed have unit mass resolution, meaning that only nominal mass is obtained. Definitive identification of controlled substances also requires a comparison of the mass spectrum of the unknown in a sample to that of a reference standard (3). Oftentimes, the appropriate reference standard is selected based upon an initial comparison of the mass spectrum of the unknown to spectra in a database, whether it is one generated in-house or a commercially available database. The mass spectrum of the reference standard is then generated using the same analytical parameters as that for the unknown and the mass spectra of the two are compared. This is typically sufficient for the analysis of traditional controlled substances, as the mass spectra of traditional controlled substances are well characterized and available in databases. However, due to the turnover rate of novel synthetic drugs, reference standards may not be readily available for comparison, nor would their mass spectra be found in a database, thus requiring analysts to elucidate the structure of the synthetic designer drug using only the mass spectrum.
 
Complicating the structural elucidation process is the structural similarity among classes of synthetic designer drugs along with the variety of compounds found within each class. Among the most popular synthetic designer drugs are synthetic cannabinoids, synthetic phenethylamines, and synthetic cathinones (2). The classes of interest in this research are the latter two compound classes, which have core structures as shown in Figure 6.1. The most distinctive difference between the two classes is the presence in cathinones of a carbonyl functional group positioned on the beta-carbon to the amine group; the structures are otherwise extremely similar to one another. Furthermore, these compounds can differ in the identity and the position of the substituents (2), but these differences may not be captured in a chemical profile obtained by GC-QMS, which limits the utility of current methods of analysis for definitive identification of synthetic designer drugs.
 
 
Figure 6.1. Core structure of (a) phenethylamine and (b) cathinone with possible substitution sites designated with Rₙ.
 
 
The challenges of using conventional low-resolution GC-MS for the analysis of synthetic designer drugs are illustrated with Figure 6.2, which displays mass spectra of two cathinones obtained using GC-QMS. The two cathinones are α-pyrrolidinopropiophenone (α-PPP) and 3-methyl-α-pyrrolidinopropiophenone (3-methyl PPP). It is evident that the two compounds contain the core cathinone structure, and the only difference between the two structures is the methyl group on the benzene ring for 3-methyl PPP. As stated previously, this example is common in synthetic designer drugs, where compounds differ only in the identity and the position of the substituents (2). Despite this difference, the mass spectra are highly similar; neither exhibits a molecular ion (expected at m/z 203 for α-PPP and m/z 217 for 3-methyl PPP), and the two compounds have a base peak at m/z 98. Furthermore, few ions are observed in both mass spectra, which limits the chemical information obtained from the spectra. As a result, definitive identification of these compounds using the corresponding mass spectra obtained using GC-QMS is not likely.
 
 
Figure 6.2. Mass spectrum obtained using GC-QMS of (a) α-pyrrolidinopropiophenone (α-PPP) and of (b) 3-methyl-α-pyrrolidinopropiophenone (3-methyl PPP).
 
 
Other analytical methods for the identification of synthetic designer drugs have been investigated for research purposes. In these studies, high-resolution mass spectrometry (HRMS), and in particular, liquid chromatography-mass spectrometry (LC-MS), is used to provide chemical information necessary for definitive identification (4, 5). Unlike GC-QMS, HRMS is able to measure the m/z ratio of an ion accurately to the milli-Dalton and sub-milli-Dalton place. In doing so, the exact mass of a compound can be obtained. The exact mass information is then used to determine the chemical formula, from which definitive identification of the compound is possible. However, these studies have used HRMS in combination with other techniques, such as Fourier transform infrared (FTIR) spectroscopy and nuclear magnetic resonance (NMR) spectroscopy, which are techniques that allow for unequivocal identification of compounds. It is apparent that no single technique is sufficient for definitive identification of novel synthetic designer drugs. The challenge for forensic analysts in using multiple techniques for structural elucidation is the lack of access to the instrumentation, as well as the time-consuming nature of the elucidation process itself.
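To make the exact-mass idea concrete, the sketch below computes monoisotopic masses for the two cathinones shown in Figure 6.2. The molecular formulas (C13H17NO for α-PPP and C14H19NO for 3-methyl PPP) and the isotope masses are standard literature values that are not given in the text above, and the helper name is hypothetical. Masses are for the neutral molecules; a measured ion would differ by its charge carrier.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
ISOTOPE_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def monoisotopic_mass(formula):
    """Sum isotope masses for a formula given as {element: count}."""
    return sum(ISOTOPE_MASS[element] * count for element, count in formula.items())


alpha_ppp = {"C": 13, "H": 17, "N": 1, "O": 1}
methyl_ppp = {"C": 14, "H": 19, "N": 1, "O": 1}

for name, formula in [("alpha-PPP", alpha_ppp), ("3-methyl PPP", methyl_ppp)]:
    mass = monoisotopic_mass(formula)
    # The nominal (integer) mass is what a unit-resolution instrument reports.
    print(name, round(mass), round(mass, 4))
```

The nominal masses agree with the expected molecular ions at m/z 203 and 217, while the decimal places carry the additional information that HRMS can resolve.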
 
Therefore, it is advantageous to utilize other tools to prioritize the analysis of synthetic designer drugs in questioned samples. A simple and rapid method for classification to compound class can be considered, as the identification of novel compounds is facilitated once structural class is determined. Classification to structural class is a preliminary step to unequivocal identification of novel synthetic designer drugs. Zuba outlined a classification scheme to distinguish between different classes of synthetic designer drugs based upon mass spectral data obtained via GC-QMS; however, it was concluded in this study that HRMS and other complementary techniques are needed for further analysis (6).
 
Alternatively, a technique that has been investigated for classification to compound class is 
mass defect filtering. Mass defect refers to the fractional portion of the exact mass. Filtering 
using mass defect is a screening tool that is applied post-data acquisition to obtain more chemical 
information, such as compound class. This technique necessitates the use of HRMS, as the mass 
defect can only be obtained from the exact, or accurate, mass. Grabenauer et al. used a 
combination of HRMS and mass defect filtering to classify a group of synthetic cannabinoids in 
sample mixtures (7). It was determined that the majority of the compounds in the class contained 
an indole core structure and had mass defect values within the range 0.135–0.235 Da. This filter 
was then input into the data acquisition software and applied to chromatographic and mass 
spectral data of sample mixtures containing synthetic cannabinoids. After application of the 
filter, components in the sample mixtures that had mass defect values within the range were 
retained while components with mass defect values outside of the specified range were 
removed. Using this technique, successful classification based upon mass defect was possible. 
Furthermore, co-eluting chromatographic peaks were resolved and additional synthetic 
cannabinoids previously masked in the analysis were identified with this technique (7). From this 
study, it is apparent that mass defect filtering is a powerful tool. However, this technique has not 
been applied to other classes of synthetic designer drugs, namely the phenethylamine and 
cathinone classes, which show high structural similarity. 
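
The filtering step described above can be illustrated with a minimal sketch. It assumes that the 
mass defect is taken as the fractional portion of the exact mass, as defined in the text, and it 
uses the 0.135–0.235 Da window reported by Grabenauer et al. The function names and the list of 
observed masses are hypothetical and serve only as an illustration; this is not the vendor 
software implementation. 

```python
import math


def mass_defect(exact_mass):
    """Return the fractional portion of an exact mass (Da)."""
    return exact_mass - math.floor(exact_mass)


def apply_mass_defect_filter(masses, low=0.135, high=0.235):
    """Keep only masses whose mass defect falls inside [low, high] Da."""
    return [m for m in masses if low <= mass_defect(m) <= high]


# Hypothetical accurate masses observed in a sample mixture (Da).
observed = [342.1852, 358.1802, 195.0877, 301.1410, 214.0896]

# Components outside the window are removed, as in the filter described above.
retained = apply_mass_defect_filter(observed)
print(retained)
```

Only the three masses with defects inside the window are retained; the remaining two are removed. 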
 
Other types of mass defects have been shown to be useful for classification purposes. 
Kendrick mass defect (KMD) is used to identify compounds in a class that span a wide range of 
masses. Its original purpose was to assign compounds with a wide range of masses to a 
class known as a homologous series (8). This is not feasible with mass defect alone, as the range 
of masses in a homologous series can span an order of magnitude or more, meaning that the mass 
defect values associated with this range of masses are vastly different. KMD can be used since it 
is characteristic of a series of compounds differing only in the number of methylene (CH₂) 
groups (e.g. butane, pentane, hexane, etc.). Hughey et al. utilized KMD to group compounds 
found in petroleum crude oil into different classes, types, and alkylation series, thus simplifying 
the identification of compounds in the mixture (9). This type of mass defect is potentially useful 
for discrimination of phenethylamines and cathinones for classification across a large mass 
range. 
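
A short sketch may help show why KMD is constant within a homologous series. It assumes the 
Kendrick scale of reference (8), in which the mass of CH₂ is set to exactly 14.0000, and it takes 
KMD as the nominal Kendrick mass minus the Kendrick mass; the sign convention varies between 
authors, so this choice is an assumption. The function names are hypothetical. 

```python
CH2_EXACT = 14.01565    # IUPAC mass of CH2 (12.00000 + 2 x 1.007825), Da
CH2_KENDRICK = 14.00000  # mass assigned to CH2 on the Kendrick scale


def kendrick_mass(iupac_mass):
    """Rescale an IUPAC mass so that each CH2 unit weighs exactly 14."""
    return iupac_mass * CH2_KENDRICK / CH2_EXACT


def kendrick_mass_defect(iupac_mass):
    """Nominal Kendrick mass minus Kendrick mass (sign convention assumed)."""
    km = kendrick_mass(iupac_mass)
    return round(km) - km


# Monoisotopic masses of three members of the alkane series (Da).
alkanes = {"butane": 58.07825, "pentane": 72.09390, "hexane": 86.10955}

for name, mass in alkanes.items():
    # Each member prints approximately the same KMD despite the different masses.
    print(name, round(kendrick_mass_defect(mass), 4))
```

Because the three alkanes differ only by CH₂ units, their KMD values agree, whereas their 
ordinary mass defects (0.078, 0.094, and 0.110 Da) do not. 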
 
A third type of mass defect reported to be useful for classification purposes is relative mass 
defect (RMD). This scale is based upon mass defect, but is normalized to the exact mass of the 
compound and rescaled to a parts per million (ppm) range. The RMD value of a compound is 
indicative of its fractional hydrogen content, as hydrogen contributes substantially to mass 
defect. A high RMD value (e.g. 800 ppm) is indicative of a compound with a high hydrogen 
content, termed hydrogen-rich, whereas a low RMD value (e.g. 200 ppm) is indicative of a 
compound with a low hydrogen content but larger oxygen content, termed oxygen-rich (10). 
Using RMD, compounds can be rapidly screened by applying a filter characteristic of a structural 
class to identify compounds for further investigation. Stagliano et al. utilized this technique to 
filter out all compounds not of interest and retain all compounds consistent with a lipid, which 
was the class of interest, with RMD values that ranged from 600–1000 ppm (10). With this 
technique, not only are similar compounds classified together, but the patterns of fragment ions 
across an m/z range can also be utilized to group similar compounds to a class (11). Thus, 
this type of mass defect filter is potentially useful for classification of phenethylamines and 
cathinones, not only using the molecular ions of these compounds, but also including fragment 
ion information. 
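
The RMD calculation described above can be sketched as follows, assuming that RMD is the mass 
defect divided by the exact mass and multiplied by 10⁶. The 600–1000 ppm window is the lipid 
filter quoted from reference (10); the function names and the two example masses are 
hypothetical. 

```python
import math


def relative_mass_defect_ppm(exact_mass):
    """Mass defect normalized to the exact mass, expressed in ppm."""
    defect = exact_mass - math.floor(exact_mass)
    return defect / exact_mass * 1e6


def passes_rmd_filter(exact_mass, low_ppm=600.0, high_ppm=1000.0):
    """True if the RMD of the mass lies inside the filter window."""
    return low_ppm <= relative_mass_defect_ppm(exact_mass) <= high_ppm


# Hypothetical examples: a hydrogen-rich ion and an oxygen-rich ion (Da).
for mass in (551.5034, 180.0634):
    print(mass, round(relative_mass_defect_ppm(mass), 1), passes_rmd_filter(mass))
```

The hydrogen-rich example falls inside the window and is retained, while the oxygen-rich 
example has a much lower RMD and is filtered out. 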
 
From the three different types of mass defects, it is evident that different chemical 
information in a compound is probed by each mass defect filter. It may be advantageous to 
utilize all three types of mass defect filter in a classification scheme. This would allow better 
discrimination between two classes of synthetic designer drugs that show high structural 
similarity, thus facilitating identification of novel synthetic drugs. 
 
The objective in this research was to devise methods to overcome the limitations in the 
analysis of synthetic designer drugs, in particular those in the phenethylamine and cathinone 
classes. To that end, the first goal was to investigate the utility of high-resolution mass spectral 
data for forensic practitioners to use for comparison to chemical profiles of unknown 
compounds. All reported HRMS analyses in conjunction with mass defect filtering thus far have 
utilized liquid chromatography-mass spectrometry (LC-MS); however, the instrumentation 
associated with this analytical method is typically not available to forensic practitioners. 
Furthermore, a crucial difference between LC-MS and GC-MS is the ionization method 
associated with each technique; the soft ionization used in LC-MS produces minimal 
fragmentation while the hard ionization in GC-MS allows for a multitude of fragment ions to be 
generated in a mass spectrum. As the chemical and structural information provided by the 
fragment ions is crucial for identification of novel compounds, a technique that incorporates 
HRMS and the hard ionization needed to generate fragment ions is advantageous. Gas 
chromatography-mass spectrometry with a time-of-flight mass analyzer (GC-TOFMS) is ideal 
for this research, as HRMS data are obtained, which provide the accurate mass information 
necessary for classification via mass defect filtering. Further, the hard ionization method in 
GC-TOFMS provides chemical information from fragment ions that is similar to the information 
obtained with the low-resolution mass spectrometers currently used in forensic laboratories. 
Utility of HRMS was investigated by analyzing phenethylamine and cathinone reference standards 
using GC-QMS and GC-TOFMS to determine whether HRMS data would be comparable to 
mass spectra obtained by low-resolution mass spectrometry. Substantial similarity in mass 
spectra obtained by these two techniques would indicate the applicability of HRMS in assisting 
practitioners in the identification of novel synthetic drugs via comparison to mass spectral data 
generated using low-resolution mass spectrometry. The second goal was to develop mass defect 
filters using absolute, Kendrick, and relative mass defects to allow for rapid classification of 
novel drugs to the phenethylamine or cathinone classes. These filters were developed using sets 
of phenethylamine and cathinone reference standards. Filter performance was then evaluated 
using test sets consisting of other phenethylamine and cathinone standards. 
 
The success of this research will result in tools that can then be incorporated into a 
classification scheme and find application in prioritizing the analysis of different questioned 
samples in forensic laboratories. By incorporating mass defect filters into the classification 
scheme, chemical information of unknown compounds can quickly be obtained and a 
preliminary classification to compound class can be made, which will allow for further resources 
to be directed towards identification. 
 
 
REFERENCES
 
 
 
 
1. FBI. Crime in the U.S. 2013: Estimated Number of Arrests. FBI; 2015 [updated 2015]. 
Available from: 
https://www.fbi.gov/about-us/cjis/ucr/crime-in-the-u.s/2013/crime-in-the-u.s.-2013/tables/table-29/table_29_estimated_number_of_arrests_united_states_2013.xls
 
2. DEA. 2014 National Drug Threat Assessment Summary; 2015.
 
3. SWGDRUG. SWGDRUG Recommendations Version 7-0.
 
4. Uchiyama N, Shimokawa Y, Kawamura M, Kikura-Hanajiri R, Hakamatsuka T. 
Chemical analysis of a benzofuran derivative, 2-(2-ethylaminopropyl)benzofuran (2-EAPB), 
eight synthetic cannabinoids, five cathinone derivatives, and five other designer drugs newly 
detected in illegal products. Forensic Toxicol. 2014;32(2):266-81.
 
 
5. Zuba D, Sekuła K. Identification and characterization of 
2,5-dimethoxy-3,4-dimethyl-β-phenethylamine (2C-G) – A new designer drug. Drug Testing and 
Analysis. 2013;5(7):549-59.
 
 
6. Zuba D. Identification of cathinones and other active components of 'legal highs' by 
mass spectrometric methods. TrAC Trends in Analytical Chemistry. 2012;32:15-30.
 
 
7. Grabenauer M, Krol WL, Wiley JL, Thomas BF. Analysis of Synthetic Cannabinoids 
Using High-Resolution Mass Spectrometry and Mass Defect Filtering: Implications for 
Nontargeted Screening of Designer Drugs. Analytical Chemistry. 2012;84(13):5574-81.
 
8. Kendrick E. A Mass Scale Based on CH2 = 14.0000 for High Resolution Mass 
Spectrometry of Organic Compounds. Analytical Chemistry. 1963;35(13):2146-54.
 
 
9. Hughey CA, Hendrickson CL, Rodgers RP, Marshall AG, Qian K. Kendrick Mass Defect 
Spectrum: A Compact Visual Analysis for Ultrahigh-Resolution Broadband Mass Spectra. 
Analytical Chemistry. 2001;73(19):4676-81.
 
 
10. Stagliano MC, DeKeyser JG, Omiecinski CJ, Jones AD. Bioassay-directed fractionation 
for discovery of bioactive neutral lipids guided by relative mass defect filtering and multiplexed 
collision-induced dissociation. Rapid Communications in Mass Spectrometry. 
2010;24(24):3578-84.
 
11. Ekanayaka EP, Celiz MD, Jones AD. Relative mass defect filtering of mass spectra: a 
path to discovery of plant specialized metabolites. Plant Physiology. 2015;167(4):1221-32.
 
 
Chapter 7 Theory
 
 
7.1 Gas Chromatography-Mass Spectrometry (GC-MS)
 
 
Gas chromatography-mass spectrometry (GC-MS) enables both separation and identification 
of compounds in complex mixtures in a single analysis by coupling a gas chromatograph, which 
performs the separation, to a mass spectrometer, which serves as the detector (1). This technique 
is widely used in forensic chemistry for the analysis of questioned drug samples, ignitable liquids 
and fire debris, and other types of evidence where the separation and subsequent identification of 
compounds in complex mixtures is necessary.
 
7.1.1 Chromatography
 
The two most important components of chromatography are the mobile and stationary 
phases. The physical state of the mobile phase defines the type of chromatography that is used 
(1). A gaseous mobile phase is indicative of gas chromatography while liquid chromatography 
utilizes a liquid mobile phase. Separation of compounds can be performed by introducing the 
solution of analytes into the mobile phase and allowing the mobile phase to flow across the 
stationary phase. This is typically done in a column, where the stationary phase is located inside 
the column, and the mobile phase flows through the column at a certain rate. As the analytes 
travel through the column, interactions between the analytes and the stationary phase occur. The 
extent of these interactions differs among analytes, which leads to physical separation of the 
analytes in the same sample as they travel through the column. 
 
7.1.2 Gas Chromatography
 
In GC, the mobile phase is the inert carrier gas and the stationary phase is a thin film coated 
on the inside wall of the capillary column. In order for the analytes to be separated, the sample 
must first be loaded onto the column. 
 
Compounds in solution are first introduced into the gas chromatograph (see Figure 7.1) via a 
syringe into a hot injection port (e.g. at 250 °C), where they are quickly volatilized (1). A steady 
flow of the carrier gas, such as helium (e.g. at a rate of 1 mL/min), into the injection port moves 
the gaseous compounds onto the column. The analytes spend a certain amount of time in the 
column depending on the conditions of the mobile and stationary phases as well as the retention 
mechanism (1). Retention of the compounds in the column in GC is based upon the vapor 
pressure of the analyte as well as partitioning into the stationary phase. Compounds with higher 
vapor pressures are less retained in the column than those with lower vapor pressures, and 
therefore elute in less time. More interaction between the analytes and the stationary phase leads 
to longer retention in the column, while analytes that do not have strong interactions with the 
stationary phase are less retained and elute from the column in a shorter amount of time (1). 
The retention time of a compound is defined as the time taken to leave the column and 
reach the detector from when the sample is injected. The retention time and abundance of each 
analyte can then be plotted in a chromatogram. 
 
 
Figure 7.1. Schematic of a gas chromatograph attached to a detector. 
 
 
The separation of compounds by gas chromatography can be performed either at a fixed 
temperature (isothermal) or by using a temperature gradient (temperature programming) (1). 
Using the former method, the temperature is constant throughout the analysis. Analytes with 
high vapor pressures and boiling points below the fixed temperature used in the analysis 
will elute rapidly from the column and the likelihood of co-elution with unretained species is 
high. On the other hand, analytes with high boiling points are more likely to remain in the 
column, leading to increased analysis times. With temperature programming, a temperature ramp 
at a constant rate can be used, and this method is considered to be advantageous for analytes that 
span a range of boiling points. It is also a high-throughput method as the analytes elute based on 
both boiling point and partitioning, allowing for reduced analysis times for a wider range of 
analytes (1). 
 
7.1.3 Separation Efficiency
 
The efficiency of the separation can be defined by the resolution between the two 
closest-eluting compounds in the chromatogram as well as the overall peak shape of each 
compound. Optimized separations are those where the compounds of interest are well-separated 
with narrow Gaussian-like peaks (Figure 7.2) (1). These conditions can be influenced by 
diffusion, mass transfer, and equilibration between the two phases. All of these effects 
contribute to peak broadening and can affect the separation of compounds that elute close to one 
another. 
 
 
Figure 7.2. Example chromatogram of a 4-component mixture, displaying ideal separation 
(i.e. Gaussian peak shapes and baseline-resolved peaks). 
 
 
A thin film of polymer is typically used as the stationary phase in GC. This film is coated on 
the inside wall of the column, and compounds can diffuse into and out of the stationary phase 
depending on their affinity for the polymer (1). Typical polymers that are used as stationary 
phases include silicone with various functional groups attached, such as methyl and phenyl 
groups. The different functional groups possess different polarities that influence the 
types of interactions that can occur with the analytes and, subsequently, affect retention time. 
For example, a more polar stationary phase (e.g. polyethylene glycol) will have more interactions 
with a polar analyte, whereas a more nonpolar stationary phase (e.g. polymethylsiloxane) will 
have fewer interactions with the same analyte and the analyte will elute from the column in a 
shorter amount of time (1). In order to ensure that all of the analytes are able to reach the 
stationary phase, especially the compounds that are traveling along the center of the column, the 
diameter of the column must be small; generally, capillary columns have internal diameters in 
the sub-millimeter range. 
 
Diffusion can also increase band broadening due to analyte molecules partitioning into and 
out of the stationary phase at slightly different rates. Ideally, all molecules of the same analyte 
should travel through the column as a narrow band; however, depending on the extent of 
diffusion for different analytes, the analytes may not travel in a narrow band towards the end of 
the column (1). This is due to the large diffusion coefficients for analytes in the gas phase; as the 
analytes travel through the column, longitudinal diffusion occurs, which allows the band of 
analytes to widen, resulting in broadened peaks. Band broadening due to diffusion can be 
reduced by using a shorter column so that the extent of longitudinal diffusion is reduced as the 
analytes travel through the column (1). 
 
Another method to reduce band broadening due to diffusion is to increase the flow rate of the 
mobile phase. An increase in flow rate results in reduced retention of analytes in the column, 
as there is less time for the analytes to diffuse and form wider bands. A caveat to increasing the 
flow rate, however, is the limited mass transfer that occurs with faster mobile phase flow. Mass 
transfer of analytes between the two phases occurs naturally due to the need for the analytes in 
the two-phase system to reach dynamic equilibrium (1). However, if the flow rate is increased, 
there is insufficient time to reach equilibrium, and therefore the band of analytes becomes 
broadened as the molecules move in and out of the two phases. Resistance to mass transfer is 
dependent on the analyte. The analytes most affected by mass transfer are those with a stronger 
affinity for the stationary phase; as more of these molecules diffuse into and out of the 
stationary phase, the analyte travels through the column at a slower rate and the molecules are 
more likely to be spread over a wider range (1). Band broadening due to resistance to mass 
transfer can also be compounded by increased film thickness of the stationary phase. Despite 
being able to increase sample load, thicker films limit the diffusion rates of the analytes, since 
the analytes will require more time to leave the stationary phase, which can lead to broader 
peaks (1). Therefore, it may be advantageous to use columns with thin stationary phase films; 
common film thicknesses are in the sub-micrometer range. 
 
Another point of concern in chromatography is the amount of sample injected into the 
column (2). The issue of sample overloading was briefly discussed above in conjunction with 
film thickness. As the separation of analytes is dependent on the partitioning of the analytes into 
the stationary phase, the thickness of the film is a concern. The film contains stationary phase 
polymers that are available for interactions with analytes. If a large volume of sample is loaded 
onto the column, the analytes may not all be able to partition into the stationary phase film, and 
as such, some of the analytes may be left to flow through with the mobile phase, resulting in 
earlier elution from the column (2). This phenomenon manifests itself in chromatograms as 
fronting, where the peak is broadened at the front, leading to an earlier retention time, and rises 
gradually to the apex (Figure 7.3). While thicker films do minimize fronting, peak broadening 
due to analytes requiring more time to diffuse into and out of the thick films can occur. 
Therefore, it is necessary to reduce the amount of sample loaded onto the column; a split 
injection is typically used to eliminate fronting (2). This type of injection utilizes a split valve 
that is located in the injection port. When the syringe injects the sample into the liner in the 
injection port, a pre-determined portion of the sample is moved to the split valve and eventually 
flows to waste while the remaining portion of the sample is introduced onto the column. By 
doing so, the likelihood of sample overload is reduced (2). Typical split ratios in these 
injections range from 25:1 to 100:1; the former indicates that 1 part of the sample is introduced 
onto the column while the other 25 parts flow to waste. The latter split ratio indicates that 
1 part of the sample is loaded onto the column while the other 100 parts flow to waste. 
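
The arithmetic behind the split ratio can be made explicit with a small sketch. It follows the 
definition given above, in which a ratio of N:1 sends N parts to waste for every 1 part loaded 
onto the column. The function name and the 1 µL injection volume are hypothetical. 

```python
def fraction_on_column(split_ratio):
    """Fraction of the injected sample that reaches the column for an N:1 split."""
    return 1.0 / (split_ratio + 1.0)


injected_ul = 1.0  # hypothetical injection volume in microlitres

for ratio in (25, 100):
    on_column = fraction_on_column(ratio) * injected_ul
    print(f"{ratio}:1 split -> {on_column:.4f} uL on column")
```

Under this definition, roughly 3.8% of the injected sample reaches the column at 25:1 and about 
1% at 100:1. 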
 
 
Figure 7.3. Example of a peak showing fronting. 
 
 
7.1.4 Mass Spectrometry
 
Once the separated analytes elute from the column, they are introduced into a detector 
through a heated transfer line, which interfaces the gas chromatograph and the mass 
spectrometer. In GC-MS, the detector is a mass spectrometer, with which each analyte can be 
detected and identified via mass information. Since the mass spectrometer can only analyze ions, 
the analytes from the gas chromatograph must first be ionized. Ions of each analyte are created in 
the ion source (Figure 7.4) (3). In the case of electron ionization, which is often the ionization 
method that is used in GC-MS, ions are created as fast-moving electrons accelerate past 
the analyte molecules. The electrons are continuously generated in the ion source with 70 eV of 
energy. Some of the kinetic energy of the electrons is imparted to the analyte, and an 
electron is ejected, creating a positively-charged radical. However, the molecule can still have 
excess energy after ejecting an electron, and this manifests as vibrational energy. With these 
vibrations in the molecule, the weakest bonds are likely to break in order to reduce the 
instability of the molecule, leading to the formation of various fragment ions that are more 
stable (3). The ions are then introduced to the mass analyzer via a positively-charged repeller 
located at one end of the ion source and a negatively-charged focusing lens positioned between 
the ion source and the mass analyzer. 
 
 
Figure 7.4. Schematic of an ion source for electron ionization. 
 
 
The mass analyzer in the mass spectrometer is the region in which the ions are distinguished 
based upon mass-to-charge information. The accuracy with which the mass-to-charge (m/z) ratio 
of the ions is determined is dependent on the resolution of the mass analyzer (3). Discussions on 
mass resolution and the mechanism by which mass information is obtained by the mass analyzer 
are detailed in Section 7.1.4.1. 
 
Once the mass-to-charge information of ions is known, the ions travel to a detector in order 
to measure the quantity of each ion at the specific mass-to-charge ratio. Common detectors in 
mass spectrometers include an electron multiplier and a microchannel plate (MCP). 
 
As the name suggests, an electron multiplier detects and quantifies electrons as well as 
increases, by a fixed number, the electrons that are detected (4). In a continuous dynode 
that is connected to a power source, the ions that come through the mass analyzer collide against 
the walls of the dynode and each collision results in the emission of a secondary electron. This 
electron can then go on to collide against the walls of another region of the dynode, which 
generates between 1 and 3 electrons in a process called secondary emission (4). This process 
occurs continuously until the electrons that are generated for each ion are on the order of 2⁶. 
The electrons are passed through a resistor and are then converted into a voltage, and the 
magnitude of the voltage is indicative of the abundance of each ion. 
 
The microchannel plate utilizes a similar mechanism to the electron multiplier; however, 
there are many smaller dynodes that are located inside each channel on a plate that allow for 
more ions to be simultaneously detected (5). An MCP detector usually contains over a million 
channels, all of which have diameters in the single micron range and contain dynodes. Secondary 
electron emission occurs once ions collide against the walls of the dynode and continues to occur 
until there are approximately 10⁶ electrons generated for each ion (6). Similar to the electron 
multiplier, the current from the electron flow is converted into a voltage, which corresponds to 
the abundance of an ion. 
 
Due to the high gain associated with the MCP detector, the saturation limit for ion detection 
is 1 × 10⁴ counts (7). Once the abundance of an ion with a certain mass-to-charge ratio surpasses 
this limit, the MCP enters into detector dead time, and is not able to accurately count the 
subsequent ions that reach the detector. This is particularly detrimental in high-resolution mass 
spectrometers that depend upon time to determine the mass-to-charge ratio of the ions. Further 
discussion of high-resolution mass spectrometers is detailed in Section 7.1.4.3. Saturation of the 
detector manifests itself in the mass spectrum with peaks shifted to the left, or to a lower 
mass-to-charge ratio (7). When detector dead time occurs, no ions that reach the detector during 
this time are counted, regardless of the mass-to-charge ratio, and the recovery from this state 
may not be rapid enough to be able to count the ions of a different mass-to-charge ratio, 
depending on the similarity of the mass-to-charge ratio of the new ion compared to that of the 
previous ion (7). This leads to lower mass accuracy, which is not desired in high-resolution mass 
spectrometers. 
 
A technique that is used to overcome detector saturation at 10⁴ counts is dynamic range 
extension (6). Dynamic range is defined as the range of abundances for which the signal count is 
linear. The lower limit of the dynamic range is the detection limit and the upper limit is the 
saturation limit. A universal detector generally possesses a wide dynamic range in order to 
analyze samples in which a large range of abundances is present (8). The electron multiplier has 
a wide dynamic range that spans approximately 7 orders of magnitude, whereas the dynamic 
range of the microchannel plate is 4 orders of magnitude. But with dynamic range extension, the 
saturation limit can be increased in order to preserve linearity in signal counting at the higher 
abundances (6). 
 
This can be done by replacing saturated data with unsaturated data that are then corrected 
with a magnification factor (6). When dynamic range extension is selected, the instrument 
automates the scanning so that the analysis parameters alternate between those in a normal scan 
and those for dynamic range extension. The replacement of saturated data only occurs for an ion 
when the abundance is above the saturation limit in the spectrum. The magnification factor is 
applied by changing the potentials at various lenses in the region of the mass spectrometer 
between the ion source and the mass analyzer (6). Adjustments in the potential will result in the 
defocusing of the ion beam that passes through the lens, which leads to decreased ion intensity. 
The adjusted potential is related to a magnification factor in the lens (6). For example, if the 
magnification factor for dynamic range extension were set to 40, the lens potential would be 
adjusted by an amount that corresponds to an ion intensity reduction of 40 times. The defocused 
ion beam would then be sent to the mass analyzer and reach the MCP detector, resulting in a 
signal count. This signal count would then be corrected with the magnification factor. Using this 
technique, the saturation limit of the MCP detector can be extended from 1 × 10⁴ counts to 
4 × 10⁵ counts (6). 
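
The correction described above can be sketched numerically. This is a simplified model that 
assumes the defocused beam is attenuated by exactly the magnification factor and that saturated 
readings are simply replaced by the corrected value; the function names and the example 
abundance are hypothetical, and the logic is not the instrument's actual acquisition code. 

```python
SATURATION_LIMIT = 1e4  # MCP saturation limit in counts, as quoted in the text


def extended_saturation_limit(magnification=40):
    """Upper limit of linear response once the correction is applied."""
    return SATURATION_LIMIT * magnification


def reported_counts(true_counts, magnification=40):
    """Return the abundance reported for an ion under this simplified model."""
    if true_counts <= SATURATION_LIMIT:
        return true_counts  # normal scan is unsaturated, so it is used directly
    attenuated = true_counts / magnification  # counts seen with the defocused beam
    if attenuated > SATURATION_LIMIT:
        raise ValueError("abundance exceeds the extended saturation limit")
    return attenuated * magnification  # corrected with the magnification factor


print(extended_saturation_limit(40))  # 1e4 counts x 40
print(reported_counts(2e5))           # detected as 5000 counts, then corrected
```

With a factor of 40, the extended limit evaluates to 4 × 10⁵ counts, which agrees with the value 
stated in the text. 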
 
7.1.4.1 Resolution in Mass Spectrometry
 
 
Resolution in mass spectrometry is characterized by the ability of the mass analyzer to 
distinguish between ions of similar mass-to-charge ratio (3). The resolution of a mass 
spectrometer can be determined by using the following equation (Eq. 7.1) in conjunction with 
two adjacent peaks in a mass spectrum (Figure 7.5):
 
 
$$R = \frac{M}{\Delta m} \tag{7.1}$$
 
where M is the mass of the first peak and Δm is the difference between the masses of adjacent 
peaks that are resolved (3). Higher resolution values are desirable as they indicate better 
discrimination between two adjacent peaks. In low-resolution mass spectrometers, the mass 
analyzer is typically only able to distinguish ions that differ by 1 atomic mass unit or 1 Da. 
Low-resolution mass spectrometers typically have resolution values on the order of 10², whereas 
high-resolution mass spectrometers can have resolution values ranging from 10³ to 10⁵. 
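
As a brief numerical illustration of Eq. 7.1, the sketch below computes the resolution for two 
resolved peaks and, conversely, the smallest mass difference that can be distinguished at a 
given resolution. The function names and the example values (m/z 300, a resolution of 5000) are 
hypothetical. 

```python
def resolution(mass, delta_m):
    """Resolution as the mass of the first peak divided by the peak separation."""
    return mass / delta_m


def min_resolvable_difference(mass, res):
    """Smallest mass difference distinguishable at a given resolution."""
    return mass / res


# Unit resolution: peaks at m/z 300 and 301 are just distinguished.
print(resolution(300.0, 1.0))

# Hypothetical higher-resolution analyzer with a resolution of 5000 at m/z 300.
print(min_resolvable_difference(300.0, 5000.0))
```

The first case gives a resolution on the order of 10², consistent with the low-resolution 
instruments described above. 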
 
 
Figure 7.5. Example mass spectrum illustrating mass resolution.
 
 
7.1.4.2 Low-Resolution Mass Spectrometry
 
An example of a mass analyzer with low or unit resolution is the quadrupole. This mass 
analyzer consists of four conducting rods that run parallel with the ion path and are arranged in 
the configuration shown in Figure 7.6 (3). The charges on the top and bottom rods are the same, 
and they are different from those on the right and left rods. The rods are connected to an 
alternating current (AC) power source that produces radio frequency (RF) voltages as well as a 
direct current (DC) power source (3). For example, at one moment in time, the top and bottom 
rods will be positively charged while the right and left rods will be negatively charged. In the 
next moment, these charges will change so that the right and left rods will be positively charged 
while the top and bottom rods will be negatively charged. This configuration and the alternating 
charges allow for the ions to take on a circular path as they travel through the rods. Furthermore, 
the quadrupole can selectively filter out ions based on the RF and DC voltages applied. An ion of 
a specific mass-to-charge ratio will have a stable trajectory and pass through at particular RF and 
DC voltages while all other ions will collide with the rods, be neutralized and pumped away, and 
not be able to pass through (3). Therefore, at a specific moment in time with certain RF and DC 
voltages, only ions at a certain mass-to-charge ratio will be allowed through, and in the next 
moment, a change in the RF and DC voltages will allow other ions at a particular mass-to-charge 
ratio to have a stable trajectory. By scanning the RF and DC voltages at a fixed rate, all ions in 
the scan range can pass through the quadrupole at different moments in time, resulting in the 
separation of ions based upon mass-to-charge (m/z) ratio (3). A scan range can span over 
one order of magnitude, depending on the upper and lower limits of RF and DC voltages that can 
be applied. Typically, the ions created via electron ionization are singly charged; therefore, the 
ions that are separated by the quadrupole are resolved based upon their mass.
 
 
Figure 7.6. Schematic of a quadrupole with blue and red lines indicating two possible 
trajectories of ions at the same moment in time. The red line depicts the path of an ion that is 
neutralized in a collision with one of the rods while the blue line displays the path of an ion 
that has a stable trajectory through the quadrupole and travels to the detector. 
 
 
7.1.4.3 High-Resolution Mass Spectrometry
 
 
An example of a high-resolution mass analyzer is a time-of-flight mass analyzer (3). Unlike 
the quadrupole, the time-of-flight analyzer is able to resolve ions that differ by 1 mDa, with 
resolution between 4000 and 9000. The accurate mass of ions can be obtained, with mass 
error of 10 ppm or less. Ions leaving the ion source are focused into a narrow band as they travel 
through a series of lenses that also decelerate the ions. After this, a pusher to which voltage can 
be applied accelerates the ions, since the applied voltage is converted into kinetic energy, and 
the ions then travel through a field-free region (3). The amount of kinetic energy applied to all 
ions is the same, and due to the relationships depicted by Eq. 7.2 and 7.3, the speed at which the 
ions travel in the flight tube is dependent on their mass (Eq. 7.4). 
 
 
$$E = zeV \tag{7.2}$$

$$E = \frac{1}{2}mv^{2} \tag{7.3}$$

$$v = \sqrt{\frac{2zeV}{m}} \tag{7.4}$$
 
where E is kinetic energy, z is the charge of the ion, e is the charge of an electron, V is the applied voltage, m is the mass of the ion, and v is the velocity of the ion (3). Another important relationship is:
 
 
$$v = \frac{d}{t} \tag{7.5}$$
 
where d is the distance traveled by the ion or the flight tube length and t is the flight time. Substitution of Eq. 7.5 into Eq. 7.4 will yield
 
 
$$\frac{m}{z} = \frac{2eVt^{2}}{d^{2}} \tag{7.6}$$
 
 
 
which relates the m/z ratio of an ion to its flight time (t) (3). Since mass information is conveyed only through the flight time, the time-of-flight mass analyzer theoretically does not have an upper m/z ratio limit; all ions can be detected regardless of size as long as the analysis time is sufficiently long (3). This is vastly different from the quadrupole, which is limited by the high voltages that must be applied in order to detect larger ions and allow them through to the detector.
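
The dependence of flight time on m/z described above can be illustrated numerically. The following is a minimal sketch, assuming the relationship m/z = 2eVt²/d² rearranged for t; the accelerating voltage (5000 V) and flight path (1.0 m) are hypothetical values chosen for illustration and are not the settings of the instrument used in this work.

```python
import math

ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of an electron, in coulombs
KG_PER_U = 1.66053907e-27              # kilograms per atomic mass unit


def flight_time(mz, voltage, length):
    """Return the flight time in seconds for an ion of the given m/z.

    mz      -- mass-to-charge ratio in u per elementary charge
    voltage -- accelerating (pusher) voltage in volts (hypothetical value)
    length  -- field-free flight path in metres (hypothetical value)
    """
    # m/z = 2 e V t^2 / d^2  rearranges to  t = d * sqrt((m/z) / (2 e V))
    mass_per_charge = mz * KG_PER_U / ELEMENTARY_CHARGE_C  # kg per coulomb
    return length * math.sqrt(mass_per_charge / (2.0 * voltage))


if __name__ == "__main__":
    for mz in (44, 77, 175):
        t_us = flight_time(mz, voltage=5000.0, length=1.0) * 1e6
        print(f"m/z {mz}: {t_us:.2f} microseconds")
```

Under these assumed conditions the flight time scales with the square root of m/z, so heavier ions simply arrive later rather than requiring a higher applied voltage.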
 
In a conventional time-of-flight mass analyzer, the flight path is linear; the ions travel as a tight band, or packet, in a linear path from the pusher to the detector and the entire region is field-free. Ideally, all ions of the same m/z ratio should travel at the same rate and have the same flight time. However, there may be differences in the kinetic energy imparted to the ions, which would affect the speed at which ions with the same m/z ratio travel, and subsequently, the flight time (3). Differences in the kinetic energy of the ions can arise due to the positioning of the ions at the pusher (Figure 7.7). Ions of the same m/z ratio located at different points at the pusher can have slightly different kinetic energies, where those ions positioned closer to the pusher may have more energy imparted to them than those ions positioned farther away from the pusher in the packet of ions.
 
The positioning of ions in a packet may also influence the flight times of ions of the same m/z ratio (3). The ions closest to the pusher, as opposed to the ions closest to the entrance of the flight tube, may have a slightly longer flight path. This leads to a longer flight time than for ions that were located closer to the flight tube entrance prior to being accelerated by the pusher.
 
To minimize flight time differences among ions of the same m/z ratio, the flight path can be altered so that the ions are partially deflected as shown in Figure 7.7 (9). Deflection of the ions is achieved through the presence of a reflectron, which is a region where an electric field is induced. The purpose of the reflectron at one end of the flight tube is to correct for differences in flight time for ions of the same m/z ratio. Those ions that are traveling at a higher velocity penetrate more deeply into the reflectron as opposed to ions traveling more slowly (9). Therefore, when the ions are deflected towards the detector, the ions that penetrate more deeply into the reflectron travel a slightly longer distance than ions that originally traveled more slowly. The longer path compensates for the higher velocity, and the result is that ions of the same m/z ratio reach the detector at the same time (9).
 
Another method of reducing flight time differences among ions of the same m/z ratio is to place the flight tube orthogonal to the ion path leading from the source (10). This is also depicted in Figure 7.7, where the ion path from the ion source creates a 90° angle with the flight tube. The orthogonality minimizes the differences in ion path length through the flight tube, thereby reducing positional differences that may result in ions of the same m/z ratio with different flight times (10).
 
 
Figure 7.7. Diagram of an orthogonal acceleration-time-of-flight (oa-TOF) mass analyzer in a high-resolution mass spectrometer. The flight tube is indicated by the dashed lines. The blue path indicates the trajectory of a packet of ions from the ion source to the deflection by both the pusher and the reflectron to the detector.
 
 
Once the ions of an analyte are detected and tabulated into a mass spectrum, the accurate masses of the ions can be obtained. A method of quantitatively assessing the mass accuracy of these accurate masses is needed. High mass accuracy corresponds to low error, which is given in parts per million (ppm) using the following equation (3):
 
 
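$$\text{mass error (ppm)} = \frac{\text{accurate mass} - \text{exact mass}}{\text{exact mass}} \times 10^{6} \tag{7.7}$$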
 
where accurate mass is the measured mass of a compound and the exact mass is the theoretical value given the exact masses of the elements that comprise the compound. Thus, the mass accuracy of all ions in high-resolution mass spectra can be evaluated once their chemical composition is known.
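
As an illustration of the mass accuracy calculation, the following minimal sketch computes the error in ppm as the difference between accurate and exact mass, divided by the exact mass and scaled by 10⁶. The function names and the measured value are hypothetical and serve only as an example.

```python
def mass_error_ppm(accurate_mass, exact_mass):
    """Signed mass error in parts per million (ppm)."""
    return (accurate_mass - exact_mass) / exact_mass * 1e6


def within_tolerance(accurate_mass, exact_mass, tolerance_ppm):
    """True when the absolute mass error does not exceed tolerance_ppm."""
    return abs(mass_error_ppm(accurate_mass, exact_mass)) <= tolerance_ppm


if __name__ == "__main__":
    measured = 175.1010     # hypothetical accurate (measured) mass, Da
    theoretical = 175.0997  # exact mass calculated for C11H13NO, Da
    print(f"{mass_error_ppm(measured, theoretical):.1f} ppm")
    print(within_tolerance(measured, theoretical, tolerance_ppm=20.0))
```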
 
7.2 Mass Defect
 
 
Mass defect is the difference between the measured mass and the sum of the masses of the components of an element or compound. The mass of an element is defined as the mass of its atom, which includes protons, neutrons, and electrons, using atomic mass units (u). The measured mass of an element is its exact atomic mass while another method to determine the atomic mass is to sum the masses of the protons, neutrons, and electrons found in the element. The difference in mass exists due to the nuclear binding energy that is associated with each element, which is the amount of energy required to break apart the atom into its separate components: protons, neutrons, and electrons (11). This is illustrated in the following example:
 
 
$$^{16}_{8}\mathrm{O} \rightarrow 8\,e^{-} + 8\,p^{+} + 8\,n^{0} \tag{7.8}$$

where $^{16}_{8}\mathrm{O}$ represents the most abundant isotope of elemental oxygen, $e^{-}$ is an electron, $p^{+}$ is a proton, and $n^{0}$ is a neutron. The measured mass of oxygen-16 is 15.9949 u, but the mass derived from the sum of its protons, neutrons, and electrons is 16.1319 u. Therefore, the mass defect associated with oxygen-16 is 0.1370 u. The binding energy is related to mass defect by the following equation:
 
 
$$E_{b} = \Delta m\,c^{2} \tag{7.9}$$

where $E_{b}$ is the binding energy, $\Delta m$ is the mass defect, and $c$ is the speed of light. In this equation, the mass defect is expressed in kilograms, rather than atomic mass units. The conversion factor between the two units is:
 
       
1 u = 1.6605 × 10⁻²⁷ kg.
 
The atomic mass unit is defined as one-twelfth of the mass of carbon-12 in kilograms, which results in carbon-12 having an exact mass of 12.0000 u (11).
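
The oxygen-16 example can be worked through numerically. The sketch below is illustrative only: the particle masses, the speed of light, and the joule-to-MeV factor are standard physical constants supplied here as assumptions (they are not tabulated in this chapter), and the oxygen-16 mass used is its monoisotopic mass of 15.9949 u.

```python
PROTON_MASS_U = 1.00727647     # u
NEUTRON_MASS_U = 1.00866492    # u
ELECTRON_MASS_U = 0.00054858   # u
KG_PER_U = 1.6605e-27          # conversion factor quoted in the text
SPEED_OF_LIGHT = 2.99792458e8  # m/s
JOULES_PER_MEV = 1.602176634e-13


def mass_defect_u(protons, neutrons, electrons, measured_mass_u):
    """Sum of the particle masses minus the measured atomic mass, in u."""
    summed = (protons * PROTON_MASS_U
              + neutrons * NEUTRON_MASS_U
              + electrons * ELECTRON_MASS_U)
    return summed - measured_mass_u


def binding_energy_mev(defect_u):
    """Binding energy from E = (mass defect) * c^2, reported in MeV."""
    joules = defect_u * KG_PER_U * SPEED_OF_LIGHT ** 2
    return joules / JOULES_PER_MEV


if __name__ == "__main__":
    defect = mass_defect_u(8, 8, 8, measured_mass_u=15.99491462)
    print(f"mass defect: {defect:.4f} u")
    print(f"binding energy: {binding_energy_mev(defect):.1f} MeV")
```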
 
The exact mass measurement can only be performed using a high-resolution mass spectrometer. Exact masses of elements can be summed to give the exact mass of a compound with a certain composition. Thus, exact mass measurements are useful in ascertaining the elemental compositions of molecules with a high degree of certainty. This is particularly valuable when encountering unknown compounds. High-resolution mass spectrometers are able to measure exact masses and resolve those that differ by 10⁻³ u and more. Once the elemental composition and subsequent formula of a compound are determined, the identification of an unknown is facilitated.
 
Although the concept of mass defect originates from binding energies and exact masses of subatomic particles, different types of mass defects have been created using other scales, and these are discussed below.
 
 
 
7.2.1 Absolute Mass Defect
 
 
Similar to the definition of mass defect, the absolute mass defect of an element is the difference between the exact mass and the calculated mass from the individual components. However, in this case, the calculated mass is the nominal mass of the element.
 
Nominal mass is commonly used to express the mass of each element or compound, and approximates the mass of a proton and a neutron to be 1 u (or Da) each, with the mass of an electron assumed to be negligible. Thus, the nominal mass of an element such as oxygen is 16 Da, since it contains 8 protons and 8 neutrons. The exact mass of oxygen-16 is 15.9949 Da. Thus, the absolute mass defect of oxygen-16 is -0.0051 Da. Whereas all mass defect values calculated using the exact masses of protons and neutrons are positive, negative mass defects exist using the proton and neutron mass approximation in absolute mass defect. In fact, all elements with atomic number greater than or equal to 8 have negative mass defects while all other elements have positive absolute mass defects with the exception of carbon. The absolute mass defect of carbon-12 is 0.0000 Da due to its exact mass of 12.0000 Da, as discussed above, which is equivalent to its nominal mass.
 
The absolute mass defects of compounds are calculated as the sums of the absolute mass defects of the elemental components. The largest contributor to absolute mass defect is hydrogen, which has the highest positive absolute mass defect of 0.0078 Da. Therefore, compounds whose mass defects are largely positive are hydrogen-rich. Those molecules with negative mass defects are hydrogen-deficient; in organic molecules, the compounds are thought to be oxygen-rich, since one of the main contributors to negative mass defect is oxygen. These relationships can hint at elemental composition of molecules, from which chemical information can then be obtained. For example, two compounds with nominal mass 175 Da can have slightly different elemental composition (Figure 7.8):
 
 
Figure 7.8. Structures of (a) 4-APB and (b) 5-APDI which have elemental formulae C₁₁H₁₃NO and C₁₂H₁₇N, respectively.
 
 
The two structures have different exact masses (175.0997 Da and 175.1361 Da, respectively). Their absolute mass defects are 0.0997 Da and 0.1361 Da, respectively. Even if the molecular structures are unknown, chemical information can still be obtained using absolute mass defect. In this case, the more positive mass defect of 0.1361 Da suggests that 5-APDI is more hydrogen-rich as compared to 4-APB, which has an absolute mass defect of 0.0997 Da. The lower mass defect value in 4-APB can also suggest that the compound is more oxygen-rich. Both are found to be true in a comparison of the elemental formulae, where the presence of an oxygen atom in 4-APB leads to a smaller mass defect value. In summary, chemical information can be obtained using absolute mass defect.
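
The values quoted for 4-APB and 5-APDI can be reproduced from their elemental formulae. The following is a minimal sketch in which the monoisotopic element masses are standard values supplied as assumptions, and the masses are those of the neutral molecules (the electron removed on ionization is ignored).

```python
# Monoisotopic (exact) and nominal masses of the most abundant isotopes, in Da.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16}


def exact_mass(formula):
    """Exact mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[el] * n for el, n in formula.items())


def nominal_mass(formula):
    """Nominal (integer) mass of a formula given as {element: count}."""
    return sum(NOMINAL_MASS[el] * n for el, n in formula.items())


def absolute_mass_defect(formula):
    """Exact mass minus nominal mass, in Da."""
    return exact_mass(formula) - nominal_mass(formula)


if __name__ == "__main__":
    compounds = {
        "4-APB": {"C": 11, "H": 13, "N": 1, "O": 1},
        "5-APDI": {"C": 12, "H": 17, "N": 1},
    }
    for name, formula in compounds.items():
        print(name, f"{exact_mass(formula):.4f}", f"{absolute_mass_defect(formula):.4f}")
```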
 
7.2.2 Kendrick Mass Defect 
 
 
Kendrick mass defect is the difference between the Kendrick exact mass and the Kendrick nominal mass, and is a specific type of mass defect that is standardized to the methylene (CH₂) functional group (12). Kendrick exact mass is calculated in the following manner:
 
            
$$\text{Kendrick exact mass} = \text{exact mass} \times \frac{14.00000}{14.01565} \tag{7.10}$$

where the exact mass is multiplied by a ratio of the nominal mass of methylene to its exact mass. Kendrick nominal mass is obtained by rounding the Kendrick exact mass to the nearest integer, and the difference between the two values is the Kendrick mass defect (KMD).
 
 
This mass defect scale is particularly useful in identifying homologous series of compounds, such as those compounds that differ by CH₂ units (e.g. butane, pentane, hexane, etc.) (13). Since the KMD is normalized to the methylene unit, all compounds that only differ by the number of CH₂ units will have the same KMD value. This is suitable for identification of compounds in a series that span a wide range of masses (13). The Kendrick mass scale can also be adjusted so that the masses of the compounds of interest can be normalized to other functional groups, if the methylene group is not as influential in the homologous series.
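
The constancy of KMD across a homologous series can be checked with a short calculation. This is a sketch under stated assumptions: the KMD is taken as the Kendrick exact mass minus the Kendrick nominal mass, following the definition given in this section (some authors use the opposite sign), and the alkane masses are computed from standard monoisotopic masses of carbon and hydrogen.

```python
CH2_EXACT = 14.01565006   # exact mass of CH2, Da (12 + 2 * 1.00782503)
CH2_NOMINAL = 14.0        # nominal mass of CH2, Da
HYDROGEN_EXACT = 1.00782503


def kendrick_mass(exact_mass):
    """Rescale an exact mass so that CH2 weighs exactly 14."""
    return exact_mass * CH2_NOMINAL / CH2_EXACT


def kendrick_mass_defect(exact_mass):
    """Kendrick exact mass minus Kendrick nominal mass (sign is an assumption)."""
    km = kendrick_mass(exact_mass)
    return km - round(km)


def alkane_exact_mass(n_carbons):
    """Exact mass of the linear alkane C(n)H(2n+2)."""
    return 12.0 * n_carbons + (2 * n_carbons + 2) * HYDROGEN_EXACT


if __name__ == "__main__":
    for name, n in (("butane", 4), ("pentane", 5), ("hexane", 6)):
        mass = alkane_exact_mass(n)
        print(f"{name}: exact {mass:.4f} Da, KMD {kendrick_mass_defect(mass):.4f}")
```

Each alkane differs from the next only by one CH₂ unit, so the three KMD values agree even though the absolute mass defects increase with mass.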
 
7.2.3 Relative Mass Defect
 
 
Relative mass defect (RMD) is another normalized mass defect scale; however, unlike Kendrick mass defect, which is normalized to a functional group, the absolute mass defect is normalized to the exact mass of the compound (14). RMD is calculated as:
 
        
$$\text{RMD} = \frac{\text{absolute mass defect}}{\text{exact mass}} \times 10^{6} \tag{7.11}$$
 
The RMD of a compound is scaled by 10⁶ in order to obtain an integer value that can be expressed in parts per million (ppm). The range of RMD values can span from 10² to 10³ ppm, and is indicative of hydrogen richness or deficiency (14). This concept is similar to that of absolute mass defect; however, only positive RMD values exist. Those compounds with a large RMD are hydrogen rich while those with low RMD values are hydrogen deficient.
 
This mass defect scale is useful in filtering and assigning molecules to various compound classes because compounds in the same class have similar RMD values. Also, the relative mass defect scale is independent of the linearity associated with large molecules and mass defect (14). Absolute mass defect increases with mass due to the large contribution from the hydrogen atoms. This is undesirable in a filter because large molecules may fall outside of an absolute mass defect filter as a result of having a large mass rather than dissimilar class characteristics. Normalization to exact mass removes this linear relationship; therefore, RMD is advantageous for classification.
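
As a worked illustration, the RMD values of the two compounds from Figure 7.8 can be computed from their exact and nominal masses. The function name below is hypothetical; the calculation simply divides the absolute mass defect by the exact mass and scales by 10⁶.

```python
def relative_mass_defect_ppm(exact_mass, nominal_mass):
    """Absolute mass defect divided by exact mass, scaled to ppm."""
    return (exact_mass - nominal_mass) / exact_mass * 1e6


if __name__ == "__main__":
    # Exact masses (Da) quoted in the text for 4-APB and 5-APDI.
    for name, exact in (("4-APB", 175.0997), ("5-APDI", 175.1361)):
        print(f"{name}: {relative_mass_defect_ppm(exact, 175):.0f} ppm")
```

Both values fall within the 10² to 10³ ppm range noted above, with the more hydrogen-rich 5-APDI giving the larger RMD.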
 
 
 
 
REFERENCES
 
 
 
 
1. Skoog DA, Holler FJ, Crouch SR. Principles of Instrumental Analysis: Thomson Brooks/Cole, 2007.

2. Grob K. Split and Splitless Injection for Quantitative Gas Chromatography: Concepts, Processes, Practical Guidelines, Sources of Error: Wiley, 2008.

3. Watson JT, Sparkman OD. Introduction to Mass Spectrometry: Instrumentation, Applications, and Strategies for Data Interpretation: John Wiley & Sons, 2007.

4. Goodrich GW, Wiley WC. Continuous channel electron multiplier. Rev Sci Instr. 1962;33:761-2.

5. Wiza JL. Microchannel plate detectors. Nuclear Instruments and Methods. 1979;162(1):587-601.

6. Waters. Waters Micromass LCT Premier Mass Spectrometer Operator's Guide. 2004.

7. Dynamic Range Extension. Personal Communication with MSU Analytical Chemistry Faculty Member ed; 2014.

8. Harris DC. Quantitative Chemical Analysis: W. H. Freeman, 2010.

9. Mamyrin B, Karataev V, Shmikk D, Zagulin V. The mass-reflectron, a new non-magnetic time-of-flight mass spectrometer with high resolution. Zh Eksp Teor Fiz. 1973;64:82-9.

10. Dawson J, Guilhaus M. Orthogonal-acceleration time-of-flight mass spectrometer. Rapid Communications in Mass Spectrometry. 1989;3(5):155-9.

11. Wapstra AH, Gove NB. 1971 Atomic mass evaluation. Part I. Atomic mass table. 1971.

12. Kendrick E. A Mass Scale Based on CH2 = 14.0000 for High Resolution Mass Spectrometry of Organic Compounds. Analytical Chemistry. 1963;35(13):2146-54.

13. Hughey CA, Hendrickson CL, Rodgers RP, Marshall AG, Qian K. Kendrick Mass Defect Spectrum: A Compact Visual Analysis for Ultrahigh-Resolution Broadband Mass Spectra. Analytical Chemistry. 2001;73(19):4676-81.

14. Stagliano MC, DeKeyser JG, Omiecinski CJ, Jones AD. Bioassay-directed fractionation for discovery of bioactive neutral lipids guided by relative mass defect filtering and multiplexed collision-induced dissociation. Rapid Communications in Mass Spectrometry. 2010;24(24):3578-84.
 
 
 
 
Chapter 8 Materials and Methods
 
 
8.1 Sample Preparation
 
 
Eight phenethylamine and eight cathinone standards were purchased from Cayman Chemical Co. (Ann Arbor, MI). Five standards for each class were designated as the training sets and the remaining three standards from each class were the test sets (Figures 8.1 and 8.2). Standards in the training sets were used to develop the mass defect filters. Standards in the test sets were used to evaluate the efficacy of the developed filters. One milligram of each standard was dissolved in one milliliter of methanol (Sigma-Aldrich, St. Louis, MO) for analysis.
 
 
Figure 8.1. Structures of phenethylamines (a–h) used in this research. Common abbreviations for each compound and designation as training or test set are given in parentheses.
 
 
 
Figure 8.2. Structures of cathinones (a–h) used in this research. Common abbreviations and designation as training or test set are given in parentheses.
 
8.2 Instrument Parameters
 
 
The standard solutions were analyzed by both GC-QMS and GC-TOFMS; the training set standard solutions were analyzed in replicate (n = 5) while the test set standard solutions were analyzed in triplicate. The GC-QMS consisted of an Agilent 7890A gas chromatograph coupled to an Agilent 5975C single quadrupole mass spectrometer with an Agilent G4513A injector (Agilent Tech., Santa Clara, CA). The column was coated with a (5% phenyl)-95% methylpolysiloxane stationary phase film with dimensions of 30 m × 0.25 mm × 0.25 µm (Agilent J&W DB-5ms, Agilent Tech., Santa Clara, CA). The injector temperature was set to 210 °C and a 50:1 split injection was used. The injection volume was 1 µL. The carrier gas was ultra-high purity helium at a nominal 1 mL/min flow rate. The oven temperature program was as follows: 50 °C for 1 min, 15 °C/min to 280 °C with a final hold of 2 min. The transfer line temperature was 280 °C. Electron ionization at 70 eV was used; the source was kept at 180 °C while the mass analyzer was held at 130 °C. The scan range was 35 to 300 u and the rate was 5.19 scans s⁻¹.
 
The GC-TOFMS instrument was a Waters Micromass GCT Premier (Waters, Milford, MA), which consists of an Agilent 6890N gas chromatograph coupled to a Waters GCT mass spectrometer with an Agilent 7683B autosampler. Similar instrument parameters and the same column type and stationary phase as for the GC-QMS were used, with the following exceptions: a splitless injection, where the purge flow was 50 mL/min at 1 min; a 1.3 mL/min flow rate; a scan rate of 5.00 scans s⁻¹; a low-mass cutoff of 40 Da to reduce transmission of background ions; and dynamic range extension. To ensure high mass accuracy, there was a constant infusion of the calibrant, perfluorotributylamine (PFTBA), during each analysis. Because of the constant infusion of calibrant, the baseline was at a higher intensity than that without the use of calibrant. Thus, splitless injection on the GC-TOFMS was performed in order to obtain chromatographic peaks that are three orders of magnitude more intense than the baseline.
 
8.3 Data Processing
 
 
Low-resolution mass spectral data were obtained after GC-QMS analysis by taking a single scan at the apex of the peak in the total ion chromatogram and subtracting from it a scan at 17.00 minutes. The scan at 17.00 minutes represented the baseline conditions and contained common background ions such as those at m/z 281, 207, and 73. These ions originated from the stationary phase; the baseline scan did not contain ions originating from any of the reference standards. All mass spectra were then exported to Microsoft Excel (Microsoft, Albuquerque, NM) for further processing.
 
 
 
High-resolution mass spectra were generated by taking a single scan within the peak in the total ion chromatogram and subtracting from it a single scan in the baseline region immediately before the peak using MassLynx v.4.1 (Waters, Milford, MA). The single scan in the baseline region before the chromatographic peak represented the current baseline condition and contained background ions at m/z 281, 207, and 73, as well as ions from the calibrant at m/z 218, 131, and 69. The baseline region did not contain ions originating from the reference standards. The mass accuracy of the ions in the background-subtracted mass spectra was assessed using the elemental composition algorithm in MassLynx. The algorithm tabulated the accurate masses of these ions against a list of possible elemental formulae with exact masses and an associated mass accuracy in ppm. The number of possible elemental formulae was restricted by a tolerance of 50 ppm in mass accuracy and by the number of carbon, hydrogen, nitrogen, and oxygen atoms. Acceptable mass spectra were those where the majority of the ions, especially those characteristic of the reference standards, displayed a mass accuracy within 20 ppm. These mass spectra were then exported to Microsoft Excel for processing.
 
Ion threshold values were 1% of the base peak for phenethylamines and 0.5% of the base peak for cathinones. These threshold values were applied to mass spectral data obtained by GC-QMS and GC-TOFMS. The lower threshold value was applied to the mass spectra of cathinones due to the low number of fragment ions present at an abundance of at least 1% of the base peak. However, a lower threshold value (e.g. 0.1% of the base peak) was not selected in order to avoid including background ions that were present even after background subtraction. Average mass spectra were generated for each standard by averaging both the mass-to-charge ratios and the relative abundance across all replicates of each standard. Ions not common to all replicates were removed in order to reduce interference from background ions.
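
The thresholding and averaging steps can be sketched as follows. This is an illustrative reconstruction rather than the spreadsheet procedure actually used: it assumes that ions are matched across replicates by their nominal (rounded) m/z, and all spectra shown are hypothetical.

```python
def average_spectrum(replicates, threshold_pct):
    """Average replicate spectra given as lists of (m/z, abundance).

    Ions below threshold_pct of the base peak are discarded, and only ions
    present in every replicate are kept. ASSUMPTION: ions are matched across
    replicates by nominal (rounded) m/z.
    """
    filtered = []
    for spectrum in replicates:
        base_peak = max(abundance for _, abundance in spectrum)
        kept = {}
        for mz, abundance in spectrum:
            relative = 100.0 * abundance / base_peak
            if relative >= threshold_pct:
                kept[round(mz)] = (mz, relative)
        filtered.append(kept)

    # Keep only the nominal masses observed in all replicates.
    common = set(filtered[0]).intersection(*filtered[1:])

    averaged = []
    for nominal in sorted(common):
        mzs = [rep[nominal][0] for rep in filtered]
        rels = [rep[nominal][1] for rep in filtered]
        averaged.append((sum(mzs) / len(mzs), sum(rels) / len(rels)))
    return averaged


if __name__ == "__main__":
    rep1 = [(44.0500, 1000.0), (51.0230, 5.0), (77.0390, 400.0),
            (131.0490, 200.0), (207.0320, 30.0)]
    rep2 = [(44.0496, 2000.0), (77.0394, 900.0), (131.0494, 380.0)]
    for mz, rel in average_spectrum([rep1, rep2], threshold_pct=1.0):
        print(f"m/z {mz:.4f}: {rel:.1f}%")
```

In this hypothetical example the ion at m/z 51 falls below the 1% threshold and the ion at m/z 207 appears in only one replicate, so neither is carried into the average spectrum.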
 
 
 
8.4 Ion Selection for Mass Defect Filters
 
For the development of the mass defect filters, only high-resolution mass spectral data were used. The phenethylamine and cathinone mass defect filters were developed using ions from the mass spectra of the respective training set standards. The molecular ion of each standard in the training set was first identified and used to develop filters specific to molecular ions. Every other ion in the mass spectrum was then considered in the development of fragment ion filters. Fragment ion selection for specific mass defect filters and filter development are detailed below.
 
8.4.1 Absolute Mass Defect Filters
 
For the phenethylamine training set, the absolute mass defect for the molecular ion in each spectrum was calculated by taking the difference between the accurate mass and the nominal mass of the ion. This was done for the five replicates of three of the phenethylamine training set standards, as the compounds 5-MAPB and 5-MAPDB did not exhibit molecular ions, and thus, were not included in the molecular ion filter. The average absolute mass defect value for each of the standards was then determined from the replicate values. The filter centroid was established by averaging the absolute mass defect value from each standard. The tolerance or filter window was based on the lowest confidence interval to encompass the observed absolute mass defect values, including those of the replicates. As no molecular ions were observed in the high-resolution mass spectra of the cathinone standards, the molecular ion filter was only developed for the phenethylamine class.
 
Fragment ion filters for phenethylamines were developed by selecting the fragment ions that were common to all five standards in the training set. The absolute mass defects of these fragment ions were then determined as described above; the average mass defect for each standard was determined, and the filter centroid was defined as the average mass defect across the five standards. The filter window was defined as described above. Only one fragment ion was found to be common to all five standards in the phenethylamine training set; the fragment ion filter was developed at m/z 77. Fragment ion filters for cathinones were developed similarly, with filters at m/z 56, 77, and 91.
 
The absolute mass defect values for the molecular ions and the common fragment ions in the 
average mass spectra of the test sets were calculated as described above. The molecular ion filter 
was assessed with the phenethylamine test set while 
the fragment ion filters were evaluated with 
both the phenethylamine and cathinone test sets based on the number of true positives, false 
positives, and false negatives. 
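
The filter construction and evaluation described in this section can be sketched in code. The centroid follows the text (the average of the per-standard average mass defects). The window calculation is only one possible reading of "lowest confidence interval": it is assumed here to be the narrowest of several candidate normal-theory intervals (centroid ± z·s, with s the standard deviation of all replicate values) that encompasses every observed value. The candidate levels and all names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def build_filter(values_by_standard, levels=(0.90, 0.95, 0.99, 0.999)):
    """Return (centroid, half_width, level) for a mass defect filter.

    values_by_standard maps a standard's name to its replicate mass defects.
    """
    # Centroid: average of the per-standard average mass defects (as in the text).
    centroid = mean(mean(values) for values in values_by_standard.values())
    # ASSUMPTION: the window is centroid +/- z * s, where s is the sample
    # standard deviation of all replicate values and z is the two-sided
    # normal quantile for the candidate confidence level.
    all_values = [v for values in values_by_standard.values() for v in values]
    spread = stdev(all_values)
    for level in levels:
        half_width = NormalDist().inv_cdf(0.5 + level / 2.0) * spread
        if all(abs(v - centroid) <= half_width for v in all_values):
            return centroid, half_width, level
    raise ValueError("no candidate interval encompasses all observed values")


def evaluate_filter(centroid, half_width, target_values, other_values):
    """Count (true positives, false negatives, false positives)."""
    def inside(value):
        return abs(value - centroid) <= half_width

    true_pos = sum(inside(v) for v in target_values)
    false_neg = len(target_values) - true_pos
    false_pos = sum(inside(v) for v in other_values)
    return true_pos, false_neg, false_pos
```

Here a true positive is an ion from the target class that falls inside the window, a false negative is a target-class ion that falls outside it, and a false positive is an ion from the other class that falls inside it.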
 
8.4.2 Kendrick Mass Defect Filters
 
For the phenethylamine training set, Kendrick mass defect (KMD) values were first 
calculated for the ions in the average mass spectrum for each standard. The molecular ion for 
each standard was identified. Since the KMD values of compounds in a homologous 
series are 
theoretically equivalent, each molecular ion filter was established with compounds that were in a 
homologous series. The filter window was determined as the lowest confidence interval that 
encompasses the observed KMD values, including those of 
replicates. 
 
All other KMD values were used in the fragment ion filters for phenethylamines. Since KMD allowed for normalization to the methylene unit, ion selection for a filter was based upon a difference only in the number of CH₂ functional groups. This was determined based on proximity of KMD values and the exact mass of each fragment ion in order to prevent other ions not in a homologous series from being used to develop the filter. Each filter centroid was defined as the average of the KMD values for the set of ions selected. The filter window was defined in a similar manner to that for absolute mass defect filters. Fragment ion filters for cathinones were developed using a similar process as above.
 
For the test sets, only the KMD values from the ions in the average mass spectra for each standard were used. The molecular ion filter was evaluated using the phenethylamine test set and the fragment ion filters for both classes were assessed using both test sets based on the number of true positives, false positives, and false negatives.
 
8.4.3 Relative Mass Defect Filters and Profiles
 
For the phenethylamine training set, relative mass defect (RMD) values were first calculated for all ions in the average mass spectrum for each standard. Molecular ion RMD values were then used to develop the molecular ion filter. The filter centroid and tolerance were defined as discussed above.
 
Rather than fragment ion filters, a fragment ion profile was generated for the phenethylamines using RMD values of fragment ions. The profile is a plot of the fragment ion RMD values as a function of m/z ratios. Ion selection for the profile was based upon ion abundance for a designated number of ions in a spectrum. The ions in each average mass spectrum were sorted by decreasing ion abundance. Then, the total number of ions for each standard was tabulated. The standard with the fewest ions was 5-MAPB, with a total of 17 ions. Thus, for the phenethylamine training set, the first 17 ions from each standard were selected. Ion selection was performed in this manner in order to avoid biasing the profile towards standards that exhibited a larger number of fragment ions.
 
An RMD profile was generated for the cathinones using a similar process. The compound that exhibited the fewest ions was 2-methyl MC, with a total of 14 ions. Thus, the first 14 ions from each standard were selected for the RMD profile.
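
The ion selection for an RMD profile can be sketched as follows. This is an illustrative example with hypothetical spectra and names; it assumes the nominal mass of each ion is its m/z rounded to the nearest integer, and it returns the (m/z, RMD) points that would be plotted.

```python
def rmd_ppm(mz):
    """Relative mass defect in ppm, taking nominal mass as the rounded m/z."""
    return (mz - round(mz)) / mz * 1e6


def build_rmd_profile(average_spectra):
    """Select the same number of most abundant ions from every standard.

    average_spectra maps a standard's name to a list of (m/z, abundance).
    Returns the number of ions used and, per standard, a list of (m/z, RMD).
    """
    # The standard with the fewest ions sets the number taken from each.
    n_ions = min(len(ions) for ions in average_spectra.values())
    profile = {}
    for name, ions in average_spectra.items():
        ranked = sorted(ions, key=lambda ion: ion[1], reverse=True)
        profile[name] = [(mz, rmd_ppm(mz)) for mz, _ in ranked[:n_ions]]
    return n_ions, profile


if __name__ == "__main__":
    spectra = {
        "standard_1": [(44.0500, 100.0), (77.0391, 42.0),
                       (131.0497, 30.0), (175.0997, 12.0)],
        "standard_2": [(58.0657, 100.0), (91.0548, 15.0), (77.0391, 20.0)],
    }
    n_ions, profile = build_rmd_profile(spectra)
    print(n_ions, profile)
```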
 
 
 
The molecular ion filter was assessed using the phenethylamine test set, again based upon the number of true positives, false positives, and false negatives. RMD values were only calculated using the ions in the average mass spectra of the test sets. The patterns in the RMD profiles for both classes generated with the training sets were compared and contrasted. RMD profiles for the test sets were then generated using the same number of ions as in the profiles for the training sets (i.e. 17 ions from each phenethylamine test set standard and 14 ions from each cathinone test set standard). The pattern in the RMD profile of the phenethylamine test set was compared to the pattern of the RMD profile of the phenethylamine training set; a similar comparison was performed for the cathinone test set to the cathinone training set.
 
 
 
 
Chapter 9 Results and Discussion
 
 
9.1 Comparison of GC-QMS and GC-TOFMS Spectra for Phenethylamines
 
 
The similarity of mass spectra acquired by GC-QMS and GC-TOFMS was first investigated to determine the potential of adapting a classification scheme developed using high-resolution mass spectral data to low-resolution mass spectrometry. As forensic practitioners perform identifications of synthetic designer drugs using low-resolution mass spectral data, it is advantageous to be able to relate mass spectral data collected by both techniques in the event that no low-resolution mass spectral data for novel compounds are available. Similarity in the mass spectra is indicative of the ability to adapt a classification scheme from GC-TOFMS data to GC-QMS data.
 
Average mass spectra of all standards were generated and compared. Most of the phenethylamines exhibited molecular ions with both techniques, with the exception of 5-MAPB, 5-MAPDB, and 3,4-MDPA. Exemplar spectra for the phenethylamine class are discussed, with a comparison of spectra and proposed fragmentation pathways. All other spectra of the phenethylamine reference standards not discussed are displayed in Figures B.1 – B.5.
 
In the mass spectrum of 2C-H (Figure 9.1), all of the characteristic ions observed in the mass spectrum acquired by low-resolution mass spectrometry (Figure 9.1a) were observed in the high-resolution mass spectrum (Figure 9.1b). The compound exhibits a molecular ion at m/z 181, a base peak ion at m/z 152, and characteristic ions at m/z 137, 121, 91, and 77. A proposed fragmentation pathway is displayed in Figure 9.2. The base peak ion is the result of a loss of CH₂NH from the molecular ion on the amine portion of the molecule, and a subsequent loss of either of the two methoxy functional groups on the benzene ring results in a fragment ion at m/z 121. This ion can further fragment to give the ion at m/z 91, with chemical formula C₇H₇⁺. The ion at m/z 137 is due to a loss of CH₂CH₂NH₂ that is attached to the benzene ring from the molecular ion. From these two fragmentation pathways, subsequent losses of side chains on the benzene ring result in the fragment ion at m/z 77, which has a chemical formula of C₆H₅⁺.
 
With the exception of the base peak, the ions listed above were more abundant in the high-resolution mass spectrum (e.g. 21% relative abundance for m/z 121 in TOFMS spectrum in comparison to 15% abundance in QMS spectrum). While the splitless injection performed in the GC-TOFMS analysis partially contributes to this observation, another explanation is the higher sensitivity of the GC-TOFMS. Splitless injections result in the higher abundances of all ions since more of the analyte is introduced into the instrument. However, the abundances of all ions in the mass spectra are normalized to the abundance of the base peak, and thus, differences between a splitless and a 50:1 split injection do not affect the relative abundances of the ions greatly.
 
 
Figure 9.1. Average mass spectrum of 2C-H acquired via (a) GC-QMS and (b) GC-TOFMS. Characteristic ions are labeled with associated mass accuracy for the high-resolution mass spectrum. The ion at m/z 162.0911 in the TOFMS spectrum corresponds to a fragment ion from a compound that results from interaction between 2C-H and solvent.
 
 
The mass accuracy of the ions listed above is also displayed in Figure 9.1b. All characteristic ions exhibit high mass accuracy, with error < 20 ppm. While the same pattern of ions is shown in the QMS and TOFMS spectra, the accurate masses of the ions are known in the TOFMS spectrum, which enables assignment of elemental formulae to the ions and provides more confidence in the identification of compounds.
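The mass accuracy figures quoted in this chapter can be reproduced with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the example uses the theoretical and experimental masses of the m/z 77 ion of 2C-H as listed in Table 9.2.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


# m/z 77 ion (C6H5+) of 2C-H: theoretical 77.0391 Da, experimental 77.0400 Da.
error_77 = ppm_error(77.0400, 77.0391)
print(f"{error_77:.1f} ppm")
```

A positive value indicates that the measured mass is higher than the theoretical mass; here the error is roughly 12 ppm, which is within the < 20 ppm range reported for the 2C-H spectrum.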
 
 
Figure 9.2. Proposed fragmentation pathway for 2C-H.
 
 
Despite differences in abundance, the patterns and the ratios of the ions in the mass spectra are similar between the two techniques for the other reference standards that display a molecular ion (Figures B.1–B.2). The only exceptions are the high-resolution mass spectra for 4-APB and 6-APB. The low-resolution and high-resolution mass spectra for 4-APB are shown in Figure 9.3 with a proposed fragmentation pathway displayed in Figure 9.4. The mass spectra and fragmentation pathway corresponding to 6-APB are shown in Figure B.3.
 
The same characteristic ions are observed in the low- and high-resolution mass spectra of 4-APB; the molecular ion is at m/z 175, the base peak is present at m/z 44, with other fragment ions at m/z 131, 77, and 132. The ions at m/z 44 and 131 are the result of the alpha-beta cleavage of the carbon-carbon bond located two bonds away from the amine group. The ion at m/z 132 arises from the loss of C2H5N and the ion at m/z 77 corresponds to the benzylic ion with chemical formula C6H5+ that was present in the mass spectrum of 2C-H. In the mass spectrum acquired by GC-TOFMS, the mass accuracy of these ions is high, with error < 10 ppm. The only difference between the two average mass spectra is in the ion abundances relative to the base peak. It is apparent that the abundances of the characteristic ions and the molecular ion are substantially higher in the TOFMS spectrum than in the QMS spectrum (e.g. 42% abundance for m/z 77 in the TOFMS spectrum as compared to 17% in the QMS spectrum). The increase in relative ion abundances for all ions is indicative of signal reduction for the base peak ion.
 
An explanation for this observation is that the use of a low-mass cutoff at 40 Da affected ion transmission at m/z 44. The low-mass cutoff at this value was applied in order to prevent background ions originating from the ion source, such as water (18 Da), oxygen (32 Da), and nitrogen (28 Da), from being detected in the mass spectra. Due to the proximity of the low-mass cutoff value to the ion at m/z 44, the number of ions at m/z 44 that are transmitted through the mass spectrometer is reduced. As m/z 44 is the base peak ion for this compound, the reduction in the ion transmission affects the relative ion abundances of the other ions in the mass spectrum, but not the base peak abundance, as the ion abundances are normalized to that of the base peak. Furthermore, the ratios of the characteristic ions remain similar in the two spectra. The ion at m/z 131 is 2.8 times more abundant than the ion at m/z 132 in the QMS spectrum while the ratio of these two ions in the TOFMS spectrum is 2.7:1. The m/z 131/m/z 77 ratio is 1.6:1 in the low-resolution mass spectrum as opposed to a 1.9:1 ratio in the high-resolution mass spectrum. Thus, the pattern of the characteristic ions in the mass spectrum is maintained despite the differences in relative ion abundance.
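The effect of a suppressed base peak on normalized abundances can be illustrated numerically. In the sketch below, the raw intensities are hypothetical counts chosen only to mimic the situation described (they are not measured data), and the function name is an assumption made for illustration.

```python
def relative_abundances(intensities: dict) -> dict:
    """Normalize raw ion intensities to the most intense ion (base peak = 100%)."""
    base = max(intensities.values())
    return {mz: 100.0 * counts / base for mz, counts in intensities.items()}


# Hypothetical raw counts for a spectrum whose base peak is m/z 44.
raw = {44: 1000.0, 77: 170.0, 131: 280.0, 132: 100.0}

# Same spectrum, but with transmission at m/z 44 reduced (e.g. by a nearby
# low-mass cutoff); all other ions are unchanged.
attenuated = dict(raw)
attenuated[44] = 400.0

full = relative_abundances(raw)
cut = relative_abundances(attenuated)

ratio_full = full[131] / full[132]
ratio_cut = cut[131] / cut[132]
```

With these illustrative numbers, the relative abundance of m/z 77 rises from 17% to 42.5% when only the base peak signal is reduced, while the m/z 131/m/z 132 ratio is unchanged, which is the behavior the normalization argument predicts.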
 
The fragment ions in the mass spectra of 6-APB (Figure B.3) are the same as those observed for 4-APB and the only difference is the m/z 131/m/z 132 ratio. Whereas the ratio is approximately 3:1 for 4-APB, the low-resolution mass spectrum of 6-APB displays a ratio of 1.3:1, indicating a higher abundance of the ion at m/z 132. This pattern is conserved in the high-resolution mass spectrum of 6-APB, where the ratio is 1.2:1. As with the high-resolution mass spectrum of 4-APB, the TOFMS spectrum of 6-APB shows higher relative ion abundances of the characteristic ions compared to the QMS spectrum. This is also due to the use of the low-mass cutoff at 40 Da, which reduces the transmission of the ion at m/z 44, resulting in higher relative ion abundances for the characteristic ions. Similar to the mass spectra of 4-APB, however, the ratios of the characteristic ions are still maintained.
 
 
Figure 9.3. Average mass spectrum of 4-APB obtained via (a) GC-QMS and (b) GC-TOFMS. Characteristic ions are labeled with associated mass accuracy in the high-resolution mass spectrum.
 
 
 
 
Figure 9.4. Proposed fragmentation pathway for 4-APB.
 
 
For the reference standards that do not display a molecular ion, the accurate masses of the fragment ions are especially valuable for structural elucidation. Mass spectral data obtained by GC-QMS and GC-TOFMS for 5-MAPB are displayed in Figure 9.5, with a proposed fragmentation pathway shown in Figure 9.6. In a comparison of the two mass spectra, it is apparent that the same characteristic ions are observed. The first fragment ion that is observed for 5-MAPB is at m/z 174, which indicates a loss of the methyl functional group on the alpha-carbon. The ions at m/z 58 and 131 are the results of the alpha-beta cleavage of the carbon-carbon bond that is located two bonds away from the amine group. Subsequent losses from both pathways lead to the formation of the benzylic ion at m/z 77.
 
Figure 9.5. Average mass spectrum of 5-MAPB obtained via (a) GC-QMS and (b) GC-TOFMS. Characteristic ions are labeled with associated mass accuracy in the high-resolution mass spectrum.
 
 
Figure 9.6. Proposed fragmentation pathway for 5-MAPB.
 
 
Some differences between the high-resolution and low-resolution mass spectra are observed for 5-MAPDB and 3,4-MDPA. In the low-resolution mass spectrum of 5-MAPDB (Figure B.4), a loss of the methyl functional group on the alpha-carbon yields the ion at m/z 174, which was observed in the fragmentation of 5-MAPB. However, the first fragment ion observed in the high-resolution mass spectrum is at m/z 134, which corresponds to a loss of -CH2CH2NHCH2, resulting from abstraction of a hydrogen from the methyl functional group and subsequent cleavage of the carbon-carbon bond between the carbon on the benzene ring and the β-carbon (i.e. the carbon atom beta to the amine). Differences in instrument design, most likely in the path length from the source to the detector, between the GC-QMS and GC-TOFMS may lead to differences in the ions observed. While the TOF is typically the more sensitive instrument, the ion at m/z 176 may not have as long a mean free path as the ion at m/z 134, which would lead to collision and subsequent degradation of the ion prior to reaching the detector.
 
As for 3,4-MDPA (Figure B.5), the first fragment ion observed is due to the loss of the propylamine functional group, and this is the same in spectra obtained by both techniques. Despite these small differences, which can be explained by differences in the geometry of the instrument design, similar fragment ions at the lower m/z range are still observed and more chemical information is obtained with HRMS.
 
In the comparison of high-resolution and low-resolution mass spectra from phenethylamine reference standards, it is apparent that the mass spectral data acquired via GC-TOFMS are comparable to the data obtained by GC-QMS in the fragment ions observed and the overall patterns in the mass spectra. This shows promise in the ability to adapt a classification scheme developed using HRMS to low-resolution mass spectra. However, the sensitivity of the GC-TOFMS is greater, as is evident from the higher ion abundances. Accurate masses of the ions in the high-resolution mass spectra are also obtained, with error ≤ 20 ppm, to which elemental formulae are assigned with high confidence. In short, mass spectral data of phenethylamines acquired via GC-TOFMS are useful for the identification of novel synthetic designer drugs.
 
9.2 Comparison of GC-QMS and GC-TOFMS Spectra for Cathinones
 
While most phenethylamine reference standards investigated in this research displayed a molecular ion in the mass spectra, molecular ions were not observed in the mass spectra of the cathinone reference standards. This was found to be true in all of the low-resolution and high-resolution mass spectra, with the exception of the mass spectrum of 2-methoxy MC acquired via GC-QMS, where the molecular ion at m/z 193 was observed at an abundance of 0.7% with respect to the base peak. The average mass spectra of this compound obtained by both techniques are displayed in Figure 9.7, with a proposed fragmentation pathway shown in Figure 9.8.
 
 
 
While the molecular ion of 2-methoxy MC is present in the GC-QMS mass spectrum, more fragment ions and their accurate masses are displayed in the mass spectrum acquired by GC-TOFMS, which provides more chemical and structural information that is useful for definitive identification. The ion at m/z 191 is the result of the loss of two hydrogen atoms from the molecule, while the ions at m/z 58 and 135 arise from the alpha-beta cleavage of the carbon-carbon bond located two bonds away from the amine group. The m/z 160 ion is the result of a loss of CH5O, while the loss of both the oxygen in the carbonyl functional group and the methoxy group results in the ion at m/z 146. The ion at m/z 92 is obtained with a loss of C5H11O2, and subsequent losses in the majority of these fragment ions result in the benzylic ion at m/z 77. The relative ion abundances of these ions in the TOFMS spectrum are also higher than those observed in the QMS spectrum, further confirming that GC-TOFMS is a more sensitive technique.
 
 
Figure 9.7. Average mass spectrum of 2-methoxy MC obtained via (a) GC-QMS and (b) GC-TOFMS. Characteristic ions are labeled with associated mass accuracy in the high-resolution mass spectrum.
 
Figure 9.8. Proposed fragmentation pathway for 2-methoxy MC.
 
 
Similar trends to the mass spectra of 2-methoxy MC were also observed for α-PPP (Figure 9.9) and other cathinone reference standards used in this research (Figures B.6–B.11). Both the QMS and the TOFMS spectra of α-PPP (Figures 9.9a and 9.9b, respectively) display a base peak ion at m/z 98 and characteristic ions at m/z 56, 77, and 105. The main difference between the two spectra is the presence of ions in the higher m/z range of the TOFMS spectrum, namely the ions at m/z 172 and 201, whereas the majority of fragment ions in the low-resolution mass spectrum are those at the lower m/z range (i.e. below m/z 100). The ions observed in the TOFMS spectrum also display high mass accuracy, with error ≤ 5 ppm.
 
The proposed fragmentation pathway for α-PPP is shown in Figure 9.10. The ions at m/z 98 and 105 correspond to the fragment ions resulting from the α-β cleavage of the compound, whereas the ion at m/z 56 corresponds to an ion with chemical composition C3H6N+, and the ion at m/z 77 is the benzylic ion with chemical formula C6H5+. The ion at m/z 172, observed only in the TOFMS spectrum, is due to the losses of the carbonyl oxygen and the methyl group on the α-carbon, and the ion at m/z 201 results from the loss of two hydrogens, likely from the carbons in the pyrrolidine functional group. The presence of more fragment ions in high-resolution mass spectra at greater relative abundances with high mass accuracy indicates higher instrument sensitivity and is more useful in spectral interpretation for compound identification.
 
 
Figure 9.9. Average mass spectrum of α-PPP obtained via (a) GC-QMS and (b) GC-TOFMS. Characteristic ions are labeled with associated mass accuracy in the high-resolution mass spectrum.
 
 
Figure 9.10. Proposed fragmentation pathway for α-PPP.
 
 
Overall, the high-resolution mass spectra obtained by GC-TOFMS display similar, if not more, chemical information for the cathinone reference standards investigated, and thus, demonstrate the utility of HRMS in assisting forensic practitioners in the identification of novel synthetic designer drugs.
 
9.3 Absolute Mass Defect 
 
 
The potential of absolute mass defect filters to classify synthetic designer drugs to the phenethylamine and cathinone structural classes was investigated. The molecular ion filter for the phenethylamine class is shown below in Table 9.1. The experimental exact mass for the three standards displayed high mass accuracy, with error ranging between 2 and 5 ppm. The standard deviation in the mass defects of the replicates for each standard was observed to be small, indicating high precision in the accurate mass and subsequently, the mass defect measurements. The filter window of ±35.8 mDa was determined at the 82% confidence level so that the filter was as narrow as possible while still encompassing all of the absolute mass defects for all replicates of the three standards.
 
 
Table 9.1. Molecular ion filter for phenethylamines using absolute mass defect.

                                 4-APB         2C-P          2C-H
Theoretical Exact Mass (Da)      175.0997      223.1572      181.1103
Experimental Exact Mass (Da)     175.0989      223.1568      181.1109
Experimental Mass Defect (mDa)*  98.9 ± 0.8    156.8 ± 0.7   110.9 ± 0.6
Mass Defect Filter (mDa)         122.2 ± 35.8 **

*Average mass defect ± standard deviation (n = 5).
**Confidence interval calculated at 82% confidence level (CL) based on n = 3.
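The filter window in Table 9.1 is consistent with a two-sided Student's t confidence interval on the mean of the three training-set mass defects, although the calculation is not spelled out at this point in the text. The sketch below assumes that interpretation. The function names are hypothetical, and the critical value uses closed-form expressions that hold only for 1 or 2 degrees of freedom, so the code does not need a statistics library beyond the Python standard library.

```python
import math
import statistics


def t_critical(confidence: float, df: int) -> float:
    """Two-sided Student's t critical value (closed forms for df = 1 or 2 only)."""
    if df == 1:
        # t with 1 degree of freedom is the Cauchy distribution.
        return math.tan(math.pi * confidence / 2.0)
    if df == 2:
        return confidence * math.sqrt(2.0 / (1.0 - confidence ** 2))
    raise ValueError("closed form implemented only for df = 1 or 2")


def filter_window(mass_defects_mda, confidence):
    """Return (centroid, half-width) of a t-based confidence interval in mDa."""
    n = len(mass_defects_mda)
    centroid = statistics.mean(mass_defects_mda)
    standard_error = statistics.stdev(mass_defects_mda) / math.sqrt(n)
    return centroid, t_critical(confidence, n - 1) * standard_error


# Mean experimental mass defects (mDa) of 4-APB, 2C-P, and 2C-H from Table 9.1.
centroid, half_width = filter_window([98.9, 156.8, 110.9], 0.82)
print(f"{centroid:.1f} ± {half_width:.1f} mDa")
```

Under this assumption the result agrees with the tabulated 122.2 ± 35.8 mDa to within about 0.1 mDa; the small difference is expected because the table values used as input are already rounded.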
 
 
Evaluation of the molecular ion filter is shown in Figure 9.11, where the phenethylamine standards in the training and test sets are plotted. Though the test set contains three reference standards, the compound 3,4-MDPA did not exhibit a molecular ion, and thus, was not included in the evaluation of this filter. Successful classification of phenethylamines in the test set was achieved using the molecular ion filter in that the absolute mass defects of these standards were within the filter window. Even so, the filter window is large and similar to the ±50 mDa tolerance that Grabenauer et al. used in their mass defect filter (1). A large filter window increases the possibility that compounds belonging to other structural classes would have mass defect values that fall within this filter. For example, a traditional controlled substance such as cocaine would be falsely classified as a phenethylamine with this filter. This is because the exact mass of cocaine is 303.1471 Da, and its absolute mass defect is 147.1 mDa, which is a value that falls within the molecular ion filter. However, the structure of cocaine does not resemble that of a phenethylamine (Figure B.12) and cocaine is not a member of the phenethylamine class. Thus, filters with increased specificity need to be investigated.
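The cocaine example can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, and the nominal mass is taken as the nearest integer, which is an assumption that holds only while the mass defect is smaller than 0.5 Da.

```python
def absolute_mass_defect_mda(exact_mass: float) -> float:
    """Exact mass minus nominal mass, in mDa (nominal mass = nearest integer)."""
    return (exact_mass - round(exact_mass)) * 1000.0


def within_filter(defect_mda: float, centroid: float, half_width: float) -> bool:
    """True if a mass defect lies inside centroid ± half_width."""
    return abs(defect_mda - centroid) <= half_width


# Phenethylamine molecular ion filter: 122.2 ± 35.8 mDa (86.4 to 158.0 mDa).
cocaine_defect = absolute_mass_defect_mda(303.1471)
cocaine_passes = within_filter(cocaine_defect, 122.2, 35.8)
print(f"{cocaine_defect:.1f} mDa, inside filter: {cocaine_passes}")
```

The defect of about 147.1 mDa lies inside the 86.4 to 158.0 mDa window, so cocaine passes the filter even though it is not a phenethylamine, which is the false positive described above.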
 
 
Figure 9.11. Molecular ion filter for the phenethylamine class using absolute mass defect (82% CL, n = 3). Error bars (smaller than the symbols) represent the standard deviation in the mass defect of the replicates (n = 5) for each standard in the training set. The filter was tested with the phenethylamine test set.
 
 
Fragment ion filters were developed for both classes using absolute mass defect to increase the specificity of the classification as well as to extract chemical information from compounds that do not exhibit molecular ions. The fragment ions in the filters were common to all the standards in the training set.
 
The ion at m/z 77 was common among all five phenethylamine standards. This fragment ion is the benzylic ion with chemical formula C6H5+. A filter was developed at 39.5 ± 2.2 mDa (99.9% confidence level, n = 5) (Table 9.2). The filter window for fragment ions is substantially smaller than that for a molecular ion filter (2.2 mDa compared to 35.8 mDa). This is because the filter is specific to this ion and its development is based only on the absolute mass defect value of m/z 77 rather than a range of mass defects as is the case for the molecular ion filter. The possibility of false positives is reduced with a small filter window; however, the possibility of false negatives increases.
 
 
Table 9.2. Fragment ion filter at m/z 77 for phenethylamines using absolute mass defect.

                                 4-APB        5-MAPB       5-MAPDB      2C-P         2C-H
Theoretical Exact Mass (Da)      77.0391
Experimental Exact Mass (Da)     77.0397      77.0390      77.0387      77.0400      77.0400
Experimental Mass Defect (mDa)*  39.7 ± 0.4   39.0 ± 0.7   38.7 ± 0.6   40.0 ± 0.6   40.0 ± 0.3
Mass Defect Filter (mDa)         39.5 ± 2.2 **

*Average mass defect ± standard deviation (n = 5).
**Confidence interval calculated at 99.9% confidence level (CL) based on n = 5.
 
 
The m/z 77 fragment ion filter was first assessed with the phenethylamine test set (Figure 9.12). Successful classification was achieved, as the three phenethylamine standards in the test set had absolute mass defect values at m/z 77 that were within the filter, and no false positives or negatives were observed. The absolute mass defect values at m/z 77 were the same for two of the standards, 6-APB and 3,4-MDPA, and thus, the two corresponding data points are overlaid on the figure.
 
The m/z 77 fragment ion filter was then assessed using the cathinone test set (Figure 9.12). Only two of the cathinone test set standards, mephedrone and 2-methoxy MC, exhibited an ion at m/z 77. The absolute mass defect values of the fragment ions at m/z 77 for the two cathinone standards were within the filter, indicating that the two standards would be classified as phenethylamines using this filter alone. Clearly, this filter is not discriminatory between the two classes. However, this is expected, as the benzylic ion is observed in the core structures of both phenethylamine and cathinone, and is common to aromatic compounds.
 
Despite the benzylic ion being common to aromatic compounds, the cathinone test set standard, pyrovalerone, did not exhibit this ion. A possible explanation for the absence of this ion is that the formation of the m/z 77 ion is not favored given the structure of the compound. It is likely that the methyl substituent on the benzene ring that is located para relative to the carbonyl group leads to the formation of the m/z 91 ion rather than the m/z 77 ion, as the difference between the two ions is a CH2 group. Given the abundance of the m/z 91 ion in the mass spectrum of pyrovalerone (18%), this explanation is reasonable. More detail on the m/z 91 ion is provided below.
 
 
Figure 9.12. Fragment ion filter for phenethylamines at m/z 77 using absolute mass defect (99.9% CL for n = 5). Error bars represent the standard deviation in the mass defect of the replicates (n = 5) for each standard in the training set. The filter was tested with the phenethylamine and cathinone test sets.
 
 
Fragment ion filters for the cathinone class were developed for ions at m/z 56, 77, and 91. The ion at m/z 56 has chemical formula C3H6N+, while the ion at m/z 91 is the tropylium ion with chemical formula C7H7+. The ion at m/z 77 is the benzylic ion that was common to the phenethylamine standards, discussed previously. As this ion is common to both classes, this fragment ion filter developed for the cathinones is not discussed.
 
However, that is not to say that the m/z 77 filter is not useful in any way; the presence of an ion at m/z 77 that falls within this absolute mass defect filter is indicative of an aromatic compound. This filter may be used as an initial filter for novel compounds to determine their aromaticity; it is merely not discriminatory between the two compound classes of interest.
 
The cathinone fragment ion filter developed for m/z 56 is centered at 49.8 ± 1.9 mDa (99.99% CL, n = 5) (Figure 9.13). This fragment ion is from the aliphatic portion of the compound that typically arises from alpha-beta cleavage of the carbon-carbon bond that is located two bonds away from the amine. A figure of this fragment ion is shown in Figure B.13. Successful classification of the cathinone test set was achieved, as the absolute mass defect values of the m/z 56 ions for the three standards are well within the filter window.
 
The m/z 56 fragment ion filter was then assessed with the m/z 56 ions for the phenethylamine test set. Of the three standards, 2C-D did not exhibit a fragment ion at m/z 56, while the ion was present in 6-APB and 3,4-MDPA. The mass defect values for the m/z 56 ion in 6-APB and 3,4-MDPA were not within the filter window. At first, this appears to indicate that the filter is actually discriminatory, especially in light of the fact that the ion at m/z 56 is not exclusive to C3H6N+. An ion with chemical composition of C4H8+ also exists with a nominal mass of 56 Da, but with exact mass of 56.0626 Da. However, given the structure of the two test set compounds, the identity of the m/z 56 ion is likely the ion with the former chemical composition as opposed to the latter. Furthermore, both ions exhibited poor mass accuracy (43 ppm for 3,4-MDPA and 45 ppm for 6-APB). The source of the poor mass accuracy may be the result of instrument drift; the exact mass was observed to be higher than expected, meaning that the flight time of these ions was longer. As described in Section 7.1.4.3, a multitude of possibilities exist for flight time deviations, including positional and kinetic energy differences. The poor mass accuracy observed in these ions may be attributed to these differences, and thus, the m/z 56 filter does not provide discrimination between the phenethylamine and cathinone classes.
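The reasoning above can be followed numerically. The sketch below is an illustration under stated assumptions: the atomic masses are standard monoisotopic values, the electron mass is neglected (which reproduces the 56.0626 Da quoted for C4H8+), and all function and variable names are hypothetical. It shows that a C3H6N+ ion measured about 43 ppm too high falls outside the 49.8 ± 1.9 mDa window without being C4H8+.

```python
# Monoisotopic atomic masses in Da (electron mass neglected, by assumption).
ATOMIC_MASS = {"C": 12.000000, "H": 1.007825, "N": 14.003074}


def formula_mass(counts: dict) -> float:
    """Sum of monoisotopic atomic masses for an elemental composition."""
    return sum(ATOMIC_MASS[element] * n for element, n in counts.items())


def defect_mda(mass: float) -> float:
    """Absolute mass defect in mDa, taking the nearest integer as nominal mass."""
    return (mass - round(mass)) * 1000.0


def ppm_to_mda(mz: float, ppm: float) -> float:
    """Convert a relative error in ppm at a given m/z to an absolute error in mDa."""
    return mz * ppm * 1e-3


c3h6n = formula_mass({"C": 3, "H": 6, "N": 1})
c4h8 = formula_mass({"C": 4, "H": 8})

FILTER_CENTROID, FILTER_HALF_WIDTH = 49.8, 1.9  # m/z 56 cathinone filter (mDa)

# C3H6N+ measured 43 ppm above its theoretical mass (value quoted for 3,4-MDPA).
shifted_defect = defect_mda(c3h6n) + ppm_to_mda(c3h6n, 43.0)
shifted_inside = abs(shifted_defect - FILTER_CENTROID) <= FILTER_HALF_WIDTH
```

At m/z 56, an error of 43 ppm corresponds to roughly 2.4 mDa, which is larger than the filter half-width, so the measured defect (about 52.4 mDa) lies outside the window while still being far from the 62.6 mDa defect expected for C4H8+.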
 
 
Figure 9.13. Fragment ion filter at m/z 56 using absolute mass defect for cathinones (99.99% CL for n = 5). Error bars represent the standard deviation in the mass defect of the replicates (n = 5) for each standard in the training set. The filter was tested with the phenethylamine and cathinone test sets.
 
 
The cathinone fragment ion filter developed for m/z 91 was centered at 55.0 ± 2.4 mDa (99.998% CL for n = 5) (Figure 9.14). This fragment ion is the result of a rearrangement of the methylated benzylic ion to form a seven-membered ring (Figure B.13). Successful classification was achieved for the cathinone test set; all absolute mass defect values of the m/z 91 ion for the test set were within 0.3 mDa of the filter centroid, indicating high mass accuracy.
 
The filter was then assessed using the phenethylamine test set. Only two compounds displayed an ion at m/z 91: 6-APB and 2C-D. Similar to the ion filter at m/z 77, the mass defect values for the opposing test set were within the m/z 91 filter, which indicates that the filter is not useful for discrimination between the two classes. As the ion at m/z 91 is found in the core structures corresponding to both phenethylamines and cathinones, the absolute mass defect filter for this ion does not provide useful information towards the classification of a synthetic drug to either class.
 
 
Figure 9.14. Fragment ion filter at m/z 91 developed for the cathinones (99.998% CL, n = 5). Error bars represent the standard deviation in the mass defect of the replicates (n = 5) for each standard in the training set. The filter was tested with the phenethylamine and cathinone test sets.
 
 
Despite the presence of this ion in the core structures, m/z 91 was not identified as a common fragment ion for the phenethylamine class due to its absence in one of the phenethylamine standards; only 4 of the 5 phenethylamine training set standards displayed this ion. An explanation for this observation is the difference in substituents to the core structure, where the substituent identity and position dictate the presence and abundance of the m/z 91 ion. For example, 5-MAPB does not exhibit the m/z 91 ion while 5-MAPDB does. In fact, ions at both m/z 77 and 91 are present in 5-MAPDB while only the m/z 77 ion is present in the mass spectrum of 5-MAPB. The difference between the two phenethylamines is the presence of the double bond located in the furan portion of the two structures. In 5-MAPB, the double bond is present while it is absent in 5-MAPDB. The presence of this double bond results in the absence of the m/z 91 ion. This is likely due to the inability of the compound to form the tropylium ion after cleavage of this bond. Prior to rearrangement, the m/z 91 ion is that of a benzene ring with a methylene functional group. However, because only one hydrogen atom is located on each of the two carbons participating in the double bond, cleavage of this bond will not allow for two hydrogen atoms to be present on the carbon atom to form the methylene group. Thus, the identity of the substituent in this example dictates the formation of characteristic fragment ions.
 
In summary, a molecular ion filter and a fragment ion filter at m/z 77 were developed for the phenethylamine class while fragment ion filters at m/z 56, 77, and 91 were developed for the cathinone class based on absolute mass defect. Successful classification was achieved with both of the filters for the phenethylamines with the phenethylamine test set. However, a wide filter window (±35.8 mDa) was observed in the molecular ion filter and the filter for m/z 77 is not discriminatory.
 
Of the standards investigated, no molecular ions were observed for cathinones, suggesting that this feature may be a point of discrimination for the two classes. Despite successful classification of the cathinone test set for the three fragment ion filters, these filters are not discriminatory.
 
It is evident that the use of absolute mass defect filters alone is not feasible for classification to the phenethylamine and cathinone classes. However, these filters can be used at the early stages of a classification scheme to filter out compounds that belong to structural classes other than phenethylamines and cathinones. The wide window for the molecular ion is optimal for an initial screening to retain more compounds for further classification, thus reducing the possibility of missing potential phenethylamines. The common fragment ion filters for m/z 77 and m/z 91 can then be applied to determine the aromaticity of the screened compounds and the fragment ion filter at m/z 56 to determine the presence of the nitrogen-containing aliphatic chain C3H6N+ in the interrogated compounds. Compounds with absolute mass defect values that fall within these filters are more likely to have structures similar to phenethylamines and cathinones.
 
An alternative to absolute mass defect filters is also necessary, as a limitation exists for absolute mass defect. A positive correlation exists between m/z and mass defect (Figure 9.15), which is a disadvantage for mass defect filtering. This correlation is due to the large contribution of hydrogen to absolute mass defect (7.8 mDa per hydrogen atom). For synthetic drugs with high molecular masses, the hydrogen content is substantial. Since hydrogen atoms are the most influential in positive mass defect, the larger hydrogen content of the compound is reflected in the increase in absolute mass defect. This positive correlation is most likely to affect molecular ion filters, since compounds with masses outside the investigated range may have mass defect values that do not fall within the filter even though they are known to belong to the same structural class (i.e. false negatives). Other types of mass defect filters need to be investigated for their potential to overcome this limitation in absolute mass defect.
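The hydrogen contribution can be made concrete by summing per-element mass defects for two of the training-set standards. The sketch below is illustrative: the per-element values follow from standard monoisotopic masses, the elemental formulae used for 2C-H (C10H15NO2) and 2C-P (C13H21NO2) are the commonly reported ones, the electron mass is neglected, and the names are hypothetical.

```python
# Mass defect per atom in mDa (monoisotopic mass minus nominal mass).
ELEMENT_DEFECT_MDA = {"C": 0.0, "H": 7.825, "N": 3.074, "O": -5.085}


def formula_defect_mda(counts: dict) -> float:
    """Absolute mass defect of an elemental composition, in mDa."""
    return sum(ELEMENT_DEFECT_MDA[element] * n for element, n in counts.items())


two_c_h = {"C": 10, "H": 15, "N": 1, "O": 2}
two_c_p = {"C": 13, "H": 21, "N": 1, "O": 2}

defect_2ch = formula_defect_mda(two_c_h)
defect_2cp = formula_defect_mda(two_c_p)

# The two compounds differ by C3H6; carbon contributes nothing, so the whole
# difference in mass defect comes from the six additional hydrogen atoms.
difference = defect_2cp - defect_2ch
```

The computed defects (about 110.3 and 157.2 mDa) match the theoretical exact masses in Table 9.1, and the difference of roughly 47 mDa between two members of the same class arises entirely from hydrogen, which is why a single absolute mass defect window must be wide to cover a class.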
 
 
 
Figure 9.15. Absolute mass defect values of synthetic designer drugs plotted as a function of 
their exact mass. 
 
 
9.4 Kendrick Mass Defect
 
 
Kendrick mass defect (KMD) filters were investigated for their potential to overcome the limitations and non-specificity observed with the absolute mass defect filters. KMD is extremely useful for identifying homologous series since members in a series have the same KMD. Not all compounds in a class will differ only by methylene (CH2) units. Therefore, the possibility of multiple KMD filters for the same class exists. For example, of the 5 phenethylamines in the training set, two homologous series can be identified. The first homologous series is that of the 2C-phenethylamines, which includes the compounds 2C-P and 2C-H, since their chemical formulae differ by C3H6, or 3 methylene groups. The second homologous series includes the compounds 4-APB and 5-MAPB, and despite the different position of the furan functional group, the difference in chemical formulae is CH2. 5-MAPDB, on the other hand, does not fit into either of the two homologous series, since the compound is not a 2C-phenethylamine, nor is the compound purely an aminopropylbenzofuran (APB) since it contains two more hydrogen atoms than 5-MAPB.
 
Two KMD molecular ion filters for phenethylamines are expected based upon the homologous series present. However, no molecular ion was present in the mass spectrum of 5-MAPB, and therefore, the filter was not developed for the homologous series containing 4-APB and 5-MAPB. More reference standards in this series would need to be analyzed in order to develop this KMD molecular ion filter.
 
The molecular ion filter that was developed was for the 2C-phenethylamines at 91.8 ± 1.5 mDa (78% CL, n = 2) (Table 9.3). Unlike the molecular ion filter developed using absolute mass defect, this filter window is extremely narrow. This is expected since the KMD values that comprise the filter are theoretically equivalent, and the filter centroid is not the average of values that span a wide mass defect range, as was observed for the molecular ion filter using absolute mass defect.
 
 
Table 9.3. Molecular ion filter for phenethylamines using Kendrick mass defect.

                          4-APB        2C-P         2C-H
Theoretical KMD (mDa)     95.8         92.0         92.0
Experimental KMD (mDa)*   96.6 ± 0.8   92.4 ± 0.7   91.3 ± 0.6
KMD Filter (mDa)                       91.8 ± 1.5 **

*Average mass defect ± standard deviation (n = 5).
**Confidence interval calculated at 78% confidence level (CL) based on n = 2.
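The theoretical KMD values in Table 9.3 can be reproduced from the theoretical exact masses in Table 9.1 using the usual CH2-based Kendrick scale. The sketch below assumes the common convention in which the Kendrick mass is the exact mass multiplied by 14.00000/14.01565 and the KMD is the nominal (nearest-integer) Kendrick mass minus the Kendrick mass; the function names are hypothetical.

```python
CH2_EXACT_MASS = 14.01565   # monoisotopic mass of CH2 in Da
CH2_NOMINAL_MASS = 14.0


def kendrick_mass(exact_mass: float) -> float:
    """Rescale an exact mass so that CH2 weighs exactly 14 on the Kendrick scale."""
    return exact_mass * CH2_NOMINAL_MASS / CH2_EXACT_MASS


def kendrick_mass_defect_mda(exact_mass: float) -> float:
    """Nominal Kendrick mass minus Kendrick mass, in mDa."""
    km = kendrick_mass(exact_mass)
    return (round(km) - km) * 1000.0


# Theoretical exact masses (Da) taken from Table 9.1.
kmd_4apb = kendrick_mass_defect_mda(175.0997)
kmd_2cp = kendrick_mass_defect_mda(223.1572)
kmd_2ch = kendrick_mass_defect_mda(181.1103)
```

Under this convention 2C-P and 2C-H give nearly identical values close to 92 mDa, as expected for compounds differing only by CH2 units, while 4-APB gives about 95.8 mDa and therefore belongs to a different series.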
 
 
The efficacy of the 2C-phenethylamine filter was then tested with the phenethylamine test set (Figure 9.16). Only two standards (2C-D and 6-APB) were used in the test set as 3,4-MDPA did not exhibit a molecular ion. As 6-APB is a positional isomer of 4-APB and is not in the 2C-phenethylamine homologous series while 2C-D is a 2C-phenethylamine, it is expected that only the KMD of 2C-D will fall within the filter. Successful classification of 2C-D was observed; the KMD value of 6-APB was not within the filter.
 
 
Figure 9.16. Molecular ion filter for 2C-phenethylamines using KMD (78% CL for n = 2). Error bars represent the standard deviation in the mass defect of the replicates (n = 5) for each standard in the training set. The filter was tested with the phenethylamine test set.
 
 
It is evident that increased specificity can be achieved with KMD molecular ion filters; these filters can be used to further subclassify within the phenethylamine structural class. This is extremely useful in a classification scheme; more structural information of a compound can be obtained with filters that can indicate subclass. The narrow window in the KMD molecular ion filter is also advantageous since this reduces the possibility of false positives that are likely to occur with a wide window, as observed in the molecular ion filter developed using absolute mass defect.
 
Fragment ion filters based on Kendrick mass defect were then investigated for the 
phenethylamine and cathinone classes. Seven fragment ion filters were developed for 
163
 
 
phenethylamines (Table 9.4). Representat
ive filters for the class, including Filters 2, 4, and 6, are 
discussed. Filters 1 and 3 (
Figure
s
 
B.
14 and 
B.
15
) are 
similar to Filter 2, Filter 5 (
Figure B.1
6
) 
is 
similar to Filter 4, and 
Filter 7 (
Figure B.1
7
) 
is similar to Filter 6. 
 
 
Table 9.4. List of the ions included in each fragment ion filter for the phenethylamine class using Kendrick mass defect.

Filter    m/z Range    Chemical Formula Range
1         51 – 79      C4H3+ – C6H7+
2         63 – 105     C5H3+ – C8H9+
3         62 – 104     C5H2+ – C8H8+
4         75 – 131     C6H3+ – C10H11+
5         87 – 129     C7H3+ – C10H9+
6         138 – 194    C8H10O2+ – C12H18O2+
7         137 – 193    C8H9O2+ – C12H17O2+
 
Filter 2 ranges from C5H3+ to C8H9+ and includes the ions C6H5+ and C7H7+. The m/z range that the filter spans is 63 – 105 Da. Four of the five phenethylamine training set standards exhibited the m/z 63 ion; this ion was not observed for 2C-P. As expected, the m/z 77 ion was present in all five phenethylamine standards, and as stated previously, only 5-MAPB did not exhibit the m/z 91 ion. The m/z 105 ion was present in all standards with the exception of 5-MAPB and 2C-H.
 
Based upon the chemical formulae of the ions included, it is apparent that fragment ion Filter 2 targets ions that have aromatic characteristics with a small aliphatic component (i.e., the CH2 group in m/z 91 and the C2H4 group in m/z 105). The filter is centered at 46.5 ± 2.5 mDa (99.9999998% CL, n = 16) (Figure 9.17). Of the fragment ion filters developed for phenethylamines, this filter included the most ions in development. An extremely high confidence level was applied in order to encompass the KMD values of two of the ions used in filter development: the m/z 91 ion from 2C-H and the m/z 105 ion from 2C-P. The mass accuracy of the two ions was lower than expected, at 23 ppm and 28 ppm, respectively, whereas the mass accuracy of the other ions in the filter ranged between 1 and 11 ppm.
 
 
Figure 9.17. Fragment ion Filter 2 for phenethylamines using KMD (99.9999998% CL, n = 16), with test sets from both classes plotted.
 
 
Filter 2 was first evaluated with the phenethylamine test set. From the three test set standards, 10 ions were identified as having KMD values within the filter range. Successful classification was achieved for all ions in the phenethylamine test set that exhibited the fragment ions at m/z 63, 77, 91, and 105. No false positives or negatives were observed for the phenethylamine test set. Both 6-APB and 2C-D exhibited the four fragment ions, but only ions at m/z 77 and 105 were observed in 3,4-MDPA. Despite the absence of the ions at m/z 63 and 91, 3,4-MDPA would still be classified as a phenethylamine using this filter.
 
Filter 2 was then assessed with the cathinone test set. Nine ions from three cathinone test set standards were identified as having KMD values within the filter range. 2-Methoxy MC exhibited ions at m/z 63, 77, and 91 that were within the filter window. The ions at m/z 91 and 105 for pyrovalerone were also within the filter window, and all four ions exhibited by mephedrone had KMD values in that range. However, the m/z 105 ion for 2-methoxy MC and the m/z 63 ion for pyrovalerone had KMD values that were outside the filter window, and more specifically, KMD values above the upper limit of the filter window. These are false negatives, as the ions have chemical formulae that correspond to the ions in the filter. An explanation for this occurrence is the low mass accuracy for these ions; errors of 41 and 36 ppm for the m/z 63 and m/z 105 ions, respectively, are observed. As the filter window expands only to encompass ions with maximum error of 28 ppm (m/z 105 ion for 2C-P), it is not surprising that the two ions at m/z 63 and 105 from the cathinone test set are outside the filter.
 
Despite successful classification of all ions in the phenethylamine test set, this filter is non-discriminatory between the two classes, since the cathinone test set contained ions that fell within this filter. This filter contains ions at m/z 77 and 91, which, as discussed previously, are ions that are common to both classes and indicate aromaticity of compounds.
 
Fragment ion Filter 4 was developed for the phenethylamine class with ions ranging in chemical formulae from C6H3+ to C10H11+, which spans one of the largest m/z ranges in the phenethylamine fragment ion filters (75 – 131 Da). The filter includes ions with chemical formulae C7H5+, C8H7+, and C9H9+, with structures that contain 5- or 6-member rings with a small aliphatic component. 4-APB exhibits the ions at m/z 75, 89, and 103 while 5-MAPB displays only ions at m/z 89 and 103. The fragment ions at the higher m/z range (i.e., m/z 103, 117, and 131) are present in 2C-P, whereas only the m/z 89 ion is observed in 2C-H. Finally, the ions at m/z 103 and 117 are present in 5-MAPDB. This filter is centered at 60.0 ± 1.9 mDa (99.995% CL, n = 11) (Figure 9.18). A high confidence level was applied in order to encompass the m/z 117 ion for 5-MAPDB, which displayed a mass accuracy of 14 ppm. While this value is acceptable for the analysis, the mass accuracy displayed by the other ions is < 10 ppm, and thus, the error in this m/z 117 ion is higher in comparison.
 
Filter 4 was first evaluated with the phenethylamine test set. Six ions from the three standards were identified as having KMD values that were within the filter; 6-APB exhibited ions at m/z 75, 89, and 103, 2C-D exhibited ions at m/z 89 and 103, and 3,4-MDPA only exhibited the m/z 103 ion. Successful classification of these ions in the phenethylamine test set was achieved, and no false positives or negatives were observed.
 
The fragment ion filter was then assessed with the cathinone test set. Five ions from two of the test set standards were identified as having KMD values that were within this filter. The ions at m/z 75, 89, and 103 from 2-methoxy MC and the m/z 89 and 117 ions from mephedrone had KMD values that were within the filter. One false negative was identified in the cathinone test set, and this was the m/z 89 ion in pyrovalerone. Given the chemical formula corresponding to this ion, it is expected to have a KMD value within this filter; however, the associated mass accuracy is 20 ppm, which is high in comparison to the mass accuracy displayed for the phenethylamine training set standards (maximum error = 14 ppm for m/z 117 ion from 5-MAPDB).
 
While the ions in the phenethylamine test set were within the filter, the majority of cathinone ions were also found to be within this filter. This indicates the non-discriminatory nature of this filter for classification to either class.
 
Figure 9.18. Fragment ion Filter 4 for phenethylamines using KMD (99.995% CL, n = 11), with test sets from both classes plotted.
 
 
Thus far, the filters developed for the phenethylamine class only included ions that contain carbon and hydrogen atoms. Filter 6 was developed for ions that contain not only carbon and hydrogen atoms, but also two oxygen atoms. Fragment ion Filter 6 contains ions with chemical formulae that range from C8H10O2+ to C12H18O2+, including C9H12O2+, C10H14O2+, and C11H16O2+. While the two filters previously discussed were developed for ions in the lower to middle m/z range, this filter spans a higher m/z range, from 138 to 194 Da. The compound 2C-H exhibits ions at m/z 138 and 152 while the ions at m/z 166, 180, and 194 are present in 2C-P. Not surprisingly, only 2C-P and 2C-H exhibit ions in this filter, as these are the only training set standards that have two oxygen atoms. It is apparent that this filter is specific to the 2C-phenethylamines. This fragment ion filter is centered at 86.9 ± 2.6 mDa (98% CL, n = 5) (Figure 9.19).
 
Figure 9.19. Fragment ion Filter 6 for phenethylamines (98% CL, n = 5) using KMD. Test sets from both classes are plotted.
 
 
Filter 6 was first evaluated with the phenethylamine test set. It is expected that only ions that are characteristic of 2C-phenethylamines would have KMD values within this filter. Six ions were identified from two standards in the test set as having KMD values within the filter. The ions at m/z 127 and 133 for 6-APB and the ions at m/z 152, 164, 166, and 196 for 2C-D were within the filter. Successful classification was achieved for 2C-D for the ions at m/z 152 and 166. However, the other four ions (all ions from 6-APB and m/z 164 and 196 for 2C-D) are false positives. This is because the filter was developed with KMD values for m/z 152 and 166, and the chemical formulae corresponding to the m/z 127, 133, 164, and 196 ions are C10H7+, C9H9O+, C10H14NO+, and C11H18NO2+, respectively, which are compounds that are not in a homologous series with the compounds in Filter 6.
 
Two possible explanations for the presence of the four false positives are the wide filter window and the reduced mass accuracy in these ions. The theoretical KMD value for this filter is 86.1 mDa, but the filter centroid is at 86.9 mDa. While this is not a large difference, the filter window of 2.6 mDa means that the filter spans KMD values between 84.3 mDa and 89.5 mDa, and this range is large enough for other ions, such as the ion at m/z 127 (KMD of 86.5 mDa), to fall within it. The false positive at m/z 196 arises from the wide filter window as well as the reduced mass accuracy of this ion, since its theoretical KMD value is 85.3 mDa, but an error of 12 ppm shifted its KMD value to 87.5 mDa. The reduced mass accuracy in the ions at m/z 133 and 164 accounts for the remaining false positives, with errors of 27 and 52 ppm, respectively. The theoretical KMD values for these ions are 83.2 and 75.7 mDa, respectively, and these values are not within the range of the filter.
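The effect of mass error on KMD described above can be sketched numerically. The example below assumes the CH2-based Kendrick scale, KMD taken as nominal Kendrick mass minus Kendrick mass, theoretical masses computed from neutral monoisotopic masses, and a measured mass that is lower than the theoretical mass (negative error). The sign of the error is not stated in the text, so the direction of the shift is an assumption chosen to match the reported increase in KMD. The function names are hypothetical.

```python
CH2_SCALE = 14.0 / 14.01565006       # nominal mass / exact mass of CH2
FILTER_6 = (86.9 - 2.6, 86.9 + 2.6)  # Filter 6 window, 84.3 to 89.5 mDa


def kmd_mda(mass):
    """KMD in mDa (assumed convention: nominal Kendrick mass - Kendrick mass)."""
    kendrick_mass = mass * CH2_SCALE
    return (round(kendrick_mass) - kendrick_mass) * 1000.0


def kmd_with_error(theoretical_mass, error_ppm):
    """KMD of an ion measured error_ppm away from its theoretical mass."""
    measured_mass = theoretical_mass * (1.0 + error_ppm * 1e-6)
    return kmd_mda(measured_mass)


def in_window(kmd, window=FILTER_6):
    return window[0] <= kmd <= window[1]


# Theoretical masses (Da) and reported error magnitudes (ppm) for two test set ions.
IONS = {
    "m/z 196 (C11H18NO2+)": (196.13375, 12.0),
    "m/z 133 (C9H9O+)": (133.06534, 27.0),
}
for label, (mass, ppm) in IONS.items():
    theoretical = kmd_mda(mass)
    shifted = kmd_with_error(mass, -ppm)  # assumed: measured mass below theoretical
    print(f"{label}: {theoretical:.1f} -> {shifted:.1f} mDa, in filter: {in_window(shifted)}")
```

Under these assumptions the m/z 196 ion moves from approximately 85.3 mDa to approximately 87.6 mDa, in line with the reported shift, and the m/z 133 ion moves from outside the window to inside it.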
 
Fragment ion Filter 6 was then assessed with the cathinone test set. Since the filter is specific to 2C-phenethylamines, it is expected that no cathinone ions have KMD values that fall within the filter. However, four ions were identified from the cathinone test set as having KMD values within the filter. These ions are m/z 148 for 2-methoxy MC, m/z 141 and 199 for pyrovalerone, and m/z 162 for mephedrone. The chemical formulae that correspond to these ions are C9H10NO+, C11H9+, C14H17N+, and C10H12NO+, respectively, and these compounds are clearly not in a homologous series with the compounds in the filter. The wide filter window and the reduced mass accuracy are the primary reasons for these false positives. The KMD values observed for the ions at m/z 141 and m/z 199 are close to the lower limit of the filter. The KMD values for the ions at m/z 148 and m/z 162 are close to the filter centroid (87.5 and 87.4 mDa, respectively); however, the KMD values for the two compounds were lowered (theoretically 89.1 mDa) as a result of reduced mass accuracy (error of 11 and 10 ppm, respectively). It is apparent that even an error of 10 ppm is detrimental to KMD, especially for ions at higher m/z ratios. It may seem that mass accuracy of 10 ppm for larger ions (e.g., > 100 Da) is comparable to that for smaller ions (e.g., < 100 Da), but because mass accuracy is determined by normalizing the difference between theoretical and experimental mass by the exact mass of the compound, larger compounds that exhibit the same mass difference as smaller compounds display less error and higher mass accuracy. For example, a mass difference of 1 mDa for ions at m/z 77.0391 and m/z 115.0997 corresponds to mass accuracies of 13 ppm and 9 ppm, respectively. Conversely, the same error in ppm corresponds to a larger absolute mass difference, and thus a larger shift in KMD, for the larger ion. With a smaller filter window and higher mass accuracy, these ions would not fall within the filter, and the compounds in the cathinone test set would not be falsely classified as belonging to the phenethylamine class, and more specifically, to the 2C-phenethylamine class.
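The relationship between absolute mass difference and mass accuracy in ppm can be checked with a short calculation. The sketch below uses the two m/z values from the example above; the function names are hypothetical.

```python
def mass_error_ppm(theoretical_mz, experimental_mz):
    """Mass accuracy in ppm: mass difference normalized to the theoretical mass."""
    return (experimental_mz - theoretical_mz) / theoretical_mz * 1e6


def mass_shift_mda(theoretical_mz, error_ppm):
    """Absolute mass difference (mDa) that corresponds to a given ppm error."""
    return theoretical_mz * error_ppm * 1e-6 * 1000.0


for mz in (77.0391, 115.0997):
    ppm = mass_error_ppm(mz, mz + 0.001)  # 1 mDa mass difference
    shift = mass_shift_mda(mz, 10.0)      # 10 ppm error
    print(f"m/z {mz}: 1 mDa = {ppm:.1f} ppm; 10 ppm = {shift:.2f} mDa")
```

The same 1 mDa difference is a smaller ppm error for the heavier ion, while the same 10 ppm error is a larger absolute mass difference for the heavier ion, which is why KMD values at higher m/z are more sensitive to a fixed ppm error.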
 
Successful classification for two of the ions from 2C-D was achieved with this filter. Hence, 2C-D was correctly classified as belonging to the phenethylamine class, and more specifically to the 2C-phenethylamine class. However, false positives for 2C-D and 6-APB were observed. Furthermore, false positives for the three cathinone test set standards were also observed. Clearly, refinement of the filter window and higher mass accuracy for the ions are necessary. In contrast to Filters 2 and 4, the filter is theoretically discriminatory, as the ions in the cathinone test set are not in a homologous series with the compounds in Filter 6.
 
Fragment ion filters for cathinones were then investigated in order to assess their potential to provide discrimination between the two classes (Table 9.5). Only Filters 2, 7, and 9 are discussed. Filters 1 and 3 (Figures B.18 and B.19) are similar to Filter 2, Filters 4 – 6 (Figures B.20 – B.22) contain ions that were used in filters for phenethylamines (Filters 1 – 3, respectively) and are clearly not discriminatory, and Filter 8 (Figure B.23) is similar to Filter 7. Chemical information may be obtained from the cathinone fragment ion filters that can provide classification to the cathinone class and discriminate between phenethylamines and cathinones.
 
 
Table 9.5. List of the ions included in each fragment ion filter for the cathinone class using Kendrick mass defect.

Filter    m/z Range    Chemical Formula Range
1         58 – 72      C3H8N+ – C4H10N+
2         42 – 98      C2H4N+ – C6H12N+
3         54 – 96      C3H4N+ – C6H10N+
4         51 – 79      C4H3+ – C6H7+
5         63 – 105     C5H3+ – C8H9+
6         62 – 118     C5H2+ – C9H10+
7         132 – 174    C9H10N+ – C12H16N+
8         130 – 186    C9H8N+ – C13H16N+
9         105 – 119    C7H5O+ – C8H7O+
 
Fragment ion Filter 2 for the cathinone class was developed for ions that ranged in chemical formulae from C2H4N+ to C6H12N+ in the lower m/z range (42 – 98 Da), and includes the ions C3H6N+ and C4H8N+. The ion at m/z 42 is present only in methcathinone, but the m/z 56 ion is exhibited by all the training set standards. The m/z 70 ion is present in 3-MEC, α-PPP, and 3-methyl PPP, and the m/z 98 ion is observed in α-PPP and 3-methyl PPP. These ions correspond to the aliphatic chain attached to the amine, and the increase in m/z ratio reflects the addition of CH2 groups to the chain. The filter is centered at 12.7 ± 0.6 mDa (99.99% CL, n = 11) (Figure 9.20). A small filter window is observed despite the high confidence level, with mass accuracy of the training set standards ≤ 10 ppm.
 
Filter 2 was first evaluated with the cathinone test set. Six ions from the test set were identified as having KMD values that were within the filter. These ions are the m/z 56 ion for all three standards, and the ions at m/z 70, 98, and 126 for pyrovalerone. Successful classification of these ions was achieved, and it was observed that the homologous series continues past m/z 98 to incorporate the m/z 126 ion for pyrovalerone, which has chemical composition C8H16N+. However, false negatives were found, as three ions with chemical formulae that belong to this homologous series had KMD values outside the filter. These are m/z 42 for 2-methoxy MC and mephedrone, and m/z 84 for pyrovalerone, with chemical formulae C2H4N+ and C5H10N+, respectively. The primary reason for the false negatives at m/z 42 for 2-methoxy MC and mephedrone is the reduced mass accuracy of these ions, with errors of 69 and 67 ppm, respectively. On the other hand, the false negative at m/z 84 for pyrovalerone is attributed to the narrow filter window (0.6 mDa). Further refinement of the filter window by analyzing more reference standards is needed in order to reduce false negatives.
 
 
Figure 9.20. Fragment ion Filter 2 for cathinones (99.99% CL, n = 11) using KMD. The cathinone test set is plotted. No ions from the phenethylamine test set were found to contain KMD values within this filter.
 
 
Filter 2 was then assessed with the phenethylamine test set. No ions from the test set were found to have KMD values within the filter range, indicating that none of the phenethylamine test set standards are classified as a cathinone using this filter. Also, no false positives or negatives were observed.
 
Despite the three false negatives that were identified, successful classification of the cathinone test set ions was achieved. Furthermore, Filter 2 is discriminatory as no ions in the phenethylamine test set were found to have KMD values that were within the filter. The ability of this filter to extend up to the mid-m/z range to incorporate the m/z 126 ion not only shows the specificity of the homologous series but also demonstrates the utility of KMD filters for larger molecules in the homologous series that may exhibit fragment ions in the higher m/z range. In such cases, the KMD filters are still able to correctly classify the ions, in contrast to the absolute mass defect filters, which are limited by the specified m/z range.
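The reason a KMD filter extends beyond the m/z range of its training ions can be illustrated with the CnH2nN+ series discussed above. The sketch below assumes the CH2-based Kendrick scale, KMD taken as nominal Kendrick mass minus Kendrick mass, and neutral monoisotopic masses (electron mass ignored); these conventions are assumptions of the sketch, and the function name is hypothetical.

```python
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740}
CH2 = MONO["C"] + 2 * MONO["H"]  # exact mass of one CH2 unit


def kmd_mda(mass):
    """KMD in mDa (assumed convention: nominal Kendrick mass - Kendrick mass)."""
    kendrick_mass = mass * 14.0 / CH2
    return (round(kendrick_mass) - kendrick_mass) * 1000.0


# CnH2nN+ for n = 2 to 8, i.e. C2H4N+ (m/z 42) through C8H16N+ (m/z 126).
series = {}
for n in range(2, 9):
    mass = n * CH2 + MONO["N"]
    series[round(mass)] = kmd_mda(mass)

for nominal_mz, kmd in series.items():
    print(nominal_mz, f"{kmd:.2f} mDa")
```

Every member of the series has the same theoretical KMD (approximately 12.6 mDa under these assumptions), so the m/z 126 ion falls within the 12.7 ± 0.6 mDa window even though the filter was developed with ions only up to m/z 98.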
 
Fragment ion Filter 7 spans the m/z range 132 – 174 Da, and is one of the filters developed with ions at the higher m/z range. The chemical formulae associated with this filter range from C9H10N+ to C12H16N+, and include the ion C11H14N+ at m/z 160. Methcathinone exhibits the m/z 132 ion, while 3-MEC and α-PPP display the ion at m/z 160, and the m/z 174 ion is present in 3-methyl PPP. The structures that correspond to these ions are based on the cathinone core structure without the carbonyl oxygen and the methyl group on the alpha-carbon. Additional CH2 groups, most likely on the amine group, result in the higher m/z ions in the filter. Filter 7 was developed at 66.3 ± 1.4 mDa (95% CL, n = 4) (Figure 9.21).
 
Fragment ion Filter 7 was first evaluated using the cathinone test set. Five ions from two of the test set standards were identified as having KMD values that were within the filter. The ions at m/z 74, 132, 146, and 147 present in 2-methoxy MC, and at m/z 174 in pyrovalerone, had KMD values within this filter. No ions for mephedrone were observed to have KMD values that fell within this filter. Of the five ions, the three ions at m/z 132 and 146 for 2-methoxy MC and m/z 174 for pyrovalerone were expected to have KMD values in this range, as the chemical formulae for these ions (C9H10N+, C10H12N+, and C12H16N+) correspond to compounds that are in a homologous series with the compounds in Filter 7. However, the remaining two ions at m/z 74 and 147 for 2-methoxy MC are false positives, with chemical formulae C6H2+ and C10H13N+, respectively. Clearly, the two ions are not in a homologous series with the compounds in Filter 7. The false positive at m/z 74 is attributed to the proximity of the filter to other KMD values in conjunction with slightly reduced mass accuracy of the ion. The error associated with this ion is 9 ppm, and while this is an acceptable value, the reduction in mass accuracy led to a shift in KMD from 67.0 mDa to 66.3 mDa. The false positive at m/z 147 is mainly due to reduced mass accuracy, as the error associated with the ion is 46 ppm. Despite the two false positives for 2-methoxy MC, the compound exhibited two ions that were within the filter, indicating that the compound is correctly classified as belonging to the cathinone class with this filter.
 
 
Figure 9.21. Fragment ion Filter 7 for cathinones (95% CL, n = 4) using KMD. Test sets from both classes are plotted.
 
 
Filter 7 was then evaluated with the phenethylamine test set. Three ions from two test set standards were identified as having KMD values within the filter: the ions at m/z 74 and 102 in 6-APB and the m/z 136 ion in 2C-D. Using this fragment ion filter, both 6-APB and 2C-D would have been incorrectly classified as cathinones. These ions are false positives, as the compounds corresponding to these ions (C6H2+, C8H6+, and C9H12O+) are not in a homologous series with the compounds in this filter. The reason that the ions at m/z 74 and 102 for 6-APB had KMD values within the filter is the same as above for the m/z 74 ion for 2-methoxy MC. Reduced mass accuracy for the ion at m/z 136 for 2C-D is observed (error of 30 ppm), resulting in a false positive.
 
Successful classification of the majority of the ions in the cathinone test set was achieved; however, two false positives were identified from the test set. Furthermore, three false positives were identified from the phenethylamine test set, indicating that the filter window needs to be refined and higher mass accuracy for the ions in the higher m/z range is needed. Despite the false positives, the filter is technically discriminatory for the two classes, as all other ions in the phenethylamine test set did not have KMD values in this range, and most phenethylamine fragment ions at the higher m/z range do not contain a nitrogen atom.
 
Fragment ion Filter 9 was the only filter for cathinones that included ions composed of carbons, hydrogens, and one oxygen atom. The m/z range that the filter spans is 105 – 119 Da, with corresponding chemical formulae C7H5O+ to C8H7O+. The structures of these two ions are based on the benzoyl ion, with the m/z 119 ion corresponding to the methylated benzoyl ion. It is important to note that there is another ion with nominal mass of 105 Da; however, that ion corresponds to C8H9+, which is a structure containing a benzene ring and an ethylene group, and was not used in the development of this filter. Using HRMS, these two ions are discriminated, as the benzoyl ion (C7H5O+) has an exact mass of 105.0340 Da while C8H9+ has an exact mass of 105.0704 Da. The continued reference to the m/z 105 ion in this discussion is to the ion with exact mass of 105.0340 Da.
 
Methcathinone and α-PPP exhibit the ion at m/z 105 while the substituted compounds 3-MEC, 2-methyl MC, and 3-methyl PPP exhibit the m/z 119 ion. The filter was developed at 83.1 ± 0.5 mDa (98% CL, n = 5) (Figure 9.22). The filter has a narrow window, which increases the possibility of false negatives. While this is undesirable, filter windows are statistically determined based upon the confidence intervals associated with the KMD values for each ion in the filters. Thus, the confidence intervals are different depending on the range of the KMD values, and the smallest confidence interval that encompasses the range of KMD values is used as the filter window in order to reduce false positives.
 
Filter 9 was first assessed with the cathinone test set. Only the m/z 119 ion for mephedrone was identified to have a KMD value that was within the filter. Successful classification of this ion was achieved; however, four other ions at m/z 105 for 2-methoxy MC and m/z 119 for 2-methoxy MC and pyrovalerone were expected to have KMD values within this range. These ions are false negatives, as their KMD values are outside the filter window. The primary reason for this occurrence is the narrow filter window, as the mass accuracy of these false negatives ranges from 7 – 12 ppm, which are acceptable values. Furthermore, the ion at m/z 119 for mephedrone that was within the filter has a KMD value that is the exact value of the lower limit, indicating that refinement of the filter window is needed in order to reduce the number of false negatives.
 
Filter 9 was then evaluated using the phenethylamine test set. Only one ion was identified in the test set as having a KMD value within the filter; this was the m/z 167 ion in 2C-D, which corresponds to C10H15O2+ and is clearly not part of the homologous series in this filter. This false positive is attributed to the reduced mass accuracy of the ion (26 ppm) that resulted in a shift in KMD from 79.4 mDa to 83.7 mDa, and was thus within the filter. The m/z 119 ion also in 2C-D was a false negative due to the narrow filter window, as its mass accuracy is 14 ppm, which is acceptable. It is likely that the m/z 119 ion present in 2C-D corresponds to a structure that contains a benzene ring and a methoxy group, rather than the methylated benzoyl group, since 2C-D contains two methoxy groups and does not have a carbonyl functional group attached to the benzene ring. Improved mass accuracy in the higher m/z range is needed in order to reduce the number of false positives and a wider filter window is needed to minimize false negatives.
 
 
Figure 9.22. Fragment ion Filter 9 for cathinones (98% CL, n = 5) using KMD. Test sets from both classes are plotted.
 
 
Successful classification of mephedrone to the cathinone class using Filter 9 was achieved; further analysis of a wider range of standards to refine the filter window is needed in order to reduce the number of false negatives. Despite the false positive in the phenethylamine test set, this filter is still potentially useful for discrimination between the two classes, as the compounds in the filter are based upon a structure that is found in the cathinone core structure but not in the phenethylamine structure. Despite the false positives from compounds that exhibit ions with structures that are similar to the benzoyl functional group, this filter is still useful in a classification scheme since compounds with ions that have KMD values in this range are more likely to possess a benzoyl functional group than compounds with ions that have KMD values not within the filter.
 
In summary, the majority of the fragment ion filters for phenethylamines were developed with ions that contain only carbon and hydrogen. Most of the filters were developed for ions in the lower m/z range; however, two filters were established with ions at the higher m/z range. These two filters provide subclassification specificity within phenethylamines, and are highly discriminatory despite the presence of false positives. As briefly discussed above, the KMD values for ions at higher m/z ratios are more susceptible to reduced mass accuracy as compared to ions in the lower m/z range, despite acceptable error (i.e., below 20 ppm). The other filters for the phenethylamine class, however, do not provide discrimination between phenethylamines and cathinones.
 
 
The majority of the fragment ion filters for the cathinone class included ions that contained carbon, hydrogen, and nitrogen. This was observed throughout the m/z range, indicating that more nitrogen-containing ions are found in the mass spectra of cathinones, in contrast to the carbons and hydrogens that comprise the majority of the phenethylamine fragment ions. Because the filters developed for the two classes show distinct differences in chemical and structural information, a preliminary compound class can be predicted using KMD filters depending on the KMD values of the majority of the ions in an unknown compound.
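A majority-based prediction of this kind could be sketched as follows. This is a hypothetical illustration, not the classification scheme developed in this work. It uses only the filter windows reported above for the filters described as discriminatory (phenethylamine Filter 6; cathinone Filters 2, 7, and 9), and the choice of filters, the tie rule, and the function names are all assumptions. The input values in the example are theoretical KMD values computed from neutral monoisotopic masses on the CH2 Kendrick scale, not measured data.

```python
# Filter windows as (centroid, half-width) in mDa, as reported in the text.
# Restricting the scheme to these filters is an assumption of this sketch.
FILTERS = {
    "phenethylamine": [(86.9, 2.6)],                       # Filter 6
    "cathinone": [(12.7, 0.6), (66.3, 1.4), (83.1, 0.5)],  # Filters 2, 7, 9
}


def hits(kmd_values, windows):
    """Count ions whose KMD falls inside at least one window."""
    return sum(
        any(abs(kmd - center) <= half_width for center, half_width in windows)
        for kmd in kmd_values
    )


def predict_class(kmd_values):
    """Return the class with the most in-window ions, or None on a tie or no hits."""
    counts = {name: hits(kmd_values, windows) for name, windows in FILTERS.items()}
    ranked = sorted(counts.values(), reverse=True)
    if ranked[0] == 0 or ranked[0] == ranked[1]:
        return None
    return max(counts, key=counts.get)


# Theoretical KMD values (mDa) for C3H6N+, C9H10N+, C7H5O+, and C6H5+.
print(predict_class([12.56, 66.16, 83.24, 46.90]))
```

Ions such as C6H5+ that are common to both classes match none of the selected windows and therefore do not influence the vote.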
 
The specificity of the KMD filters is also important, as this allows KMD filters to be used in the later stages of a classification scheme rather than at the preliminary stages, as observed with absolute mass defect filters. The large m/z ranges that the KMD filters span are also advantageous to the classification, as they overcome the challenges associated with absolute mass defect. However, some limitations of KMD do exist, such as the need for high mass accuracy across the entire m/z range, which was found to be crucial in order to reduce false positives, especially in the higher m/z range. Thus, another type of mass defect filter that is not as affected by mass accuracy is needed in the classification scheme.
 
9.5 Relative Mass Defect 
 
 
Relative mass defect (RMD) is useful for determining whether a compound is hydrogen- and nitrogen-rich or oxygen-rich by normalizing the absolute mass defect of the compound to its exact mass. High RMD values indicate hydrogen- and nitrogen-richness, while low RMD values are indicative of oxygen-richness. This chemical information is useful for classification.
 
 
A molecular ion filter for the phenethylamine class was developed using the phenethylamine training set (Figure 9.23). The filter is centered at 627 ± 82 ppm (82% CL for n = 3). The filter lies in the mid-range RMD, which is expected as the phenethylamine compounds in the filter all contain a nitrogen atom and between 1 and 2 oxygen atoms. Hydrogen- and nitrogen-richness is balanced against oxygen-richness in these compounds. The filter was assessed using the phenethylamine test set; 3,4-MDPA did not exhibit a molecular ion, and therefore, only 6-APB and 2C-D were used in the test set to evaluate the molecular ion filter. Both compounds have RMD values that lie within the filter, indicating successful classification with this filter. However, because the filter is in the mid-range RMD, the amount of chemical information obtained regarding these compounds is limited.
 
 
Figure 9.23. Molecular ion filter for phenethylamines using RMD (82% CL, n = 3), with phenethylamine test set plotted.
 
 
Instead of fragment ion filters, profiles of the RMD for the fragment ions of the standards for each class were generated. The RMD values of 17 ions from each of the five phenethylamine training set standards were plotted against m/z values to generate the RMD profile (Figure 9.24).

m/z
 
range
 

number of fragment ions plotted, only 12% of the ions exhibit high RMD values, and all are in the lower m/z range. This indicates that the majority of the fragment ions in the phenethylamine compounds are neither hydrogen- nor nitrogen-rich. The remaining fragment ions have RMD values between 400 and 600 ppm throughout the entire m/z range, indicating that the majority of fragment ions are oxygen-rich. These fragment ions are likely to be composed of carbon, hydrogen, and oxygen, with low hydrogen content. The fragment ions from the phenethylamine test set were then plotted onto the profile. The majority of the ions have RMD values within the
400 

 
The pattern of the fragment ions in the test set is similar to that of the training set standards.
 
 
Figure 9.24. Fragment ion profile for phenethylamines using RMD, with fragment ions from the phenethylamine test set plotted. Red box outlines high RMD range in the lower m/z range.
 
 
The RMD values of 14 ions from each of the five cathinone training set standards were plotted against m/z values to generate the profile (Figure 9.25). The pattern of fragment ions in the cathinone standards shows some similarity to that of the fragment ions in the phenethylamine

range for the cathinones. Of the 70 fragment ions included in the profile, 40% of the ions have high RMD values (i.e.

this region. This may be a point of discrimination between the two classes for the compounds investigated in this research.
 
 
Figure 9.25. Fragment ion profile for cathinones using RMD, with fragment ions from the cathinone test set plotted. Red box outlines high RMD range in the lower m/z range.
 
 
It is apparent that cathinones exhibit more ions that are hydrogen- and nitrogen-rich, especially in the lower m/z range, as compared to phenethylamines. These fragment ions are likely the nitrogen-containing aliphatic portions of the compounds as opposed to the aromatic component. This is in agreement with the KMD fragment ion filters developed for cathinones, as the majority of the ions in the filters contained nitrogen and likely have high RMD values. The cathinone standards do display ions in the 400 to 600 ppm region across the m/z range; however, the pattern of the ions in this region is not as concentrated as the one observed in the phenethylamine profile. The fragment ions in this RMD range most likely contain only carbon and hydrogen comprising the aromatic component of the compound; however, not many of these fragment ions are observed. The pattern of the cathinone test set resembles that of the cathinone standards. The differences observed in the patterns of both standards and test sets are potentially useful in discriminating between phenethylamines and cathinones.
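
The link between elemental composition and RMD described above can be illustrated with a short Python sketch. The elemental compositions used here are assumptions made for illustration only: C3H6N+ for a nitrogen-containing aliphatic ion at m/z 56, and C6H5+ and C7H7+ for aromatic hydrocarbon ions at m/z 77 and m/z 91. RMD is again assumed to be the mass defect divided by the exact mass, in ppm, for singly charged cations.

```python
# Monoisotopic masses (Da) and electron mass; compositions below are assumed.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740}
ELECTRON = 0.00054858


def cation_rmd(formula):
    """RMD (ppm) of a singly charged cation; formula is {element: count}."""
    mass = sum(MONOISOTOPIC[el] * n for el, n in formula.items()) - ELECTRON
    return (mass - round(mass)) / mass * 1e6


# Hypothetical assignments for three common fragment ions.
ions = {
    "C3H6N+ (m/z 56)": {"C": 3, "H": 6, "N": 1},
    "C6H5+ (m/z 77)": {"C": 6, "H": 5},
    "C7H7+ (m/z 91)": {"C": 7, "H": 7},
}

rmd_by_ion = {name: cation_rmd(f) for name, f in ions.items()}

for name, rmd in rmd_by_ion.items():
    print(f"{name}: {rmd:.0f} ppm")
```

With these assumed compositions, the nitrogen-containing aliphatic ion has a markedly higher theoretical RMD than the two aromatic hydrocarbon ions, which fall between 400 and 600 ppm.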
 
The RMD molecular ion filter does not provide discriminatory information since it is in the mid-range of RMD; however, the RMD profiles generated from the fragment ions for both classes display some differences that are attributed to chemical differences between the two classes. These differences are potentially useful for classification to the phenethylamine and cathinone classes.
 
9.6 Classification Scheme 
 
The three types of mass defects can be incorporated into a classification scheme. Figure 9.26 illustrates a proposed classification scheme to classify novel synthetic drugs to the cathinone or phenethylamine class. Absolute mass defect is most suited for a preliminary classification in order to determine whether the compound of interest is a phenethylamine- or cathinone-like compound. Because the molecular ion and fragment ion filters probe the aromaticity and aliphaticity of compounds, a synthetic designer drug exhibiting both components is likely to be a cathinone or phenethylamine.
 
The second step in the classification scheme is to utilize the RMD molecular ion filter and RMD profiles. Once absolute mass defect filters have indicated whether the unknown compound is likely to have both aromatic and aliphatic components, the RMD filter and profiles will then indicate whether the compound has more characteristics belonging to cathinones or phenethylamines. Finally, KMD filters are able to provide more specific information regarding the structure of the compound, including the subclass. The outputs of all the mass defect filters and profiles will then be combined to give an overall output class of either cathinone or phenethylamine. This research demonstrates that different types of mass defect filters probe different aspects of a compound, and that the combination of the different mass defect filters provides more chemical and structural information necessary for classification.
 
 
Figure 9.26. Diagram of a proposed classification scheme using the three types of mass defects for classification of novel synthetic designer drugs to the cathinone or phenethylamine class.
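
The three-step scheme described in this section could be sketched in Python as follows. This is only one possible reading of the scheme: the text does not specify how the filter outputs are combined, so the rule used here (the RMD and KMD steps must agree, otherwise the result is reported as undetermined) is an assumption, as are the `FilterOutputs` and `classify` names and the subclass label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FilterOutputs:
    """Hypothetical container for the outputs of the three filter types."""
    amd_pass: bool                      # step 1: absolute mass defect filters matched
    rmd_class: Optional[str]            # step 2: class suggested by RMD filter/profiles
    kmd_class: Optional[str]            # step 3: class suggested by KMD filters
    kmd_subclass: Optional[str] = None  # step 3: subclass suggested by KMD filters


def classify(outputs):
    """Combine the three steps into one output class (assumed agreement rule)."""
    # Step 1: preliminary screen for phenethylamine- or cathinone-like compounds.
    if not outputs.amd_pass:
        return {"class": "other", "subclass": None}
    # Steps 2 and 3: collect the class suggestions that are available.
    votes = [c for c in (outputs.rmd_class, outputs.kmd_class) if c is not None]
    if not votes or len(set(votes)) > 1:
        return {"class": "undetermined", "subclass": None}
    return {"class": votes[0], "subclass": outputs.kmd_subclass}


example = FilterOutputs(True, "cathinone", "cathinone", "subclass A")
print(classify(example))
```

A weighted combination of the filter outputs, rather than strict agreement, would be an equally plausible design.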
 
 
APPENDIX

Figure B.1. Mass spectrum of 2C-P obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.2. Mass spectrum of 2C-D obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.3. Mass spectrum of 6-APB obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.4. Mass spectrum of 5-MAPDB obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.5. Mass spectrum of 3,4-MDPA obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.6. Mass spectrum of 3-methyl PPP obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.7. Mass spectrum of methcathinone obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.8. Mass spectrum of 2-methyl MC obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.9. Mass spectrum of 3-MEC obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.10. Mass spectrum of pyrovalerone obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.11. Mass spectrum of mephedrone obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.12. Structure of cocaine.

Figure B.13. Structures of the (a) m/z 56, (b) m/z 77, and (c) m/z 91 ions common to cathinone training set standards.
 
 
Figure B.14. Fragment ion Filter 1 developed for the phenethylamine class using KMD. The filter is centered at 32.6 ± 4.2 mDa (99.999% CL, n = 13).

Figure B.15. Fragment ion Filter 3 developed for the phenethylamine class using KMD. The filter is centered at 53.2 ± 1.5 mDa (99.9% CL, n = 8).

Figure B.16. Fragment ion Filter 5 developed for the phenethylamine class using KMD. The filter is centered at 73.8 ± 1.5 mDa (99.5% CL, n = 8).

Figure B.17. Fragment ion Filter 7 developed for the phenethylamine class using KMD. The filter is centered at 91.1 ± 2.7 mDa (99% CL, n = 6).

Figure B.18. Fragment ion Filter 1 developed for the cathinone class using KMD. The filter is centered at -0.6 ± 0.2 mDa (80% CL, n = 3).

Figure B.19. Fragment ion Filter 3 developed for the cathinone class using KMD. The filter is centered at 26.0 ± 0.3 mDa (99.8% CL, n = 6). No phenethylamine ions were observed.

Figure B.20. Fragment ion Filter 4 developed for the cathinone class using KMD. The filter is centered at 33.5 ± 0.7 mDa (99.9% CL, n = 8).

Figure B.21. Fragment ion Filter 5 developed for the cathinone class using KMD. The filter is centered at 46.7 ± 1.0 mDa (99.99% CL, n = 15).

Figure B.22. Fragment ion Filter 6 developed for the cathinone class using KMD. The filter is centered at 54.0 ± 1.9 mDa (99.95% CL, n = 8).

Figure B.23. Fragment ion Filter 8 developed for the cathinone class using KMD. The filter is centered at 79.0 ± 0.5 mDa (90% CL, n = 4).
 
 
REFERENCES

1. Grabenauer M, Krol WL, Wiley JL, Thomas BF. Analysis of Synthetic Cannabinoids Using High-Resolution Mass Spectrometry and Mass Defect Filtering: Implications for Nontargeted Screening of Designer Drugs. Analytical Chemistry. 2012;84(13):5574-81.
 
 
Chapter 10 Conclusions and Future Work
 
 
10.1 Conclusions
 
This proof-of-concept research aimed to develop tools to assist forensic practitioners in the identification of synthetic designer drugs, particularly by reducing the time-consuming nature of structural elucidation and allowing analysts to prioritize novel synthetic drugs for identification. High-resolution mass spectrometry (HRMS) as an alternative to GC-QMS was first investigated to assess the feasibility of using mass spectra acquired via GC-TOFMS as references to which low-resolution mass spectra can be compared. It was observed that high-resolution mass spectral data provide chemical information similar to, if not greater than, that of low-resolution mass spectra, which is ideal for identification of synthetic drugs. In the event that mass spectra of reference standards acquired via GC-QMS are not available to forensic analysts, it is advantageous to compare mass spectra of submitted samples obtained by GC-QMS to those of reference standards obtained by GC-TOFMS that are available to practitioners.
 
The second goal was to develop mass defect filters from mass spectra obtained via GC-TOFMS to allow discrimination between the synthetic phenethylamine and synthetic cathinone classes. Three types of mass defect filters were investigated: absolute, Kendrick, and relative mass defect. Each type of mass defect filter probed a different aspect of a compound and differed in its specificity of classification. Absolute mass defect is non-specific and is better suited for preliminary screening of compounds to distinguish synthetic phenethylamine- and cathinone-like compounds from other compounds. On the other hand, Kendrick mass defect is highly specific and is able to subclassify within a structural class, indicating that these filters are ideal in the later stages of a classification scheme. Relative mass defect filters and profiles displayed higher specificity than absolute mass defect, but classification was not as specific as Kendrick mass defect, indicating that relative mass defect should be incorporated between absolute and Kendrick mass defect filters in a classification scheme. By combining all three types of filters, the specificity of the classification is increased. Using mass defect filters, classification of synthetic designer drugs is rapid and simple, and allows forensic analysts to prioritize the analysis of novel synthetic drugs, so that other resources are directed towards identification.
 
10.2 Future Work
 
Further investigation of absolute mass defect filters is necessary to obtain useful information for classification to the phenethylamine and cathinone classes; this preliminary study focused on fragment ions that were common to all five standards in each class. However, the specificity of the classification to either class can be increased with the use of subsequent mass defect filters for ions that may only be present in certain groups of compounds within the class. Additionally, a more in-depth study on the ideal characteristics for mass defect filters would be particularly useful in order to develop filters that provide the level of specificity and accuracy needed for classification.
 
A wider range of compounds also needs to be investigated to ensure that the mass defect filters developed are representative of the structural classes. Furthermore, a classification scheme developed using the mass defect filters would benefit from a confidence-based classification. This type of classification would entail the assignment of a confidence level to the output class from the screening based upon how well the chemical information from an unknown compound is captured by the filters. Even though the filter windows are developed at different confidence levels, the specificity of each filter itself is variable, and by weighting each filter based on its specificity for classification, an overall confidence value can be assigned to the resulting classification as a measure of how well the novel compound fits into a particular class.
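
One simple way such a confidence value might be computed is sketched below in Python. The weighting rule (the fraction of the total filter weight contributed by filters that the unknown compound satisfies) and the example weights are hypothetical and are not derived from the data in this work; the function name `weighted_confidence` is likewise illustrative.

```python
def weighted_confidence(filter_results):
    """Return the weighted fraction of filters matched by an unknown compound.

    filter_results is a list of (matched, weight) pairs, one per filter, where
    weight is a non-negative number reflecting the filter's assumed specificity.
    """
    total_weight = sum(weight for _, weight in filter_results)
    if total_weight <= 0:
        raise ValueError("at least one filter must have a positive weight")
    matched_weight = sum(weight for matched, weight in filter_results if matched)
    return matched_weight / total_weight


# Hypothetical example: three filters with increasing specificity weights.
example_results = [(True, 1.0), (True, 2.0), (False, 3.0)]
confidence = weighted_confidence(example_results)
print(f"Overall confidence: {confidence:.2f}")
```

In this sketch, a compound that satisfies only the less specific filters receives a lower confidence value than one that also satisfies the most specific filter.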