CLASSIFICATION OF SYNTHETIC PHENETHYLAMINES AND TRYPTAMINES USING 

MULTIVARIATE STATISTICAL PROCEDURES   

 

By 

Amanda Lynn Setser 

 

 

 

 

 

 

 

A THESIS 

Submitted to 

Michigan State University 

in partial fulfillment of the requirements 

for the degree of  

Forensic Science – Master of Science 

2019 

 

 

 

 

 

ABSTRACT 

CLASSIFICATION OF SYNTHETIC PHENETHYLAMINES AND TRYPTAMINES USING 

MULTIVARIATE STATISTICAL PROCEDURES  

 

By 

Amanda Lynn Setser 

Identification of newly emerging synthetic designer drugs is challenging for forensic drug 

analysts due to the lack of available reference materials for a visual comparison of mass spectra. 

The focus in this work was the classification of synthetic drugs according to class or subclass 

using multivariate statistical procedures, specifically linear discriminant analysis (LDA). 

Reference materials representative of tryptamines and phenethylamines were analyzed by gas 

chromatography-mass spectrometry with a single quadrupole mass analyzer. Before LDA 

models could be developed, variable reduction was necessary. Two methods of variable selection 

were used. The first method used principal components analysis (PCA) as an objective approach 

and the second method used an informed chemical approach where mass spectra were probed for 

ions characteristic of each class or subclass of compounds.  

 

Ultimately, two variable sets were compared for classification success rates: the variable 

set selected by PCA (including nine m/z values) and the variable set selected using the informed 

chemical approach (including 13 m/z values). Two models were defined using the different 

variable sets and a common training set. A test set was then introduced to the model for 

classification. The LDA model using the informed chemical approach performed better, with a 

93% classification success rate as opposed to the 86% success rate observed when using the 

variables selected by PCA. Overall, this research provides a classification procedure for 

compounds not identifiable by standard methods of comparison to a known reference material. 

 

ACKNOWLEDGMENTS 

 

 

I would like to first thank my advisor, Dr. Ruth Smith, for her guidance and support on 

this research project. Without her knowledge, expertise, and moral support, this project would 

not have been possible. She provided opportunities to grow as a scientist and gain the confidence 

necessary to complete this thesis. I would also like to thank Dr. Victoria McGuffin for her 

support and feedback throughout the development of this work. She was always willing to 

challenge me to take the next step and thus allow me to push the boundaries. I would also like to 

thank Dr. Scott Wolfe for being so enthusiastic in agreeing to serve on my committee on such 

short notice. 

 

In addition to the personal support I received to complete this project, I am also grateful 

to the National Institute of Justice for supporting this research through grant number 2015-IJ-

CX-K008. Points of view in this thesis are those of the author and do not necessarily represent 

the official position or policies of the U.S. Department of Justice. 

 

Finally, I would like to thank all of my friends, family, and members of the Forensic 

Chemistry group that helped and supported me along the way. A special thank you to Alex 

Anstett for her mentorship in the beginning stages of this project and to Natasha Eklund, Emma 

Stuhmer, and Becca Boyea for making the lab a fun and engaging place to work. Working on 

this project never had a dull moment because of the three of you and I will forever be grateful for 

our time together. I would also like to thank my good friend Thomas Diaz for always being there 

when I needed a good laugh through the stressful times. He always timed his visits exactly when 

I needed them most. I would also like to extend a huge thank you to Todd Burkhart for being the 

rock on which I stand while in graduate school. Without his support and encouragement, nothing 

 


I do would be possible. Last but certainly not least, I would like to thank my family, especially 

my parents, Theresa and Jeff, and my sister, Olivia. Everything I am and everything I do is a 

direct result of the love and support of my family. I was grateful to have been given many 

opportunities by my parents and I wouldn't be the person I am today without them. This thesis is 

dedicated to them.  

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


TABLE OF CONTENTS 

 

LIST OF TABLES .................................................................................................................... vii 

LIST OF FIGURES .................................................................................................................. viii 

I. Introduction .............................................................................................................................. 1 
1.1 Synthetic Designer Drugs .................................................................................................. 1 
1.2 Single Quadrupole Mass Spectrometry ............................................................................. 4 
1.3 Multivariate Statistical Procedures .................................................................................... 7 
1.3.1 Principal Components Analysis ............................................................................... 8 
1.3.2 Linear Discriminant Analysis ................................................................................. 10 
1.4 Research Objectives ........................................................................................................ 12 
REFERENCES .......................................................................................................................... 14 

II. Materials and Methods ......................................................................................................... 17 
2.1 Synthetic Designer Drug Reference Materials ................................................................ 17 
2.2 Gas Chromatography-Mass Spectrometry (GC-MS) Analysis ....................................... 19 
2.3 Data Processing ............................................................................................................... 21 
2.4 Statistical Models in R .................................................................................................... 22 
APPENDIX ............................................................................................................................... 25 

III. Variable Selection for Linear Discriminant Analysis ......................................................... 29 
3.1 Variable Selection by Principal Components Analysis (PCA) ....................................... 29 
3.2 Variable Selection based on Characteristic Ions ............................................................. 36 
3.2.1 Characteristic Ions for Tryptamines ....................................................................... 38 
3.2.1 Non-Aromatically Substituted Tryptamines ................................................... 38 
3.2.1 Aromatically-Substituted Tryptamines ........................................................... 42 
3.2.3 Characteristic Ions for APB-Phenethylamines ....................................................... 48 
3.2.4 Characteristic Ions for NBOMe-Phenethylamines ................................................. 52 
3.2.2 Characteristic Ions for 2C-Phenethylamines .......................................................... 54 
3.3 Summary ......................................................................................................................... 58 
APPENDICES ........................................................................................................................... 61 
APPENDIX A: Relative Intensity Values of Characteristic Ions for each Class or Subclass  62 
APPENDIX B: Low-Resolution Mass Spectra of 2C-, APB-, and NBOMe-Phenethylamines 
and Tryptamines Investigated ............................................................................................... 66 

 
IV. Linear Discriminant Analysis for the Classification of Synthetic Phenethylamines and 
Tryptamines ............................................................................................................................... 78 
4.1 Variable Set Selection for PCA ....................................................................................... 78 
4.2 LDA using Selected Variable Set from PCA .................................................................. 83 
4.3 LDA using Variable Set from the Informed Chemical Approach ................................... 88 
4.4 Comparison of Variable Selection Methods ................................................................... 92 
4.5 Summary ......................................................................................................................... 93 

 


APPENDIX ............................................................................................................................... 94 

V. Conclusions and Future Work .............................................................................................. 97 
5.1 Conclusions ..................................................................................................................... 97 
5.2 Future Work .................................................................................................................... 98 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


 

LIST OF TABLES 

 

Table 2.1 Substituents for 2C-phenethylamines investigated ....................................................... 18 

Table 2.2 Substituents for NBOMe-phenethylamines investigated .............................................. 18 

Table 2.3 Training Set of Reference Standards for Classification Models ................................... 24 

Table 2.4 Test Set 1 of Reference Standards for Classification Models ...................................... 24 

Table 2.5 Test Set 2 of Reference Standards for Classification Models ...................................... 24 

Table A.1 Compound abbreviations with full chemical names .................................................... 26 

Table A.2 R Code for Inputting Data............................................................................................ 27 

Table A.3 R Code for PCA ........................................................................................................... 27 

Table A.4 R Code for LDA ........................................................................................................... 28 

Table 3.1 m/z Values Identified using PCA .................................................................................. 35 

Table A.5 Relative Intensity (%) of m/z Values Identified as Characteristic of Tryptamines...... 63 

Table A.6 Relative Intensity (%) of m/z Values Identified as Characteristic of APB-
Phenethylamines ........................................................................................................................... 63 

Table A.7 Relative Intensity (%) of m/z Values Identified as Characteristic of NBOMe-
Phenethylamines ........................................................................................................................... 64 

Table A.8 Relative Intensity (%) of m/z Values Identified as Characteristic of 2C-
Phenethylamines ........................................................................................................................... 65 

Table 4.1 Posterior Probabilities for CV with 20% Relative Loadings Threshold ....................... 82 

Table 4.2 Summary of LDA Classification .................................................................................. 92 

Table A.9 Posterior Probabilities for CV with 30% Relative Loadings Threshold ...................... 95 

Table A.10 Posterior Probabilities for CV with 15% Relative Loadings Threshold .................... 96 

 


 

LIST OF FIGURES 

 

Figure 1.1 Proportion of new psychoactive substances .................................................................. 2 

Figure 1.2 Core structure of A) phenethylamines and B) tryptamines as well as the 
phenethylamine subclasses: C) APB-phenethylamines, D) 2C-phenethylamines, and E) NBOMe-
phenethylamines where R indicates common substituent site ........................................................ 2 

Figure 1.3 Diagram of an electron ionization source ...................................................................... 5 

Figure 1.4 Diagram of a quadrupole mass analyzer demonstrating ions with stable (blue) and 
unstable (red) trajectories ................................................................................................................ 6 

Figure 1.5 Diagram of a continuous dynode electron multiplier .................................................... 7 

Figure 1.6 Example scores plot where x and y are original measurement variables and the green 
dots are samples; PC1 is drawn to describe the most variance in the data set and PC2 is drawn 
orthogonally to PC1 ........................................................................................................................ 9 

Figure 1.7 Example loadings plot using mass spectral data ......................................................... 10 

Figure 1.8 Example LDA scores plot ........................................................................................... 12 

Figure 2.1 Structures of the APB-phenethylamine reference standards (A) 4-APB (B) 5-APB 
(C) 6-APB (D) 7-APB (E) 4-MAPB and (F) 4-EAPB ................................................................. 17 

Figure 2.2 Core structure for 2C-phenethylamines ....................................................................... 18 

Figure 2.3 (A) Core structure for NBOMe-phenethylamines and (B) structure for 
3,4-DMA-NBOMe ......................................................................................................................... 18 

Figure 2.4 Structures of the tryptamine reference standards (A) α-MT (B) α-ET (C) N,N-DMT 
(D) DPT (E) 4-hydroxy DMT (F) 4-hydroxy DET (G) 4-Me-α-ET (H) 5-methoxy DMT (I) 5-
methoxy DiPT and (J) 5,7 DCT .................................................................................................... 19 

Figure 3.1 Scree plot for PCA showing proportion of variance (red) and cumulative proportion 
(black) described by each PC ........................................................................................................ 30 

Figure 3.2 Scores plot for A) PC1 vs. PC2 and B) PC1 vs. PC3 .................................................. 31 

Figure 3.3 Loadings plot for PC1 ................................................................................................. 32 

Figure 3.4 Loadings plot for PC2 ................................................................................................. 33 

Figure 3.5 Loadings plot for PC3 ................................................................................................. 33 

Figure 3.6 A) Low-resolution and B) High-resolution spectra for 5-methoxy DiPT ................... 37 

 


Figure 3.7 Low-resolution spectra for A) α-MT and B) α-ET ..................................................... 39 

Figure 3.8 Proposed fragmentation for A) α-MT and B) α-ET to produce m/z 130 .................... 40 

Figure 3.9 Proposed fragmentation for A) α-MT and B) α-ET to produce m/z 131 .................... 41 

Figure 3.10 Low-resolution spectra for A) 4-hydroxy DMT and B) 4-hydroxy DET ................. 43 

Figure 3.11 Proposed fragmentation for A) 4-hydroxy DMT and B) 4-hydroxy DET to produce 
m/z 146 .......................................................................................................................................... 44 

Figure 3.12 Low-resolution spectra for A) 5-methoxy DiPT and B) 5-methoxy DMT ................ 45 

Figure 3.13 Proposed fragmentation for A) 5-methoxy DiPT and B) 5-methoxy DMT .............. 46 

Figure 3.14 Low-resolution spectrum for 5,7-DCT ...................................................................... 47 

Figure 3.15 Proposed fragmentation for 5,7-DCT to form m/z 199 ............................................. 48 

Figure 3.16 Low-resolution spectrum of 4-APB .......................................................................... 49 

Figure 3.17 Proposed fragmentation of 4-APB to form m/z 131 .................................................. 49 

Figure 3.18 Low-resolution spectra for A) 4-MAPB and B) 4-EAPB ......................................... 51 

Figure 3.19 Proposed fragmentation for A) 4-MAPB and B) 4-EAPB to form m/z 131 ............. 52 

Figure 3.20 Low-resolution mass spectrum for 25T-NBOMe...................................................... 53 

Figure 3.21 Proposed fragmentation for 25T-NBOMe for ions with m/z 91, 121, and 150 ......... 54 

Figure 3.22 Low-resolution mass spectra for A) 2C-E and B) 2C-G ........................................... 55 

Figure 3.23 Proposed fragmentation for A) 2C-E and B) 2C-G to form m/z 165 ........................ 56 

Figure 3.24 Low-resolution mass spectrum for 2C-T ................................................................... 57 

Figure 3.25 Proposed fragmentation of 2C-T to form m/z 198 ..................................................... 58 

Figure 3.26 Venn diagram illustrating similarities and differences between m/z values selected by 
an informed chemical approach versus PCA ................................................................................ 59 

Figure A.1 Low-resolution spectrum of 4-Me-α-ET .................................................................... 67 

Figure A.2 Low-resolution spectrum of DPT ............................................................................... 67 

Figure A.3 Low-resolution spectrum of N,N-DMT...................................................................... 68 

Figure A.4 Low-resolution spectrum of 5-APB ........................................................................... 68 

 


Figure A.5 Low-resolution spectrum of 6-APB ........................................................................... 69 

Figure A.6 Low-resolution spectrum of 7-APB ........................................................................... 69 

Figure A.7 Low-resolution spectrum of 25B-NBOMe ................................................................. 70 

Figure A.8 Low-resolution spectrum of 25C-NBOMe ................................................................. 70 

Figure A.9 Low-resolution spectrum of 25D-NBOMe................................................................. 71 

Figure A.10 Low-resolution spectrum of 25E-NBOMe ............................................................... 71 

Figure A.11 Low-resolution spectrum of 25G-NBOMe............................................................... 72 

Figure A.12 Low-resolution spectrum of 25H-NBOMe............................................................... 72 

Figure A.13 Low-resolution spectrum of 25P-NBOMe ............................................................... 73 

Figure A.14 Low-resolution spectrum of 25N-NBOMe............................................................... 73 

Figure A.15 Low-resolution spectrum of 3,4-DMA-NBOMe ...................................................... 74 

Figure A.16 Low-resolution spectrum of 2C-B ............................................................................ 74 

Figure A.17 Low-resolution spectrum of 2C-C ............................................................................ 75 

Figure A.18 Low-resolution spectrum of 2C-D ............................................................................ 75 

Figure A.19 Low-resolution spectrum of 2C-H ............................................................................ 76 

Figure A.20 Low-resolution spectrum of 2C-I ............................................................................. 76 

Figure A.21 Low-resolution spectrum of 2C-N ............................................................................ 77 

Figure A.22 Low-resolution spectrum of 2C-P ............................................................................ 77 

Figure 4.1 Mass spectra for A) 4-MAPB and B) 4-APB .............................................................. 80 

Figure 4.2 LDA scores plot showing A) LD1 vs. LD2 and B) LD1 vs. LD3 using PCA as the 
variable selection method, where the boxes to the right indicate zoomed-in regions .................. 84 

Figure 4.3 Coefficients of linear discriminants for the nine variables selected using PCA ......... 86 

Figure 4.4 LDA scores plot showing A) LD1 vs. LD2 and B) LD1 vs. LD3 using the informed 
chemical approach as the variable selection method .................................................................... 89 

Figure 4.5 Coefficients of linear discriminants for the 13 variables selected using the informed 
chemical approach ........................................................................................................................ 90    

 


 

I. Introduction 

 
 
1.1 Synthetic Designer Drugs 

Controlled substances contribute to one-third of the caseloads in forensic laboratories in 

the United States.1 The National Forensic Laboratory Information System's 2015 Annual Report 

published by the Drug Enforcement Administration (DEA) estimated that over 1.19 million cases 

involving controlled substances were submitted to state and local crime laboratories in 2015.2 

Controlled substances are classified by the Controlled Substances Act into five schedules based 

on the potential for abuse, dependency, and accepted medical use.3 Schedule I contains 

substances with no medical use and a high potential for abuse, such as heroin and  

3,4-methylenedioxymethamphetamine. Schedule V consists of substances with an accepted 

medical use and a low abuse potential, like dextromethorphan, which is commonly used in cough 

syrups. Controlled substances can be naturally occurring (e.g., marijuana, mushrooms, and opium 

poppy) or synthetic (e.g., cathinones, heroin, and amphetamine).  

Recently, synthetic designer drugs have become a concern in the United States. These 

drugs are synthesized with a slightly different molecular structure than already existing 

scheduled compounds in order to mimic the pharmacological effects while avoiding legal 

ramifications. There are several different designer drug classes: phenethylamines, cannabinoids, 

phencyclidines, tryptamines, piperazines, pipradols, and tropane alkaloids. The two categories 

this research will focus on are phenethylamines and tryptamines. The United Nations Office on 

Drugs and Crime reported that phenethylamines were only outnumbered by synthetic 

cannabinoids and cathinones in terms of the proportion of new psychoactive substances 

identified in 2017 (Figure 1.1).4 Of the total number of new compounds identified, 18% were 

 


phenethylamines and 6% were tryptamines. The core structures for phenethylamines and 

tryptamines, as well as the subclasses for phenethylamines, are shown in Figure 1.2. 

Figure 1.1 Proportion of new psychoactive substances4 [pie chart: aminoindanes 1%, phencyclidine-type substances 2%, other substances 16%, phenethylamines 18%, piperazines 3%, plant-based substances 3%, synthetic cannabinoids 32%, synthetic cathinones 19%, tryptamines 6%] 

Figure 1.2 Core structure of A) phenethylamines and B) tryptamines as well as the 

phenethylamine subclasses: C) APB-phenethylamines, D) 2C-phenethylamines, and E) NBOMe-

phenethylamines where R indicates common substituent site 

The Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) sets 

standards for forensic analysis of controlled substances based on the specificity of the data 

obtained from the analytical method.5 SWGDRUG provides recommendations for identification 

that categorize analysis methods into three groups: A, B, and C. Category A includes methods 

that provide structural information, such as infrared (IR) spectroscopy and mass spectrometry 

(MS), and only requires the use of one other technique from Category B or C for identification. 

If a Category A technique is not used, three other techniques, two of which must be from 

Category B, are required. Because MS, a Category A technique, couples efficiently with gas 

chromatography (GC), a Category B technique, analysis by GC-MS fulfills SWGDRUG 

requirements. Consequently, GC-MS with electron ionization and a single quadrupole mass 

analyzer is most commonly used for the analysis and identification of controlled substances in 

forensic laboratories.  
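
The minimum scheme summarized above can be written as a short decision rule. The following Python sketch is illustrative only: the function name `meets_minimum_scheme` and the single-letter category labels are hypothetical, and the sketch accepts any second technique alongside a Category A technique, which is an assumption beyond the "Category B or C" wording above. It is not a substitute for the SWGDRUG recommendations themselves.

```python
from collections import Counter

VALID_CATEGORIES = {"A", "B", "C"}


def meets_minimum_scheme(categories):
    """Check a list of technique categories against the scheme described above.

    `categories` holds one letter ('A', 'B', or 'C') per technique used.
    """
    unknown = set(categories) - VALID_CATEGORIES
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")

    counts = Counter(categories)
    if counts["A"] >= 1:
        # A Category A technique needs at least one other technique
        # (assumption of this sketch: any category is accepted).
        return len(categories) >= 2
    # Without Category A: three techniques, at least two from Category B.
    return len(categories) >= 3 and counts["B"] >= 2


# GC (Category B) coupled with MS (Category A), as in GC-MS.
print(meets_minimum_scheme(["B", "A"]))        # True
print(meets_minimum_scheme(["B", "B", "C"]))   # True
print(meets_minimum_scheme(["B", "C", "C"]))   # False
```

Encoding the rule as counts per category keeps the two branches (with and without a Category A technique) explicit, which mirrors how the requirement is stated in the text.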

Using GC-MS, a submitted sample is dissolved in an organic solvent, injected into the 

GC, ionized using electron ionization, separated by mass-to-charge (m/z) ratio using a single 

quadrupole mass analyzer, and detected. This approach yields a chromatogram with retention 

times and a mass spectrum with nominal, or integer, mass information. The mass spectrum of the 

analyte is compared to a reference library of standards and identified via visual comparison of 

the spectra. While this method of visual comparison works well for already scheduled drugs, it 

fails with emerging analogs of synthetic designer drugs for which reference materials are not 

currently readily available. This problem necessitates a new method for the classification of 

emerging designer drugs that uses mass spectral characteristics to assign an unknown to a 

particular class and/or subclass. Although methods have been developed for the analysis of designer 

drugs, most of the work is not practical as the instrumentation used [e.g., gas chromatography-

time-of-flight mass spectrometry (GC-TOFMS), liquid chromatography-mass spectrometry (LC-MS), 

etc.] is not available in most forensic laboratories.6-8 Thus, research needs to be focused on the 

development of methods using a GC-MS with a single quadrupole mass analyzer, as this is the 

instrument already available and being utilized in controlled substances analysis. 

1.2 Single Quadrupole Mass Spectrometry 

Although GC-MS was used in this work to analyze samples, all data analysis was 

performed on the mass spectral data alone. The most common mass spectrometer employed in 

forensic laboratories for the analysis of controlled substances uses electron ionization followed 

by a single quadrupole mass analyzer where the ions are separated by the m/z value. Electron 

ionization ionizes gas phase molecules by bombarding them with a beam of electrons originating 

from a tungsten filament (Figure 1.3).9 The electrons are accelerated to 70 eV perpendicularly 

towards the analyte molecules, where ionization is induced by the fluctuating electric field 

created by the close proximity of the electrons at low pressure. Placed directly opposite the 

cathode, the electron trap serves as the anode with a slightly positive charge to attract the 

electrons being emitted from the filament. The magnets create a weak magnetic field parallel to 

the direction of the electrons in order to increase ionization efficiency by inducing a spiral path 

motion. A repeller plate is also placed opposite the ion focusing lenses to repel the ions toward the 

lenses and, subsequently, to the mass analyzer.  

 

 


Figure 1.3 Diagram of an electron ionization source 

 

 

When using electron ionization, excess energy following initial ionization to form the molecular 

ion causes extensive fragmentation to occur, resulting in a spectrum with many peaks. For this 

reason, electron ionization is considered a hard ionization technique that creates a spectrum unique 

to the analyte, under a given set of conditions, which allows for identification. Under these conditions, 

mostly singly positively charged ions are formed which exit the ionization source and reach the 

mass analyzer to be sorted in terms of m/z ratio.  

The mass analyzer used in forensic laboratories consists of a single quadrupole which 

provides nominal, or integer, mass information. A quadrupole mass analyzer consists of four 

cylindrical rods placed parallel to each other (Figure 1.4).9 Opposing rods are connected via a 

voltage supply and supplied with a constant direct current (DC) potential, either positive or 

negative. In conjunction with the DC potential, a radio frequency (RF) voltage is also applied in 

the form of an alternating potential. At a given combination of DC and RF potentials, ideally only ions 

with a single m/z ratio will have a stable trajectory through the rods, allowing them to reach the 

detector. Ions of other m/z ratios will not follow a stable trajectory and subsequently will strike 

one of the rods, becoming neutralized and pumped away, failing to reach the detector. The 

applied DC and RF potentials are then scanned at a constant rate across the m/z range of interest 

and thus create a mass spectrum with peaks at a multitude of m/z values.  

Figure 1.4 Diagram of a quadrupole mass analyzer demonstrating ions with stable (blue) and 

 

unstable (red) trajectories 

 

 

The most common detector used in mass spectrometry is the continuous dynode electron 

multiplier due to the high signal gain with low noise (Figure 1.5).10 A high negative voltage 

(about -1.5 kV) is applied to the opening of the dynode to attract the positively charged ions from 

the mass analyzer. The inside of the dynode contains a resistive conductive surface mounted on 

glass to create a gradual potential drop between the opening of the dynode and the back, which is 

grounded. When the positively charged ion strikes the inner wall of the dynode, secondary 

electrons are emitted and accelerate down the dynode. The number of electrons increases 

exponentially as the emitted secondary electrons continue to strike the walls as they are 

accelerated toward the end of the dynode. The gain of electrons in a continuous dynode electron 

multiplier is on the order of 10^5.  

Figure 1.5 Diagram of a continuous dynode electron multiplier 

 
The output from the detector is a plot of intensity as a function of m/z. Because electron 

ionization is used, the mass spectrum will contain many peaks corresponding to individual 

 

fragment ions. This extensive fragmentation allows for identification because each compound yields a 

unique spectrum under a given set of conditions. However, rapid identification in forensic 

laboratories is often done by visual comparison of the unknown spectrum with a spectrum of a 

known reference material. For newly emerging synthetic designer drugs with no available 

reference standard, drug analysts have no reference spectrum for comparison, and identification 

becomes more challenging. 

 
1.3 Multivariate Statistical Procedures 
 
 

Multivariate statistical procedures such as principal components analysis (PCA) and 

linear discriminant analysis (LDA) have been applied in various forensic fields, such as 

source tracing of MDMA tablets and fire debris analysis to identify ignitable liquids in complex 

matrices.11-12 Bonetti recently applied LDA to a set of fluoromethcathinone and fluorofentanyl 

isomers, using PCA as a variable selection method.13 While success was shown for 

differentiating isomers in Bonetti's work, the focus of this thesis was on the classification of 

structural subclasses of phenethylamines and tryptamines. 

 

 

1.3.1 Principal Components Analysis
 

 

PCA is a dimensionality-reducing procedure in which linear combinations of the original 

variables are created to describe natural variance within a data set.14 PCA is an unsupervised 

method, meaning that the method has no group knowledge and identifies natural groups based on 

variance. Based on patterns within the data set, new axes are created called principal components 

(PCs). When the number of variables exceeds the number of samples, the maximum number

of calculated PCs is N-1, where N is equal to the number of samples. Scree plots indicate the

variance described by each PC as well as the cumulative variance, which should equal 100% 

when all calculated PCs are included. Each PC is a linear combination (Equation 1) of the

original variables (x_1, x_2, …, x_p), each with a weighting coefficient (a_1, a_2, …, a_p) that depends

on the extent to which that particular variable contributes to the variance. In this equation, p

represents the total number of variables.

y = a_1 x_1 + a_2 x_2 + \dots + a_p x_p                                         (1)

A score is calculated for a sample on each PC using the mean-centered data incorporated into the

linear combination for that PC. For example, assume the linear combination in Equation 2

applies to PC1 and that of Equation 3 applies to PC2 where the numerical values are the

weighting coefficients and Vn indicates a single variable.

y = 0.56 (V1) + 0.31 (V2) + 0.19 (V3)                                            (2) 

y = 0.41 (V1) + 0.12 (V2) + 0.05 (V3)                                            (3) 

If the mean-centered data indicated that a sample exhibited the values in Equations 4 and 5, the

score on PC1 would be -0.57 and the score on PC2 would be -0.97.

y = 0.56 (-4.0) + 0.31 (6.0) + 0.19 (-1.0) = -0.57                                    (4)

y = 0.41 (-4.0) + 0.12 (6.0) + 0.05 (-1.0) = -0.97                                    (5)
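
The arithmetic in Equations 4 and 5 can be checked with a few lines of R. This is only a minimal sketch: the object names (weights, sample_centered, scores) are chosen here for illustration and do not come from the R code in the Chapter 2 Appendix.

```r
# Weighting coefficients for PC1 and PC2 (Equations 2 and 3)
weights <- rbind(PC1 = c(V1 = 0.56, V2 = 0.31, V3 = 0.19),
                 PC2 = c(V1 = 0.41, V2 = 0.12, V3 = 0.05))

# Mean-centered values of one sample for V1, V2, and V3 (Equations 4 and 5)
sample_centered <- c(V1 = -4.0, V2 = 6.0, V3 = -1.0)

# Each score is the dot product of a PC's coefficients with the sample values
scores <- drop(weights %*% sample_centered)
print(round(scores, 2))
```

The two printed values are the coordinates of the sample on the PC1 versus PC2 scores plot.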

 


These scores are plotted as coordinates for each sample on a two-dimensional plane to create a 

scores plot. Natural groupings in the scores plot reflect the variance between different sets of

samples with regard to the variables. PC1 always describes the most variance within a data set

while PC2 is drawn orthogonally to PC1, as seen in Figure 1.6. 

Figure 1.6 Example scores plot where x and y are original measurement variables and the green
dots are samples; PC1 is drawn to describe the most variance in the data set and PC2 is drawn
orthogonally to PC1

Loadings values provide a way to determine the contribution of individual variables to

each PC. Loadings values can range from -1 to +1 where -1 is the maximum negative

contribution of a variable to a PC, zero indicates no contribution, and +1 is the maximum 

positive contribution. From these data, loadings plots can be generated to visualize which 

variables contribute most heavily to the positioning of the samples on the scores plot. For mass 

spectral data with hundreds of variables (m/z values), loadings plots are best represented as a plot

of PC loadings versus m/z. In the example loadings plot in Figure 1.7, m/z 154 contributes most 

positively to PC1 while m/z 152 contributes most negatively. In this example, m/z 151 will not be 

 

seen in the linear combination for PC1 as it has no contribution. With relation to the scores plot,

samples that contain m/z 150 or 154 will be positioned more positively on PC1, while samples 

containing m/z 152 or 153 will be positioned more negatively. These contributions establish 

natural groups based on similarities in the mass spectra data.  

 

Figure 1.7 Example loadings plot using mass spectral data 
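
The quantities described in this section (variance per PC, loadings, and scores) can all be read from the output of prcomp() in base R. The sketch below uses a small random matrix as a stand-in for mass spectral data; the matrix, its row and column names, and the tolerance used to count non-zero PCs are assumptions made purely for illustration.

```r
set.seed(1)
# Hypothetical data: 5 samples (rows) x 8 variables (columns standing in for m/z values)
X <- matrix(runif(40), nrow = 5,
            dimnames = list(paste0("S", 1:5), paste0("mz", 150:157)))

# PCA on mean-centered, unscaled data
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Percentage of variance described by each PC and the cumulative percentage
var_pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(var_pct), 2))

# Number of PCs that carry non-zero variance
n_nonzero <- sum(pca$sdev > 1e-8)
print(n_nonzero)

loadings <- pca$rotation   # variables x PCs
scores   <- pca$x          # samples x PCs
```

Plotting a column of loadings against the m/z values gives a loadings plot, and plotting two columns of scores against each other gives a scores plot.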

 

 
1.3.2 Linear Discriminant Analysis 
 
 

LDA is similar to PCA in that it is a multivariate statistical procedure with the aim of 

reducing dimensionality. However, LDA is a supervised technique, which means that group

membership is known.14 Because of this, the weighting coefficients used to formulate the linear

combination are selected to maximize between-class variance while minimizing within-class

 

variance. Samples can then be assigned a score based on mean-centered raw data with respect to

the variables, and a scores plot can be generated. The axes are now called linear discriminants 

(LDs) where the total number of LDs is equal to M-1, where M is the number of groups. LDA 

requires that the total number of samples is greater than the number of variables. The general 

equation for a linear discriminant function is the same as PCA (Equation 1) where a represents 

the weighting coefficient, x is the original measurement variable, and p is the number of 

variables represented by a particular LD. 

A centroid is calculated for each group based on the average score on each LD for the members

of that group. A score can then be calculated for a new sample, which is classified to the group

whose centroid lies the shortest distance from the sample. For LDA, Mahalanobis distance is

used as the measurement (Equation 6) where x is the sample measurement, µ is the centroid, and

C^{-1} is the inverse of the sample covariance matrix.

D = \sqrt{(x - \mu)^{T} C^{-1} (x - \mu)}                                         (6)

In the example in Figure 1.8, the new sample would be classified as a member of Group 3 based 

on the defined LDA model. 
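
As a minimal sketch of this nearest-centroid rule, the R code below computes the Mahalanobis distance from a new sample to three group centroids and assigns the sample to the closest group. The centroids, covariance matrix, and sample coordinates are invented values, not data from this work. Note that mahalanobis() in the stats package returns the squared distance, so the square root is taken to match Equation 6.

```r
# Hypothetical group centroids on two discriminant functions (LD1, LD2)
centroids <- rbind(Group1 = c(-4,  1),
                   Group2 = c( 0, -3),
                   Group3 = c( 5,  2))

# Assumed covariance matrix of the scores (illustrative values only)
C <- matrix(c(1.0, 0.2,
              0.2, 1.5), nrow = 2, byrow = TRUE)

# Hypothetical scores for a new sample
new_sample <- c(4, 1)

# mahalanobis() returns the squared distance, so take the square root
D <- apply(centroids, 1, function(mu) {
  sqrt(mahalanobis(new_sample, center = mu, cov = C))
})
print(round(D, 2))

# The sample is assigned to the group with the smallest distance
assigned <- names(which.min(D))
print(assigned)
```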

 

 

 

 


Figure 1.8 Example LDA scores plot 

 

 

1.4 Research Objectives 
 

The overall goal in this research was to create a statistical model to classify unknown 

synthetic drugs according to structural class or subclass. For classification, LDA was selected 

because it is an objective technique with the ability to use raw mass spectral data to define 

models that can then be applied to new samples for classification. However, the total number of 

variables must be less than that of the total number of samples. With mass spectral data, there are 

hundreds of variables as each m/z value represents a single variable. Therefore, before LDA 

could be utilized, the variable set first needed to be reduced. 

 

For variable reduction, two methods were used. The first, PCA, was chosen as an

unsupervised method to identify the variables that described natural variance in the data set. The 

second method used an informed chemical approach where mass spectra were probed for ions 

characteristic of each class or subclass of drugs. Each method for variable selection is discussed 

in Chapter 3.  

 

Following variable selection, LDA was used to create a model in which new compounds could

be introduced and classified. Three subclasses of phenethylamines as well as various tryptamines 

were used for model development. Classification success rates were then calculated to determine 

the optimal method for variable selection as well as the overall classification success of the 

models. Chapter 4 presents the results for the optimization and model development of LDA.  

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


REFERENCES 

 


 

1.  Bureau of Justice Statistics (BJS). Publicly Funded Forensic Crime Laboratories: 

Resources and Services, 2014. https://www.bjs.gov/content/pub/pdf/pffclrs14_sum.pdf  
 

2.  U.S. Drug Enforcement Administration (DEA) Diversion Control Division. National 

Forensic Laboratory Information System 2015 Annual Report. 
https://www.deadiversion.usdoj.gov/nflis/2015_annual_rpt.pdf  
 

3.  U.S. Drug Enforcement Administration (DEA). Drug Scheduling. 

https://www.dea.gov/druginfo/ds.shtml. 

4.  United Nations Office on Drugs and Crime. World Drug Report 2017: Market Analysis 

of Synthetic Drugs, Booklet 4. 
http://www.unodc.org/wdr2017/field/Booklet_4_ATSNPS.pdf  
 

5.  Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG). 

Recommendations. 
http://www.swgdrug.org/Documents/SWGDRUG%20Recommendations%20Version%207-0.pdf
 

6.  Zuba, D., Sekula, K. Identification and characterization of 2,5-dimethoxy-3,4-dimethyl-β-phenethylamine

(2C-G) - A new designer drug. Drug Testing and Analysis 2012, 5,
549-559.
 

7.  Sekula, K., Zuba, D. Structural elucidation and identification of a new derivative of 

phenethylamine using quadrupole time-of-flight mass spectrometry. Rapid 
Communications in Mass Spectrometry 2013, 27, 2081-2090.  

8.  Pasin, D., Cawley, A., Bidny, S., Fu, S. Characterization of hallucinogenic 

phenethylamines using high-resolution mass spectrometry for non-targeted screening 
purposes. Drug Testing and Analysis 2017, 9, 1620-1629. 
 

9.  Watson, J.T., Sparkman, O.D. Introduction to Mass Spectrometry, 4th edition; John Wiley 

& Sons, LTD: West Sussex, England, 2007.  
 

10. De Hoffman, E., Stroobant, V. Mass Spectrometry: Principles and Applications, 3rd 

edition; John Wiley & Sons, LTD: West Sussex, England, 2007. 
 

11. Weyermann, C., et al. Drug intelligence based on MDMA tablets data: I. Organic 

impurities profiling. Forensic Science International 2008, 177, 11-16. 
 

12. Waddell, E. E., et al. Progress toward the determination of correct classification rates in 

fire debris analysis. Journal of Forensic Sciences 2013, 58 (4), 887-896. 


 

 

 

 

 

 

13. Bonetti, J. Mass spectral differentiation of positional isomers using multivariate statistics. 

Forensic Chemistry 2018, 9, 50-61. 
 

14. Smith, R. Chemometrics. In Forensic Chemistry: Fundamentals and Applications; 

Siegel, J., Ed; John Wiley & Sons, Ltd: West Sussex, UK, 2016; pp 469-503. 

 


II. Materials and Methods 
 
2.1 Synthetic Designer Drug Reference Materials 
 

Reference materials representative of the phenethylamine and tryptamine classes were 

purchased from Cayman Chemical (Ann Arbor, MI). For the phenethylamine class, six 

aminopropyl benzofuran phenethylamine (APB), ten 2,5-dimethoxyphenethylamine (2C), and 

ten 2,5-dimethoxy-N-(2-methoxybenzyl) phenethylamine (NBOMe) compounds were acquired. 

Ten compounds from the tryptamine class were also purchased for testing. The full chemical 

names for each compound in the data set can be found in the Chapter 2 Appendix. Structures for 

the compounds are seen in Figures 2.1-2.4. Reference materials were prepared by dissolving 1 

mg in 1 mL of methanol (ACS grade, Sigma Aldrich, St. Louis, MO) for analysis by GC-MS.  

 

 

Figure 2.1 Structures of the APB-phenethylamine reference standards (A) 4-APB (B) 5-APB  

(C) 6-APB (D) 7-APB (E) 4-MAPB and (F) 4-EAPB 

 

 

Figure 2.2 Core structure for 2C-phenethylamines

 

 
Table 2.1 Substituents for 2C-phenethylamines investigated 
 

Compound   R1     R2          Compound   R1     R2
2C-B       -H     -Br         2C-H       -H     -H
2C-C       -H     -Cl         2C-I       -H     -I
2C-D       -H     -CH3        2C-N       -H     -NO2
2C-E       -H     -CH2CH3     2C-P       -H     -CH2CH2CH3
2C-G       -CH3   -CH3        2C-T       -H     -SCH3

Figure 2.3 (A) Core structure for NBOMe-phenethylamines and (B) structure for 3,4-DMA-NBOMe

 

Table 2.2 Substituents for NBOMe-phenethylamines investigated 
 

Compound     R1     R2          Compound     R1     R2
25B-NBOMe    -H     -Br         25H-NBOMe    -H     -H
25C-NBOMe    -H     -Cl         25I-NBOMe    -H     -I
25D-NBOMe    -H     -CH3        25N-NBOMe    -H     -NO2
25E-NBOMe    -H     -CH2CH3     25P-NBOMe    -H     -CH2CH2CH3
25G-NBOMe    -CH3   -CH3        25T-NBOMe    -H     -SCH3

 

 

Figure 2.4 Structures of the tryptamine reference standards (A) α-MT (B) α-ET (C) N,N-DMT
(D) DPT (E) 4-hydroxy DMT (F) 4-hydroxy DET (G) 4-Me-α-ET (H) 5-methoxy DMT
(I) 5-methoxy DiPT and (J) 5,7 DCT

 
 
2.2 Gas Chromatography-Mass Spectrometry (GC-MS) Analysis 
 
 

The ten tryptamine reference standards were analyzed by low-resolution (GC-QMS) and 

 

high-resolution instruments (GC-TOFMS). The six APB-phenethylamines (4-APB, 5-APB, 

6-APB, 7-APB, 4-MAPB, and 4-EAPB) and six of the NBOMe-phenethylamines (25B-, 25E-, 

25H-, 25P-, 25T-, and 3,4-DMA-NBOMe) were analyzed on the same GC-QMS instrument. 

Previously collected low-resolution and high-resolution data were used for the 

 

2C-phenethylamines (2C-B, C, D, E, G, H, I, N, P, and T) and for 25C-, 25D-, 25G-, and

25N-NBOMe. Low-resolution data were used for testing in the multivariate statistical procedures 

and high-resolution data were used for elemental formula confirmation.  

The GC-QMS contained an Agilent 7890 gas chromatograph coupled to an Agilent 5975 

mass spectrometer with an Agilent 7693A injector (Agilent Technologies, Santa Clara, CA). A 

DB-5 column was used with a 5%-diphenyl-95%-dimethyl polysiloxane stationary phase and 

dimensions of 30 m x 0.25 mm internal diameter x 0.25 µm film thickness (DB-5, Restek, 

Bellefonte, PA). The injection temperature was 250 °C with 1 µL of sample injected with a 50:1 

split. For the carrier gas, ultra-high purity helium (Airgas, Radnor Township, PA) was used at a 

nominal flow rate of 1 mL/min. The oven temperature was held at 40 °C for 1 min and then

increased at a rate of 20 °C/min until the oven reached 280 °C, with a final hold of 2 min. The 

transfer line was heated at 280 °C. Electron ionization was employed at 70 eV. The scan range 

was set to m/z 35-450, with a scan rate of 2.83 scans/s, to encompass a wide range of ions. The 

temperature of the ion source was 230 °C while the temperature of the mass analyzer was   

150 °C.  

 

The GC-TOFMS that was used to analyze 3,4-DMA-NBOMe, 4-MAPB, 4-EAPB and the 

tryptamines was a Waters GCT Premier (Waters, Milford, MA). This instrument contained an 

Agilent 6890N gas chromatograph coupled to a Waters GCT mass spectrometer and an Agilent 

7683B autosampler. A DB-5 column with the same dimensions and stationary phase as the GC-

QMS was used. The GC parameters and oven program were identical to those used to collect the 

low-resolution spectra. Electron ionization at 70 eV was used with a scan range of m/z 35-450 at 

a scan rate of 5.00 scans/s. The temperature of the ion source was 180 °C and the temperature of 

the mass analyzer was 130 °C. Perfluorotributylamine (PFTBA) was selected to calibrate the

instrument for good mass accuracy. The resolution of this GC-TOF was up to 7,000 full width at

half maximum (FWHM).  

 

The high-resolution data for the remaining NBOMe-, 2C-, and APB- reference materials 

were from data collected previously on a LECO Pegasus GC-HRT (LECO Corporation, St. 

Joseph, MI). This instrument contained an Agilent 7890N gas chromatograph with a LECO 

Pegasus HRT mass spectrometer and a Gerstel MPS2 (GERSTEL, Inc., Linthicum Heights, MD) 

autosampler. The stationary phase for the GC column was 1,4-bis(dimethylsiloxyl)phenylene 

dimethyl polysiloxane (Rxi-5sil ms), with dimensions 20 m x 0.18 mm x 0.18 µm (Restek, 

Bellefonte, PA). For each reference material, 1 µL was injected at a temperature of 250 °C with a 

100:1 split ratio. Ultra-high purity helium was used as the carrier gas with a nominal flow rate of 

0.85 mL/min. The oven temperature started at 60 °C for 0.5 min and was ramped to 340 °C at  

36 °C/min, with a final hold of 4 min. The transfer line temperature was 300 °C. Electron 

ionization at 70 eV was used with a scan range of m/z 35-510 at a scan rate of 10 scans/s. The 

temperature of the ion source was 250 °C. PFTBA was used as a calibrant for each analysis and 

the resolution of this instrument was up to 50,000 FWHM.  

 
2.3 Data Processing 
 
 

Low-resolution mass spectra were obtained from GC-QMS by taking a single scan at the 

apex of the peak in the chromatogram. The intensity values were normalized to the base peak 

and imported into Origin (version 9.0 OriginLab Corporation, Northampton, MA), where a 

spectrum was generated.  
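
Normalization to the base peak can be sketched in R as follows. The m/z values and abundances are invented for illustration, and the column names are assumptions. The actual normalization in this work was carried out before import into Origin.

```r
# Hypothetical raw abundances from a single scan (values are illustrative)
spectrum <- data.frame(mz        = c(44, 58, 91, 121, 150),
                       intensity = c(1200, 300, 5000, 20000, 9000))

# The base peak is the most intense ion; scale it to 100% relative intensity
spectrum$relative <- 100 * spectrum$intensity / max(spectrum$intensity)

# m/z value of the base peak
base_peak_mz <- spectrum$mz[which.max(spectrum$intensity)]
print(spectrum)
```

After this step every spectrum is on a common 0-100% scale, so intensities can be compared between compounds regardless of the absolute abundance recorded.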

 

High-resolution mass spectra were obtained from GC-TOFMS using MassLynx (version 

4.1, Waters) by averaging ten scans where the intensity was on the order of magnitude of 10^4 but

no greater than 5 × 10^5. A peak separation value of 0.05 was used to generate all mass spectra.

 


Mass accuracy, in ppm, was determined for all ions in the spectrum above a given threshold 

using an algorithm in MassLynx. The known elemental formula for each reference standard was 

used to indicate the maximum number of each element to minimize the total number of possible 

elemental formulae for each peak. For each spectrum, the accurate mass, intensity, elemental 

formula, and corresponding mass accuracy were exported to Microsoft Excel (Microsoft 

Corporation, Redmond, WA). The intensity values were normalized to the base peak and then 

plotted in Origin.  
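
For reference, mass accuracy in ppm is conventionally the difference between the measured and theoretical m/z, divided by the theoretical m/z and multiplied by 10^6. The R sketch below applies that conventional definition; whether the MassLynx algorithm computes it in exactly this way is an assumption, and the function name and example masses are hypothetical.

```r
# Mass error in parts per million relative to the theoretical m/z
mass_error_ppm <- function(measured, theoretical) {
  (measured - theoretical) / theoretical * 1e6
}

# Illustrative values only
theoretical <- 121.0648
measured    <- 121.0654
print(round(mass_error_ppm(measured, theoretical), 1))
```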

 
2.4 Statistical Models in R 
 
 

The ultimate goal in this work was to use linear discriminant analysis (LDA) as a 

classification tool for synthetic designer drugs. LDA requires the presence of more samples than 

variables, which is difficult with mass spectral data sets. For this reason, two methods were used 

in this work for variable selection to obtain variable data sets small enough for LDA. The first 

method, principal components analysis (PCA), was used on the full data set, meaning that the 

full spectrum was used for all reference standards. Scores plots and loadings plots were 

generated based on results from R code, which is listed in the Chapter 2 Appendix. Loadings

values for each m/z value were normalized to the highest loadings value in the first three 

principal components (PCs). Three variable sets were determined based on m/z values with 

normalized loadings greater than 15%, 20%, and 30%.  
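
One way this thresholding could be carried out in R is sketched below. The loadings are invented numbers, and two details are assumptions rather than statements of the procedure used in this work: the absolute value of each loading is used, and an m/z value is retained if it exceeds the threshold on any one of the three PCs.

```r
# Hypothetical loadings for five m/z values on PC1-PC3 (illustrative numbers)
loadings <- matrix(c( 0.80, -0.05,  0.02,
                     -0.30,  0.40, -0.25,
                      0.14, -0.10,  0.05,
                      0.02,  0.20,  0.13,
                      0.01,  0.03, -0.02),
                   ncol = 3, byrow = TRUE,
                   dimnames = list(c("121", "58", "91", "198", "77"),
                                   c("PC1", "PC2", "PC3")))

# Normalize absolute loadings to the largest value across all three PCs
rel <- 100 * abs(loadings) / max(abs(loadings))

# Keep an m/z value if its relative loading exceeds the threshold on any PC
select_mz <- function(rel, threshold) {
  rownames(rel)[apply(rel, 1, max) > threshold]
}

thresholds <- c(30, 20, 15)
sets <- lapply(thresholds, function(t) select_mz(rel, t))
names(sets) <- paste0(">", thresholds, "%")
print(sets)
```

Lowering the threshold can only add variables, so each variable set contains the one before it.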

 

The second method of variable selection involved probing mass spectra for characteristic 

ions of each class or subclass of drugs. From this, a variable set was created that could be 

compared with that from the PCA method. 

 

Following variable selection, the spectra from the reference materials were split into a 

training set and a test set. The black box method was used to randomly select the tryptamines and 

 


APB- and 2C-phenethylamines that would constitute the test set. For the tryptamines and 

2C-phenethylamines, three of the ten compounds were selected for the test set and the remaining 

seven stayed in the training set. One APB-phenethylamine was selected for the test set and the 

remaining five became the training set compounds. All of the NBOMe-phenethylamines 

analyzed via GC-QMS for this work were used for the training set, with data collected previously 

used for the test set. In addition, all NBOMe- and APB-phenethylamines as well as tryptamines 

in the training set were analyzed on a different day and those data were used as additional test set 

compounds to increase the robustness of the statistical models. Tables 2.3-2.5 show the 

compounds that comprised the training and test sets. 
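
A random split of this kind can be sketched in R with sample(). The code below is not the black box method used in this work; it is a simple random draw standing in for it, with an arbitrary seed and ASCII spellings of the compound names, so the compounds it selects will generally differ from those in Tables 2.3 and 2.4.

```r
set.seed(2019)  # arbitrary seed so that the draw is reproducible

# The ten tryptamine reference standards (alpha written out in ASCII)
tryptamines <- c("alpha-MT", "alpha-ET", "N,N-DMT", "DPT",
                 "4-hydroxy DMT", "4-hydroxy DET", "4-Me-alpha-ET",
                 "5-methoxy DMT", "5-methoxy DiPT", "5,7-DCT")

# Draw three compounds for the test set; the remaining seven form the training set
test_set  <- sample(tryptamines, size = 3)
train_set <- setdiff(tryptamines, test_set)

print(test_set)
print(train_set)
```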

 

LDA models were defined using the training set for each of the three variable sets 

resulting from PCA. Leave-one-out cross validation was performed and the classification success 

of each variable set was assessed. The compounds in the test set were classified based on the 

LDA model developed using the most successful variable set, and the classification success rate 

was calculated based on the percentage of correctly classified compounds. A separate LDA 

model was generated using the variables selected by probing the mass spectra for characteristic 

ions. The test set was then incorporated and the classification success rate was calculated. The 

two variable selection methods were compared in terms of the performance to classify known 

reference materials from a test set into the correct class or subclass of designer drug. 
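
The workflow in this section (define the model on a training set, perform leave-one-out cross validation, then classify a test set) can be sketched with lda() from the MASS package. Because the spectral data are not reproduced here, the built-in iris data set is used as a stand-in, and the split sizes, seed, and object names are assumptions made for illustration only.

```r
library(MASS)

# iris (built into R) stands in for the spectral data: 4 variables, 3 known groups
set.seed(42)
test_idx <- sample(nrow(iris), 30)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Leave-one-out cross validation on the training set (CV = TRUE)
cv <- lda(Species ~ ., data = train, CV = TRUE)
cv_rate <- 100 * mean(cv$class == train$Species)

# Model defined on the training set, then applied to the held-out test set
fit  <- lda(Species ~ ., data = train)
pred <- predict(fit, test)
test_rate <- 100 * mean(pred$class == test$Species)

cat(sprintf("LOOCV success rate: %.1f%%\n", cv_rate))
cat(sprintf("Test set success rate: %.1f%%\n", test_rate))

# Number of linear discriminants returned (groups - 1)
n_ld <- ncol(pred$x)
print(n_ld)
```

Here pred$class holds the predicted group for each test sample and pred$posterior holds the probability of membership in each group, so the success rate is simply the percentage of predictions that match the known group.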

 
 
 
 
 
 
 
 
 
 

 


Table 2.3 Training Set of Reference Standards for Classification Models 
 

2C-Phenethylamines   APB-Phenethylamines   NBOMe-Phenethylamines   Tryptamines
2C-B                 4-APB                 25B-NBOMe               5,7-DCT
2C-D                 5-APB                 25E-NBOMe               α-ET
2C-E                 6-APB                 25H-NBOMe               4-Me-α-ET
2C-H                 7-APB                 25P-NBOMe               4-hydroxy DMT
2C-I                 4-EAPB                25T-NBOMe               4-hydroxy DET
2C-N                                       3,4-DMA-NBOMe           5-methoxy DMT
2C-T                                                               DPT

 
Table 2.4 Test Set 1 of Reference Standards for Classification Models 
 

2C-Phenethylamines   APB-Phenethylamines   NBOMe-Phenethylamines   Tryptamines
2C-C                 4-MAPB                25G-NBOMe               N,N-DMT
2C-G                                       25C-NBOMe               α-MT
2C-P                                       25D-NBOMe               5-methoxy DiPT
                                           25N-NBOMe
                                           25I-NBOMe

 
 

 
Table 2.5 Test Set 2 of Reference Standards for Classification Models 
 

APB-Phenethylamines   NBOMe-Phenethylamines   Tryptamines
4-APB                 25B-NBOMe               5,7-DCT
5-APB                 25E-NBOMe               α-ET
6-APB                 25H-NBOMe               4-Me-α-ET
7-APB                 25P-NBOMe               4-hydroxy DMT
4-EAPB                25T-NBOMe               4-hydroxy DET
                      3,4-DMA-NBOMe           5-methoxy DMT
                                              DPT

 

 

 
 

 


APPENDIX

 


Table A.1 Compound abbreviations with full chemical names 
 

Compound Abbreviation   Full Chemical Name
4-APB                   4-(2-aminopropyl)benzofuran
5-APB                   5-(2-aminopropyl)benzofuran
6-APB                   6-(2-aminopropyl)benzofuran
7-APB                   7-(2-aminopropyl)benzofuran
4-MAPB                  4-(2-methylaminopropyl)benzofuran
4-EAPB                  4-(2-ethylaminopropyl)benzofuran
2C-B                    2,5-dimethoxy-4-bromophenethylamine
2C-C                    2,5-dimethoxy-4-chlorophenethylamine
2C-D                    2,5-dimethoxy-4-methylphenethylamine
2C-E                    2,5-dimethoxy-4-ethylphenethylamine
2C-G                    3,4-dimethyl-2,5-dimethoxyphenethylamine
2C-H                    2,5-dimethoxyphenethylamine
2C-I                    2,5-dimethoxy-4-iodophenethylamine
2C-N                    2,5-dimethoxy-4-nitrophenethylamine
2C-P                    2,5-dimethoxy-4-propylphenethylamine
2C-T                    2,5-dimethoxy-4-methylthiophenethylamine
α-MT                    α-methyl-1H-indole-3-ethanamine
α-ET                    α-ethyl-1H-indole-3-ethanamine
N,N-DMT                 N,N-dimethyl-1H-indole-3-ethanamine
DPT                     N,N-dipropyl-1H-indole-3-ethanamine
4-hydroxy DMT           3-[2-(dimethylamino)ethyl]-1H-indol-4-ol
4-hydroxy DET           3-[2-(diethylamino)ethyl]-1H-indol-4-ol
4-Me-α-ET               α-ethyl-4-methyl-1H-indole-3-ethanamine
5-methoxy DMT           5-methoxy-N,N-dimethyl-1H-indole-3-ethanamine
5-methoxy DiPT          5-methoxy-N,N-bis(1-methylethyl)-1H-indole-3-ethanamine
5,7-DCT                 5,7-dichloro-1H-indole-3-ethanamine
25B-NBOMe               4-bromo-2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-benzeneethanamine
25C-NBOMe               2-(4-chloro-2,5-dimethoxyphenyl)-N-(2-methoxybenzyl)ethanamine
25D-NBOMe               2-(2,5-dimethoxy-4-methylphenyl)-N-(2-methoxybenzyl)ethanamine
25E-NBOMe               2-(4-ethyl-2,5-dimethoxyphenyl)-N-(2-methoxybenzyl)ethanamine
25G-NBOMe               2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-3,4-dimethyl-benzeneethanamine
25H-NBOMe               2-(2,5-dimethoxyphenyl)-N-(2-methoxybenzyl)ethanamine
25N-NBOMe               2-(2,5-dimethoxy-4-nitrophenyl)-N-(2-methoxybenzyl)ethanamine
25P-NBOMe               2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-4-propyl-benzeneethanamine
25T-NBOMe               2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-4-(methylthio)-benzeneethanamine
3,4-DMA-NBOMe           3,4-dimethoxy-N-[(2-methoxyphenyl)methyl]-α-methyl-benzeneethanamine

 

Table A.2 R Code for Inputting Data 
 

R Code                                                        Action
getwd()                                                       Identifies current working directory
setwd("C:/Users/Amanda/Documents/Forensic_Research/Data")     Sets new working directory
data=read.table("RData.txt",header=TRUE)                      Imports and names data set (header=TRUE if first row/column is variable/sample name)
names(data)=c("Mass44","Mass91"…"Type")                       Identifies header names
attach(data)                                                  Attaches data set for use

 
Table A.3 R Code for PCA 
 

R Code                           Action
pca<-prcomp(data,scale=FALSE)    Code to perform PCA
print(pca)                       Output for loadings
summary(pca)                     Output for scree
pca$x                            Output for scores

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 


Table A.4 R Code for LDA 
 

R Code                                            Action
library(MASS)                                     Loads R package that contains LDA code
train<-data[1:25,]                                Selects data for training set
test<-data[26:55,]                                Selects data for test set
data.lda=lda(Type~Mass44+Mass91+…,data=train)     Performs LDA on training set with selected variables
data.lda                                          Displays LDA results
data.lda.values<-predict(data.lda,data[1:25,])    Code to obtain scores for samples in training set
data.lda.values$x                                 Displays scores for training set samples
lda.pred<-predict(data.lda,test)                  Code to incorporate test set
lda.pred$posterior                                Output for probability that a sample in test set belongs to each of the groups defined for LDA
lda.pred$x                                        Displays scores for test set samples

 

 

 

 

 

 

 

 

 

 

 

 

 

 


III. Variable Selection for Linear Discriminant Analysis 
 
 

In forensic laboratories, controlled substances that are submitted as powders are often 

analyzed by gas chromatography-single quadrupole mass spectrometry (GC-QMS) followed by a 

comparison to spectra of known reference materials. However, when a novel designer drug is 

submitted, there is often no available reference material and thus no spectrum for comparison. 

The objective in this work was to develop a classification model using linear discriminant 

analysis (LDA) in which new samples could be introduced and subsequently classified into a 

class or subclass of designer drugs. However, as LDA requires a greater number of samples than 

variables, this criterion is not met when all m/z values from mass spectra are used as variables. 

Therefore, appropriate selection of m/z values is critical for classification success, and part of this

work investigated two different selection methods. The first method used principal components

analysis (PCA) to identify m/z values that described the greatest variance among the compounds. 

The second method was based on an informed chemical approach, using mass spectral 

interpretation to identify ions characteristic of each class or subclass.  

 
3.1 Variable Selection by Principal Components Analysis (PCA) 
 
 

The first method of variable selection used the unsupervised approach of PCA to 

determine the m/z values that accounted for the most variance among the compounds in the data 

set. The entire set of reference standards, found in the Chapter 2 Appendix, for tryptamines as 

well as APB-, 2C-, and NBOMe-phenethylamines was used for PCA. Using the R codes given in 

the Chapter 2 Appendix, PCA was performed on the full mass spectrum, ranging from m/z 40-440,

for each reference material. A scree plot (Figure 3.1) was first generated to determine the

contribution of each principal component (PC) to the total variance. By PC35, 100% of the 

variance was described. However, only PC1, 2, and 3 were used for variable selection. Further 

 


PCs were explored but were not considered in the analysis because the variance described by 

each was that of within-class variance rather than between-class variance. The goal of defining 

variables is to use them in LDA to create distinct groups that allow for classification of new 

samples. Using m/z values beyond PC3 would weaken the subsequent LDA models. The first 

three PCs accounted for 44.65% of the total variance with individual contributions of 23.40%, 

10.95%, and 10.29%, respectively.  

 

 

 
Figure 3.1 Scree plot for PCA showing proportion of variance (red) and cumulative proportion 

(black) described by each PC 

PC scores were then calculated for each compound and plotted for PC1 versus PC2 and 

for PC1 versus PC3 (Figure 3.2). From the scores plot of PC1 (23.40%) versus PC2 (10.95%), 

the NBOMe-phenethylamines are positioned positively on PC1, while the remaining compounds 

are positioned negatively. The 2C-phenethylamines and tryptamines overlap on PC2, but gain 

some additional separation on PC1 and PC3. The APB-phenethylamines are separated from the 

 

2C-phenethylamines on both PC1 and PC3 but score similarly to the tryptamines on all three

PCs.  

Figure 3.2 Scores plot for A) PC1 vs. PC2 and B) PC1 vs. PC3 

 

 

 

 

The loadings for each PC were plotted as a function of m/z (Figures 3.3, 3.4, and 3.5) to

demonstrate the variables contributing positively and negatively to each PC. The loadings value 

for each m/z value is a measurement of the extent of contribution, where +1 is the maximum 

positive contribution of a variable to a PC and -1 is the maximum negative contribution to a PC. 

For example, in the loadings plot for PC1 (Figure 3.3), m/z 121 contributes most positively and 

m/z 58 contributes most negatively. Therefore, it would be expected that compounds with a high 

intensity of m/z 121 would be positioned more positively on PC1 in the scores plot and 

compounds with a high intensity of m/z 58 would be positioned more negatively. More 

specifically, because the data are mean-centered, compounds with an intensity of m/z 121 greater

than the average will be positioned positively and compounds with an intensity of m/z 58 greater 

than the average will be positioned negatively. 
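
The effect of mean centering on the sign of a score can be illustrated with a short R sketch. The intensities and loadings below are invented, and only two variables are used so that the arithmetic is easy to follow; they are not values from the data set in this work.

```r
# Hypothetical relative intensities (%) of m/z 121 and m/z 58 for four compounds
X <- rbind(A = c(100, 5),
           B = c(90, 10),
           C = c(10, 100),
           D = c(0, 85))
colnames(X) <- c("mz121", "mz58")

# Mean-center each variable (subtract the column average)
Xc <- scale(X, center = TRUE, scale = FALSE)

# Illustrative loadings: positive for m/z 121, negative for m/z 58
a <- c(mz121 = 0.8, mz58 = -0.4)

# Score on the PC for each compound
pc1 <- drop(Xc %*% a)
print(pc1)
```

Compounds A and B have above-average m/z 121 and below-average m/z 58, so both terms push their scores in the positive direction; the reverse holds for compounds C and D.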

Figure 3.3 Loadings plot for PC1 

 

 

Figure 3.4 Loadings plot for PC2

 

 

 

Figure 3.5 Loadings plot for PC3 


Based on the scores plots in Figure 3.2, the NBOMe-phenethylamines are well separated 

from the other three groups on PC1. Compounds in this class all contain a base peak at m/z 121 

with high intensity peaks at m/z 91 (~25% relative intensity) and m/z 150 (~40-60% relative 

intensity), as demonstrated by example spectra of NBOMe-phenethylamines in the appendix of 

this chapter. The loadings plot for PC1 shows that the positive contributions are dominated by 

these three ions, which account for the separation of compounds in the NBOMe subclass. 

Two ions that contribute negatively to PC1 are m/z 44 and 58. The isomeric APB-

phenethylamines (4-, 5-, 6-, and 7-APB) contain a base peak at m/z 44 while 4-MAPB has a base 

peak at m/z 58. Several of the compounds in the tryptamine class also contain m/z 58 in high 

intensity such as 4-hydroxy DMT, 5-methoxy DMT, 4-Me-α-ET, N,N-DMT, and α-ET. The 

similarities in these two ions for the APB-phenethylamines and tryptamines account for the 

overlap of these two classes on PC1.  

Based on the loadings plot for PC2 (Figure 3.4), m/z 44 and m/z 131 both contribute 

negatively. The isomeric APB-phenethylamines contain a base peak at m/z 44 as well as a high 

intensity (~30%) peak at m/z 131, positioning 4-, 5-, 6-, and 7-APB negatively on PC2. The two 

tryptamines positioned negatively on PC2 that are grouped closer to the isomeric 

APB-phenethylamines are α-MT and α-ET. Both of these tryptamines have a base peak of m/z 

131 as well as high intensity (~80%) peaks at m/z 130, another significant negative contributor to 

PC2. A small negative contribution from m/z 121 is observed for PC2, positioning the NBOMe-

phenethylamines slightly negatively on this PC. 

Three ions, m/z 58, 198, and 199, contribute positively to PC2. Several tryptamines 

contain m/z 58 at high intensity or as the base peak (Figures A.1 and A.3) and 5,7-DCT contains 

m/z 198 and 199 at high intensity (100 and 99%, respectively). Because of these contributions, 

 


the tryptamines are positioned positively on PC2. A number of 2C-phenethylamines (i.e., 2C-B, 

N, and T) also contain m/z 198 and 199 which, when coupled with the absence of ions 

contributing negatively to PC2, cause the 2C-phenethylamines to position positively on this PC.  

 

The separation between the 2C-phenethylamines and tryptamines on PC3 is due primarily 

to the high number of ions present in 2C compounds that contribute positively to PC3. These 

ions include m/z 65, 180, 197, 198, and 201. Further separation is achieved through m/z 58 which 

contributes negatively to PC3 and is present in a large percentage of the tryptamines. Again, 

overlap occurs between the APB-phenethylamines and tryptamines because of the similarities of 

the compounds with regard to m/z 44, 58, and 131.

 

Loadings from the first three PCs were normalized to the m/z value with the highest 

loadings value across all three PCs. Three variable data sets were then defined containing m/z 

values with greater than 30%, 20%, and 15% relative loadings. Thresholds were selected based 

on the number of allowed variables for LDA in relation to the total number of compounds that 

would constitute the training set. Any threshold lower than 15% would yield more variables than

the 25 compounds in the training set. The three variable data sets are shown in Table 3.1. These

sets of variables will be used to define LDA models in Chapter 4.  
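The thresholding step can be sketched as follows. This is a minimal illustration under stated assumptions: the loadings are invented, absolute values are used so that positive and negative contributors are treated alike, and each m/z value is represented by its largest loading on any of the three PCs. The function names are hypothetical.

```python
# Hypothetical loadings on PC1-PC3 for a handful of m/z values (illustrative only).
loadings = {
    44: (-0.40, -0.55, 0.10),
    58: (-0.35, 0.60, -0.30),
    91: (0.20, -0.02, 0.01),
    121: (0.80, -0.05, 0.03),
    132: (0.01, -0.13, 0.02),
    77: (0.02, 0.03, -0.04),
}


def relative_loadings(loadings):
    """Scale each m/z by the largest absolute loading found on any PC (in %)."""
    largest = max(abs(v) for values in loadings.values() for v in values)
    return {mz: 100 * max(abs(v) for v in values) / largest
            for mz, values in loadings.items()}


def select_variables(loadings, threshold_percent):
    """Keep m/z values whose relative loading exceeds the threshold."""
    rel = relative_loadings(loadings)
    return sorted(mz for mz, pct in rel.items() if pct > threshold_percent)


for threshold in (30, 20, 15):
    print(threshold, select_variables(loadings, threshold))
```

Lowering the threshold can only add variables, which is why the 15% set contains the 20% set, which in turn contains the 30% set.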

Table 3.1 m/z Values Identified using PCA

Threshold                 m/z values
>30% Relative Loadings    44, 58, 121, 131, 150, 198, 199
>20% Relative Loadings    44, 58, 91, 121, 131, 150, 198, 199, 200
>15% Relative Loadings    44, 58, 91, 121, 130, 131, 132, 150, 165, 180, 197, 198, 199, 200, 201

PCA is a valuable technique used to identify natural groups in a data set based on 

variance. A scores plot creates a visualization of those natural groups while a loadings plot 

indicates which variables describe the most variance. However, with mass spectral data, the most 

intense peaks are not always representative of ions characteristic of a compound or compound 

class. Base peaks are often low molecular weight ions that are simply the most stable ions 

following fragmentation. For this reason, separation is not always achieved when the spectra of 

compounds from different classes are dominated by low mass ions. Therefore, an approach was 

needed that used informed chemical information to identify ions that were characteristic of the 

different classes under investigation. 

 

 
3.2 Variable Selection based on Characteristic Ions 
 

The second method of variable selection involved probing the low-resolution mass 

spectra for ions considered to be characteristic of the phenethylamines and tryptamines. 

Common ions within classes or subclasses were identified as integer m/z values in the low-

resolution spectra. Using high-resolution spectra acquired from the gas chromatography-time-of-

flight mass spectrometry (GC-TOFMS) analyses, the accurate masses were used to confirm the 

chemical formulae of those ions, with the mass accuracy expressed in parts per million (ppm).

Formulae were then used to predict the structure of each fragment ion and ions that were 

characteristic of each class were selected based on fragmentation patterns. Given that electron 

ionization was used in both the GC-QMS and GC-TOFMS instruments, the fragmentation of 

compounds was similar in both low-resolution and high-resolution spectra (Figure 3.6). For that 

reason, low-resolution spectra were used for the statistical procedures and high-resolution spectra 

were only used to confirm molecular formulae of ions commonly observed within the designer 

drug classes.  
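Mass accuracy in ppm is the difference between the measured and theoretical masses, divided by the theoretical mass and multiplied by one million. The short sketch below shows the arithmetic with hypothetical masses chosen for easy checking; they are not measurements from this work, and the function name is an assumption.

```python
def ppm_error(measured_mass, theoretical_mass):
    """Mass accuracy in ppm: (measured - theoretical) / theoretical x 10^6."""
    return (measured_mass - theoretical_mass) / theoretical_mass * 1e6


# Hypothetical values chosen for easy arithmetic, not taken from this work.
error = ppm_error(200.0010, 200.0000)
print(f"{error:.1f} ppm")
```

A negative result indicates that the measured mass is lower than the theoretical mass of the proposed formula.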

 


Figure 3.6 A) Low-resolution and B) High-resolution spectra for 5-methoxy DiPT 

 

 

 

 

 

[Figure 3.6 panels: relative intensity (%) versus m/z. Labeled peaks at m/z 72, 114, 160, and 174; accurate-mass annotations 72.0812 (C4H10N, -1.4 ppm), 114.1267 (C7H16N, -14 ppm), 160.0766 (C10H10NO, -0.6 ppm), and 174.0915 (C11H12N, -2.3 ppm).]

3.2.1 Characteristic Ions for Tryptamines
 

 

Low-resolution mass spectra were first probed for the tryptamine class of synthetic 

designer drugs. Ten compounds were used: 4-hydroxy DMT, 4-hydroxy DET, 5-methoxy DMT, 

5-methoxy DiPT, DPT, N,N-DMT, 4-Me-α-ET, 5,7-DCT, α-ET, and α-MT. The tryptamine 

class can be divided into three groups based on substitution: non-aromatically substituted, 

hydroxy-substituted, and methoxy-substituted tryptamines. Full chemical names for these 

compounds are given in the Chapter 2 Appendix.  

3.2.1 Non-Aromatically Substituted Tryptamines 
 

All compounds in the tryptamine class contained m/z 130, which ranged in relative 

intensity from less than 1% to 85%. Compounds with no substitution on the aromatic ring, such 

as α-MT and α-ET, contained m/z 130 in higher abundance (Figure 3.7) compared to 

aromatically-substituted tryptamines. The accurate mass for this peak using high-resolution data 

was 130.0667 Da (7.7 ppm mass accuracy) in α-MT and 130.0658 Da (0.8 ppm) in α-ET.  In 

both cases, this accurate mass corresponded to a chemical formula of C9H8N+, which is 

consistent with the non-aromatically substituted core structure of the tryptamines (Figure 3.8). In 

α-MT and α-ET, the loss of 44 and 58 Da, respectively, from the molecular ion leads to the ion

with m/z 130, [M - C2H6N]+ or [M - C3H8N]+. The complementary fragment ions at m/z 44 [C2H6N]+ and 58 [C3H8N]+ also result from α-β

bond cleavage, separating the aromatic ring from the amine chain. If the amine is tertiary, the

resulting ion appears at m/z 58 and if the amine is secondary, the fragment ion appears at m/z

44.  
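The nominal masses quoted for these fragments can be checked directly from their formulae. The sketch below sums integer atomic masses for a simple formula string and then confirms the neutral losses from the molecular ions of α-MT (174 Da) and α-ET (188 Da); the helper name and the small element table are assumptions made for illustration.

```python
import re

# Nominal (integer) masses of the most abundant isotopes.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32}


def nominal_mass(formula):
    """Sum nominal atomic masses for a simple formula such as 'C9H8N'."""
    total = 0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += NOMINAL_MASS[element] * (int(count) if count else 1)
    return total


# Formulae quoted in the text for the tryptamine fragments.
print(nominal_mass("C9H8N"))   # 130, core ion
print(nominal_mass("C2H6N"))   # 44, amine fragment from alpha-MT
print(nominal_mass("C3H8N"))   # 58, amine fragment from alpha-ET

# Neutral loss check: molecular ion minus the lost amine fragment.
print(174 - nominal_mass("C2H6N"), 188 - nominal_mass("C3H8N"))  # 130 130
```

The same helper applies to the other formulae in this chapter, for example C10H10NO (160) and C9H7O (131).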

 


Figure 3.7 Low-resolution spectra for A) α-MT and B) α-ET 

 

 

 

 

[Figure 3.7 panels: relative intensity (%) versus m/z, with labeled peaks at m/z 44, 130, and 131 for α-MT and m/z 58, 130, and 131 for α-ET.]

Figure 3.8 Proposed fragmentation for A) α-MT and B) α-ET to produce m/z 130 

 

Another peak observed in the mass spectra of α-MT and α-ET is m/z 131. While most non-

aromatically substituted tryptamines contain the m/z 130 ion, α-MT and α-ET also contain a base 

peak at m/z 131. The proposed fragmentation involves the loss of 43 and 57 Da, respectively,

with rearrangement of the bonds to form a radical on the terminal ethyl group, moving the charge 

into the ring (Figure 3.9). The final characteristic peak in the spectra for α-MT and α-ET that 

was selected as a variable is m/z 77. Based on the high-resolution data, this peak corresponds to a 

molecular formula of C6H5+, which is consistent with a six-membered aromatic ring. Although

this ion is a common fragment ion present in various classes of aromatic compounds, m/z 77 was 

selected due to the similarities in relative intensity within specific groups (see Chapter 3 

Appendix).  

 

[Figure 3.8 scheme: A) α-MT (174 Da) loses 44 Da and B) α-ET (188 Da) loses 58 Da to give the ion at 130 Da.]

The other two non-aromatically substituted compounds in the data set were N,N-DMT

and DPT. Both of these compounds contained m/z 130 and m/z 58, previously identified. 

However, m/z 130 was observed at a much lower abundance (~12% relative intensity, see 

Chapter 3 Appendix). N,N-DMT contains the m/z 58 peak; however, DPT exhibits a peak at m/z 

114 due to the dipropyl substituted amine that is left as the fragment ion [C7H16N]+ after the ion 

corresponding to the m/z 130 [C9H8N]+ peak is formed.  

 

 

 

 

 

 

Figure 3.9 Proposed fragmentation for A) α-MT and B) α-ET to produce m/z 131 

 

[Figure 3.9 scheme: A) α-MT (174 Da) loses 43 Da and B) α-ET (188 Da) loses 57 Da to give the ion at 131 Da.]

3.2.1 Aromatically-Substituted Tryptamines
 

For tryptamines that contain a substituent on the aromatic ring, such as a methoxy or 

hydroxy group, the relative intensity of the m/z 130 peak was observed at only 2-3%. However, 

the ion corresponding to the substituted core tryptamine structure was identified for each of those 

compounds and incorporated into the list of variables. The two compounds containing a hydroxy 

group, 4-hydroxy DMT and 4-hydroxy DET, shared a common peak at m/z 146 (Figure 3.10).

The accurate mass for this peak from the high-resolution spectra in these two compounds

confirmed the molecular formula as C9H8NO, which is consistent with a hydroxy-substituted

tryptamine compound (Figure 3.11).

 

 


Figure 3.10 Low-resolution spectra for A) 4-hydroxy DMT and B) 4-hydroxy DET 
 

 

 

[Figure 3.10 panels: relative intensity (%) versus m/z, with labeled peaks at m/z 58 and 146 for 4-hydroxy DMT and m/z 86 and 146 for 4-hydroxy DET.]

Figure 3.11 Proposed fragmentation for A) 4-hydroxy DMT and B) 4-hydroxy DET to produce 

m/z 146 

 

The second class of aromatically substituted tryptamines, besides the hydroxy-substituted 

compounds, contains a methoxy group on the core ring structure. The two compounds, 5-methoxy

DMT and 5-methoxy DiPT, also shared a common ion in the low-resolution spectra at m/z 160 

(Figure 3.12). Upon analysis with the GC-TOFMS, the molecular formula for this ion was 

confirmed as C10H10NO. A loss of the amine chain with an α-β bond cleavage would result in the 

core ring structure with a methoxy substitution, a fragment ion with the molecular formula of 

C10H10NO and m/z of 160 (Figure 3.13).  

 

 

 

 

[Figure 3.11 scheme: A) 4-hydroxy DMT (204 Da) loses 58 Da and B) 4-hydroxy DET (232 Da) loses 86 Da to give the ion at 146 Da.]

Figure 3.12 Low-resolution spectra for A) 5-methoxy DiPT and B) 5-methoxy DMT

 

[Figure 3.12 panels: relative intensity (%) versus m/z, with labeled peaks at m/z 114 and 160 for 5-methoxy DiPT and m/z 58 and 160 for 5-methoxy DMT.]

Figure 3.13 Proposed fragmentation for A) 5-methoxy DiPT and B) 5-methoxy DMT

 

 

To this point, ions had been selected for tryptamines with no aromatic substitution and 

 

for tryptamines with either a hydroxy- or methoxy-substituted aromatic ring. However, another 

reference standard, 4-Me-α-ET, contains a methyl group on the ring. Due to the methyl group, 

the relative intensity of the m/z 130 ion is smaller because an additional loss of a CH3 group is 

required to produce the ion with a mass of 130 Da. However, 4-Me-α-ET contains m/z 58 and 

m/z 146 in relatively high abundance. The m/z 58 ion occurs as a result of the α-β bond cleavage, 

which forms the high abundance ion at m/z 144. The second characteristic ion occurs at m/z 146, 

which has the same nominal mass as the ion produced in the hydroxy-substituted tryptamines, 

but with a different molecular formula. This ion in 4-Me-α-ET corresponds to a formula of 

C10H12N+, which is consistent with the core structure of a methyl-substituted tryptamine. 

 

[Figure 3.13 scheme: A) 5-methoxy DiPT (274 Da) loses 114 Da and B) 5-methoxy DMT (218 Da) loses 58 Da to give the ion at 160 Da.]

The final tryptamine reference material included in the variable selection process was 

5,7-DCT which contains the core tryptamine structure with two chlorine substituents on the 

aromatic ring. Because the ring is substituted and the amine on the alkyl chain is primary,  

5,7-DCT does not contain m/z 44, 58, or 130 in high abundance. This compound also does not 

exhibit a hydroxy-, methoxy-, or methyl-substituted aromatic ring meaning that m/z 146 and m/z 

160 are not present. The base peak for 5,7-DCT occurs at m/z 198 with m/z 199 at a relative 

intensity of 99% (Figure 3.14). The peak at m/z 199 occurs as a result of the α-β bond cleavage 

resulting in a loss of CH4N, which has a mass of 30 Da (Figure 3.15). 

Figure 3.14 Low-resolution spectrum for 5,7-DCT 

 

 

 

[Figure 3.14: relative intensity (%) versus m/z, with labeled peaks at m/z 198 and 199.]

Figure 3.15 Proposed fragmentation for 5,7-DCT to form m/z 199

 

 

Based on analysis of mass spectra and elemental formulae of fragment ions, confirmed 

through high-resolution data, eight ions were selected as diagnostic variables for LDA with 

regard to the tryptamine class (m/z 44, 58, 77, 130, 131, 146, 160, and 199).

 

 
3.2.3 Characteristic Ions for APB-Phenethylamines 
 

Of the six APB-phenethylamine reference materials, four (4-, 5-, 6-, and 7-APB) are 

isomeric compounds with a base peak at m/z 44 and a high relative intensity (~30%) peak at m/z 

131 [C9H7O]+ (Figure 3.16). These two fragment ions are a result of the α-β bond cleavage also 

observed in the tryptamine class (Figure 3.17).  

 

[Figure 3.15 scheme: 5,7-DCT (229 Da) loses 30 Da to give the ion at 199 Da.]

 

Figure 3.16 Low-resolution spectrum of 4-APB 

 

Figure 3.17 Proposed fragmentation of 4-APB to form m/z 131 

[Figure 3.16: relative intensity (%) versus m/z, with labeled peaks at m/z 44 and 131.]

[Figure 3.17 scheme: 4-APB (175 Da) loses 44 Da to give the ion at 131 Da.]

The remaining two APB-phenethylamines, 4-MAPB and 4-EAPB, have the same core

APB structure but contain an additional methyl or ethyl group, respectively, on the amine. When 

these two compounds fragment, [C9H7O]+ (m/z 131) is formed by the loss of either 58 Da from 

4-MAPB or 72 Da from 4-EAPB (Figures 3.18 and 3.19). The final ion present at high relative

abundance (~7-18%) in the APB-phenethylamines is m/z 77, whose molecular formula (C6H5+)

corresponds to an aromatic ring.

 


Figure 3.18 Low-resolution spectra for A) 4-MAPB and B) 4-EAPB 

 

 

 

 

[Figure 3.18 panels: relative intensity (%) versus m/z, with labeled peaks at m/z 58 and 131 for 4-MAPB and m/z 72 and 131 for 4-EAPB.]

Figure 3.19 Proposed fragmentation for A) 4-MAPB and B) 4-EAPB to form m/z 131

 

 

After probing the spectra for the APB-phenethylamines, four ions were selected as 

 

characteristic of this class (m/z 44, 58, 77, and 131).  

3.2.4 Characteristic Ions for NBOMe-Phenethylamines 
 

 

 

The NBOMe-phenethylamines have very distinguishable spectra with characteristic 

peaks at m/z 91 [C7H7]+, 121 [C8H9O]+, and 150 [C9H12NO]+, with m/z 121 consistently present 

as the base peak (Figure 3.20). Mass spectra for NBOMe-phenethylamines all contain these three 

peaks, which are fragment ions representative of the methoxybenzyl group attached to the amine 

chain on the core phenethylamine structure (Figure 3.21). Because these ions dominate the 

 

spectra, substitutions on the core aromatic ring do not affect the prevailing fragmentation

patterns observed, making them excellent ions for use as diagnostic markers.  

Figure 3.20 Low-resolution mass spectrum for 25T-NBOMe 

 

 

 

[Figure 3.20: relative intensity (%) versus m/z, with labeled peaks at m/z 91, 121, and 150.]

Figure 3.21 Proposed fragmentation for 25T-NBOMe for ions with m/z 91, 121, and 150

 

 

Based on the dominance of m/z 91, 121, and 150 in spectra for NBOMe-

phenethylamines, these ions were selected as characteristic for this phenethylamine subclass. 
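The consistency of these ions across the subclass can be summarized from the relative intensities in Table A.7. The sketch below assumes that the values in that table are listed in the same order as the compound names; the helper name is hypothetical.

```python
# Relative intensities (%) transcribed from Table A.7, in the order listed there.
compounds = ["25B", "25C", "25D", "25E", "25G", "25H", "25N", "25P", "25T", "3,4-DMA"]
intensity = {
    91: [24.00, 24.65, 24.76, 25.42, 23.26, 26.19, 28.54, 23.08, 23.52, 31.31],
    150: [63.01, 55.93, 39.75, 41.83, 47.30, 48.12, 51.20, 46.41, 36.31, 0.17],
}


def summarize(values):
    """Return (minimum, maximum, mean) of a list of relative intensities."""
    return min(values), max(values), sum(values) / len(values)


for mz, values in intensity.items():
    low, high, mean = summarize(values)
    lowest = compounds[values.index(low)]
    print(f"m/z {mz}: {low:.2f}-{high:.2f}%, mean {mean:.2f}% (lowest: {lowest}-NBOMe)")
```

Under that assumption, m/z 91 stays within a narrow band near 25%, while m/z 150 is in the 36-63% range for every compound except 3,4-DMA-NBOMe.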

 

 
3.2.2 Characteristic Ions for 2C-Phenethylamines 
 

 

Unlike the APB- and NBOMe-phenethylamines, the 2C-phenethylamines do not contain 

ions that are common to the whole class. Therefore, several m/z values were selected based on 

the common presence in subsets of these compounds, such as alkyl substituted versus non-alkyl 

substituted. The first ion selected as characteristic of the alkyl-substituted 2C-phenethylamines 

was m/z 165 [C10H13O2]+, which is the base peak for 2C-G and also present in high abundance in 

2C-D, 2C-E, and 2C-P (Figure 3.22). The ion with a m/z value of 165 occurs when the amine 

chain cleaves from the ring, leaving a carbocation on the aromatic ring (Figure 3.23).  

 

[Figure 3.21 scheme: 25T-NBOMe (347 Da) fragments to give the ions at 150, 121, and 91 Da.]

Figure 3.22 Low-resolution mass spectra for A) 2C-E and B) 2C-G

 

 

 

[Figure 3.22 panels: relative intensity (%) versus m/z, with labeled peaks at m/z 165 and 209 in both spectra.]

Figure 3.23 Proposed fragmentation for A) 2C-E and B) 2C-G to form m/z 165

 

 

 

 

 

 

 

 

 

 

 

 

 

[Figure 3.23 scheme: A) 2C-E (209 Da) and B) 2C-G (209 Da) each fragment to give the ion at 165 Da.]

When fragmentation occurs due to an α-β bond cleavage in the amine chain in a 2C-

phenethylamine with a sulfur substituent, the result is a fragment ion at m/z 198 [C10H14O2S]+,

such as in the case of 2C-T (Figures 3.24 and 3.25).

Figure 3.24 Low-resolution mass spectrum for 2C-T 

 

 
 

[Figure 3.24: relative intensity (%) versus m/z, with labeled peaks at m/z 198 and 227.]

 

Figure 3.25 Proposed fragmentation of 2C-T to form m/z 198 

The final ion that appears in the 2C-phenethylamine class is m/z 77, which is indicative of

an aromatic ring (C6H5+). This ion is present in the 2C class at relative intensities ranging from 4 to 14%.

The halogenated and nitro-containing 2C-phenethylamines contained m/z 77 at higher relative

intensity than the remaining compounds. No other common characteristic ions were

observed for incorporation into the variable set. Therefore, the three m/z values selected as

characteristic for the 2C-phenethylamines were m/z 77, 165, and 198.

 

3.3 Summary 
 

 

Prior to the application of linear discriminant analysis (LDA), a method of variable 

reduction needed to be employed for this data set. The first method used PCA, an unsupervised 

multivariate statistical procedure, to identify m/z values that described the most variance within 

the data set. Using PCA, three variable sets were formed corresponding to the percentage of 

overall contribution of particular m/z values across the first three PCs. The second method of 

variable selection utilized an informed chemical approach where mass spectra of compounds 

from three phenethylamine subclasses, as well as a set of tryptamines, were probed to identify 

 

m/z values that represented ions characteristic of the mass spectra of that class. A comparison of

the variable sets defined by both methods is shown in Figure 3.26. 
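The comparison can also be reproduced with simple set operations. The sketch below builds the informed chemical variable set as the union of the ions selected for each class in Section 3.2 and compares it with the PCA set; it assumes that the PCA side of the comparison is the >15% relative loadings set from Table 3.1.

```python
# Ions chosen for each class by the informed chemical approach (Section 3.2).
class_ions = {
    "tryptamine": {44, 58, 77, 130, 131, 146, 160, 199},
    "APB": {44, 58, 77, 131},
    "NBOMe": {91, 121, 150},
    "2C": {77, 165, 198},
}
chemical_set = set().union(*class_ions.values())

# Assumption: the PCA side of the comparison is the >15% relative loadings set.
pca_set = {44, 58, 91, 121, 130, 131, 132, 150, 165, 180, 197, 198, 199, 200, 201}

shared = sorted(chemical_set & pca_set)
pca_only = sorted(pca_set - chemical_set)
chemical_only = sorted(chemical_set - pca_set)

print(len(chemical_set), "ions from the informed chemical approach")
print("both:", shared)
print("PCA only:", pca_only)
print("chemical only:", chemical_only)
```

The union contains 13 ions, in agreement with the size of the informed chemical variable set used in Chapter 4.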

 

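Figure 3.26 Venn diagram illustrating similarities and differences between m/z values selected by

an informed chemical approach versus PCA

[Figure 3.26 contents: selected by both methods, m/z 44, 58, 91, 121, 130, 131, 150, 165, 198, and 199; selected by PCA only, m/z 132, 180, 197, 200, and 201; selected by the informed chemical approach only, m/z 77, 146, and 160.]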

 

Overlap occurred for several m/z values that were present in high abundance in the set of 

reference materials. However, the major distinction with the ions selected exclusively from the 

informed chemical approach was the presence of ions characteristic of the tryptamines (i.e. m/z 

146 and 160). Additionally, m/z 77 was selected by the informed chemical approach due to the

similarities in intensity between compounds in the same class or subclass. While the presence of

m/z 77 is not inherently characteristic, the intensity of this ion within a class is itself a

characteristic property. The ions selected exclusively by PCA are present in individual

compounds, rather than being diagnostic of a whole class or subclass. Consequently, ions 

selected only by PCA describe variance within classes rather than between classes. For example, 

 

m/z 200 is observed in high abundance in 5,7-DCT and creates separation of this compound from

the remainder of the tryptamines. 6-APB becomes separated from the other isomeric APB-

phenethylamines due to the incorporation of m/z 132 selected by PCA because that particular 

compound contains that ion at higher intensity than the remaining compounds. Because the 

objective of this work is to classify new compounds by class or subclass, incorporating more 

characteristic ions is essential. Variable sets derived from both methods were optimized and 

compared in Chapter 4. 

 

 

 


APPENDICES

 


APPENDIX A 

 

Relative Intensity Values of Characteristic Ions for each Class or Subclass 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Table A.5 Relative Intensity (%) of m/z Values Identified as Characteristic of Tryptamines 
 

 

4-hydroxy 

DMT 

4-hydroxy 

DET 

5-methoxy 

DMT 

5-methoxy 

DiPT 
α-MT 

 

α-ET 

 

4-Me-α-ET 

 

DPT 

 

N,N-DMT 

 

5,7-DCT 

 

44 

1.82 

58 

100 

77 

2.44 

130 

3.84 

131 

1.38 

146 

6.89 

160 

3.16 

0.73 

5.22 

1.21 

2.33 

0.79 

4.93 

2.76 

0.26 

100 

1.97 

3.96 

1.46 

0.97 

9.27 

0.36 

0.25 

0.79 

2.39 

1.17 

1.00 

8.91 

38.82 

0 

12.39 

85.20 

100 

3.33 

58.61 

9.07 

74.29 

100 

0 

0 

0 

0 

199 

0 

0 

0 

0 

0 

0 

0.12 

41.94 

2.40 

6.76 

2.00 

12.29 

0.07 

0.02 

0.94 

0.17 

2.42 

11.79 

1.45 

0.07 

0 

0.24 

100 

5.87 

11.56 

1.31 

0.03 

0.03 

0 

0 

0.39 

0 

2.03 

0.83 

0.34 

0 

0.45 

99.17 

 
Table A.6 Relative Intensity (%) of m/z Values Identified as Characteristic of APB-Phenethylamines

Compound    m/z 44    m/z 58    m/z 77    m/z 131
4-APB       100       0.17      18.19     27.36
5-APB       100       0         17.27     29.85
6-APB       100       0         17.72     33.42
7-APB       100       0.20      17.45     29.51
4-MAPB      0.40      100       9.34      14.51
4-EAPB      12.54     0.08      7.32      15.22

Table A.7 Relative Intensity (%) of m/z Values Identified as Characteristic of NBOMe-Phenethylamines

Compound         m/z 91    m/z 121    m/z 150
25B-NBOMe        24.00     100        63.01
25C-NBOMe        24.65     100        55.93
25D-NBOMe        24.76     100        39.75
25E-NBOMe        25.42     100        41.83
25G-NBOMe        23.26     100        47.30
25H-NBOMe        26.19     100        48.12
25N-NBOMe        28.54     100        51.20
25P-NBOMe        23.08     100        46.41
25T-NBOMe        23.52     100        36.31
3,4-DMA-NBOMe    31.31     100        0.17

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 


Table A.8 Relative Intensity (%) of m/z Values Identified as Characteristic of 2C-Phenethylamines

Compound    m/z 77    m/z 165    m/z 198
2C-B        14.46     0.49       0
2C-C        10.14     0          0.93
2C-D        7.80      18.24      0
2C-E        5.81      49.76      0
2C-G        7.94      100        0
2C-H        8.84      0.94       0
2C-I        6.30      0.38       0
2C-N        9.38      0.98       10.33
2C-P        4.76      49.79      0
2C-T        4.09      1.22       100

 
 

 

APPENDIX B 

 

Low-Resolution Mass Spectra of 2C-, APB-, and NBOMe-Phenethylamines and Tryptamines Investigated

 


 

 

Figure A.1 Low-resolution spectrum of 4-Me-α-ET 

Figure A.2 Low-resolution spectrum of DPT 

 

Figure A.3 Low-resolution spectrum of N,N-DMT

 

 

Figure A.4 Low-resolution spectrum of 5-APB 

 

Figure A.5 Low-resolution spectrum of 6-APB

 

 

Figure A.6 Low-resolution spectrum of 7-APB 

 
 

Figure A.7 Low-resolution spectrum of 25B-NBOMe

 

 

Figure A.8 Low-resolution spectrum of 25C-NBOMe 

 
 

Figure A.9 Low-resolution spectrum of 25D-NBOMe

 

 

Figure A.10 Low-resolution spectrum of 25E-NBOMe 

 
 

Figure A.11 Low-resolution spectrum of 25G-NBOMe

 

 

Figure A.12 Low-resolution spectrum of 25H-NBOMe 

 
 

Figure A.13 Low-resolution spectrum of 25P-NBOMe

 

 

Figure A.14 Low-resolution spectrum of 25N-NBOMe 

 
 

Figure A.15 Low-resolution spectrum of 3,4-DMA-NBOMe

 

 

Figure A.16 Low-resolution spectrum of 2C-B 

 
 

Figure A.17 Low-resolution spectrum of 2C-C

 

 

Figure A.18 Low-resolution spectrum of 2C-D 

 
 

Figure A.19 Low-resolution spectrum of 2C-H

 

 

Figure A.20 Low-resolution spectrum of 2C-I 

 
 

Figure A.21 Low-resolution spectrum of 2C-N

 

 

 

 

Figure A.22 Low-resolution spectrum of 2C-P 

IV. Linear Discriminant Analysis for the Classification of Synthetic Phenethylamines and
Tryptamines
 
 

The overall objective of this work was to create classification models by linear

discriminant analysis (LDA); first, however, the mass spectral variable set needed to be reduced. The

first method used principal components analysis (PCA) to find the m/z variables

that described the variance in the data set. PCA is unsupervised: it

groups the samples based on similarities and differences in the raw mass spectral data without

prior knowledge of the group to which the compounds belong. Using PCA, three sets of

variables were identified corresponding to varying levels of contribution to the variance; the 

thresholds were set at 30%, 20%, and 15% relative loadings. The second method used an 

informed chemical approach to identify m/z values that are characteristic of the classes or 

subclasses of compounds comprising the data set. Ions were selected based on common 

fragmentation patterns that yielded similar peaks across the spectra of a particular compound 

class or subclass. Using the informed chemical approach, a set of variables containing 13 ions 

was identified. First, the variable set from the PCA method with the best classification 

performance was determined for comparison with the variable set defined by the informed 

chemical approach. 

 
4.1 Variable Set Selection for PCA 
 
 

To determine the variable set from PCA with the highest classification rate, leave-one-out 

cross validation (CV) was used. The CV procedure involves training the model with all 

compounds in the data set, removing one compound, retraining the model with the remainder of 

the training set, and evaluating the retrained model by reclassification of the removed compound. 

The CV method was performed on each variable set and the classification results were obtained

as a table of posterior probabilities, which represent the probability that a compound

belongs to a particular class or subclass. A posterior probability of 1.0 indicates the highest 

probability that a compound belongs to that particular group. Likewise, a probability of 0 means 

that there is no probability that a compound belongs to that group. If the probability value is 

between 0 and 1, the compound will be classified to the group for which there is the largest 

posterior probability.  
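The leave-one-out procedure can be sketched in a few lines. To keep the example self-contained, a nearest-centroid classifier is swapped in for LDA; it is a stand-in only and does not reproduce the LDA model or its posterior probabilities. The toy intensities and all function names are hypothetical.

```python
import math


def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier: assign the class whose mean spectrum is closest."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))


def leave_one_out(x, y, predict):
    """Hold out each compound once, retrain on the rest, and reclassify it."""
    correct = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_x, train_y, x[i]) == y[i]:
            correct += 1
    return correct / len(x)


# Toy data: relative intensities (%) at two hypothetical m/z variables.
x = [[100, 5], [95, 8], [98, 2], [4, 100], [10, 96], [6, 99]]
y = ["A", "A", "A", "B", "B", "B"]
success_rate = leave_one_out(x, y, nearest_centroid_predict)
print(f"{success_rate:.0%} classified correctly")
```

Because every compound is held out exactly once, the success rate is the number of correctly reclassified compounds divided by the size of the data set.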

For Variable Set 1 (30% relative loadings), seven m/z values (44, 58, 121, 131, 150, 198, 

and 199) were identified. Using LDA with validation by CV, 30 out of 36 compounds were 

classified correctly, yielding an 83% classification success rate. Misclassified compounds are 

indicated in red in Table A.1. Of the six misclassified compounds, two were 4-MAPB and 4-EAPB, which

were misclassified as tryptamines. The other four APB-phenethylamines are structural isomers 

with a base peak at m/z 44 and a dominant peak at m/z 131. Both 4-EAPB and 4-MAPB contain 

m/z 131 at approximately half the abundance of the isomeric APB-phenethylamines and contain 

m/z 44 in very low abundance. In addition, 4-MAPB has a base peak at m/z 58, an ion commonly

observed in tryptamines with tertiary amine chains (Figure 4.1). These major differences account 

for the misclassification of 4-MAPB and 4-EAPB. 

Additionally, 2C-D was misclassified as a tryptamine and DPT and 5-methoxy DiPT 

were misclassified as 2C-phenethylamines. Less separation was achieved between the 2C-

phenethylamines and tryptamines in this set because of the small number of variables, 

accounting for the misclassifications. The table with the posterior probabilities and 

classifications for Variable Set 1 can be found in the Chapter 4 Appendix. 

 

 


Figure 4.1 Mass spectra for A) 4-MAPB and B) 4-APB 

 

 

 

[Figure 4.1 panels: relative intensity (%) versus m/z, with labeled peaks at m/z 58 and 131 for 4-MAPB and m/z 44 and 131 for 4-APB.]

Table 4.1 depicts the posterior probabilities for Variable Set 2 (20% relative loadings)

where nine variables were selected (m/z 44, 58, 91, 121, 131, 150, 198, 199, and 200). With this 

variable set, 31 out of 36 compounds were correctly classified when CV was performed, yielding 

an 86% classification success rate. Misclassified compounds are indicated in red in Table 4.1. 

Both 4-EAPB and 4-MAPB were misclassified as tryptamines, as was the case

when using Variable Set 1.

Two 2C-phenethylamines (2C-C and 2C-I) were also misclassified as tryptamines. 

Variable Set 2 contains few ions characteristic of 2C-phenethylamines. The two misclassified 

compounds do not contain any of the m/z values in this variable set at high abundance, resulting 

in misclassification. The classification as tryptamines is attributed to the lack of high abundance 

ions that are observed in NBOMe- and APB-phenethylamines, such as m/z 121 or 131. One 

tryptamine, 5,7-DCT, was misclassified with a posterior probability output of NaN ('not a 

number'), which indicates that the posterior probabilities for every class were too small to be

calculated. When the same CV method was performed numerous times, the resultant

classification was different each time for 5,7-DCT, leading to an unreliable class assignment. 
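The assignment rule, including the NaN case, can be sketched as follows. The posterior values are hypothetical and the function name is an assumption; returning None for a NaN row is one possible way to flag an unreliable assignment rather than a description of the software used in this work.

```python
import math


def assign_class(posteriors):
    """Return the class with the largest posterior, or None if any value is NaN."""
    if any(math.isnan(p) for p in posteriors.values()):
        return None  # no reliable assignment is possible
    return max(posteriors, key=posteriors.get)


# Hypothetical posterior probabilities for two compounds.
clear_case = {"2C": 0.0, "APB": 0.004, "NBOMe": 0.0, "Tryptamine": 0.996}
failed_case = {"2C": float("nan"), "APB": float("nan"),
               "NBOMe": float("nan"), "Tryptamine": float("nan")}

print(assign_class(clear_case))   # Tryptamine
print(assign_class(failed_case))  # None
```

Checking for NaN before taking the maximum matters because comparisons with NaN are always false, so the result of a plain maximum would depend on the order of the classes.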

 

 

 

 

 

 

 
 

 


Table 4.1 Posterior Probabilities for CV with 20% Relative Loadings Threshold 

NBOMe 

Tryptamine 

 

 
4-APB 
5-APB 
6-APB 
7-APB 
4-EAPB 
4-MAPB 
25B-NBOMe 
25C-NBOMe 
25D-NBOMe 
25E-NBOMe 
25G-NBOMe 
25H-NBOMe 
25N-NBOMe 
25P-NBOMe 
25T-NBOMe 
3,4-DMA-
NBOMe 
2C-B 
2C-C 
2C-D 
2C-E 
2C-G 
2C-H 
2C-I 
2C-N 
2C-P 
2C-T 
α-MT 
α-ET 
DPT 
N,N-DMT 
4-OH DMT 
4-OH DET 
5-methoxy 
DMT 
5-methoxy 
DiPT 
4-Me-α-ET 
5,7-DCT 

2C 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 

0 

1.0 

APB 
1.0 
1.0 
1.0 
1.0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 

0 

0 

2.67 x 10-2 

2.67 x 10-3 

1.0 
1.0 
1.0 
1.0 

0 
0 
0 
0 

0.234 

1.65 x 10-3 

1.0 
1.0 
1.0 

1.15 x 10-7 

0 

6.25 x 10-5 
6.72 x 10-9 
1.53 x 10-7 
1.09 x 10-4 

0 
0 
0 

0.438 

0 

3.50 x 10-3 
2.85 x 10-2 
4.25 x 10-2 
3.45 x 10-3 

3.51 x 10-9 

2.86 x 10-2 

4.82 x 10-5 

3.15 x 10-3 

5.91 x 10-5 

4.24 x 10-3 

0 
0 
0 
0 
0 
0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 

1.0 

0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 

0 

0 

0 

0 
0 
0 
0 
1.0 
1.0 
0 
0 
0 
0 
0 
0 
0 
0 
0 

0 

0 

Class 
APB 
APB 
APB 
APB 

Tryptamine* 
Tryptamine* 

NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 

NBOMe 

2C 

0.971 

Tryptamine* 

0 
0 
0 
0 

2C 
2C 
2C 
2C 

0.765 

Tryptamine* 

0 
0 
0 

0.562 

1.0 

0.996 
0.971 
0.957 
0.996 

0.971 

0.997 

0.996 
NaN 

2C 
2C 
2C 

Tryptamine 
Tryptamine 
Tryptamine 
Tryptamine 
Tryptamine 
Tryptamine 

Tryptamine 

Tryptamine 

Tryptamine 
Unreliable** 

NaN 

NaN 

NaN 

*Entries in red indicate misclassified compounds 
**'NaN' indicates that the posterior probabilities were too small for class assignment 
 

 


Variable Set 3 (15% relative loadings) included 15 m/z values (44, 58, 91, 121, 130, 131, 

132, 150, 165, 180, 197, 198, 199, 200, and 201). This variable set yielded an 83% classification 

success rate, with 30 of the 36 compounds correctly classified in CV. Misclassified compounds 

are indicated in red in Table A.3. With Variable Set 3, 4-MAPB was still misclassified as a 

tryptamine, and 6-APB was also misclassified as a tryptamine. The third variable set incorporated 

m/z 132, which is found at a higher intensity in 6-APB than in the remaining isomeric 

APB-phenethylamines, leading to the misclassification. The inclusion of m/z 132 created less 

separation for the APB-phenethylamines from both tryptamines and 2C-phenethylamines. As a 

result, 2C-N and α-ET were misclassified as APB-phenethylamines. Similar to Variable Set 2, 

2C-C was misclassified as a tryptamine using Variable Set 3. The table with the posterior 

probabilities and classifications for Variable Set 3 can be found in the Chapter 4 Appendix. 

Based on the performance of each of these variable sets during CV, the set with the 

threshold at 20% relative loadings was selected to move forward in the analysis because it 

produced the fewest misclassifications. 
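
As a rough illustration of threshold-based variable selection, the sketch below keeps every m/z value whose absolute loading is at least a given fraction of the largest absolute loading on a principal component. This definition of "relative loading", the function name select_by_relative_loading, and the loading values are assumptions made for illustration only; the selection procedure actually used is the one described in Chapter 3.

```python
def select_by_relative_loading(loadings, threshold):
    """Keep m/z values whose |loading| is >= threshold * max |loading|.

    This is an assumed definition of 'relative loading' for this sketch only.
    """
    max_abs = max(abs(value) for value in loadings.values())
    return sorted(
        mz for mz, value in loadings.items()
        if abs(value) / max_abs >= threshold
    )


# Hypothetical loadings on a single principal component (not thesis data).
example_loadings = {44: -0.62, 58: 0.31, 91: 0.10, 121: 0.55, 132: 0.11, 199: -0.13}

set_20 = select_by_relative_loading(example_loadings, 0.20)
set_15 = select_by_relative_loading(example_loadings, 0.15)
print("20% threshold:", set_20)
print("15% threshold:", set_15)
```

Under this assumed definition, lowering the threshold can only add variables, which is consistent with the 15% variable set containing more m/z values than the 20% variable set.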

 

4.2 LDA using Selected Variable Set from PCA  
 
 

Following selection of the best performing variable set from PCA, the compounds were 

split into a training and test set. The compounds in the training set were randomly selected via 

the black box method to contain seven out of the ten compounds in each of the tryptamine and 

2C-phenethylamine groups. For the APB-phenethylamines, five of the six 

compounds were selected by the black box method to comprise the training set. For the NBOMe-

phenethylamines, only six compounds were analyzed on the same day and instrument as the 

other compound classes in the training set, so all six were included as training set compounds. 

Four unique NBOMe-phenethylamines that were analyzed on a different day and instrument 

 


were selected for use in the test set. A majority of the compounds were also analyzed on a 

different day, and these spectra were placed into the test set to assess the robustness of the model. A table 

of the training and test set compounds can be found in Chapter 2. 
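
The random portion of this split can be sketched as follows. A seeded pseudorandom draw stands in for the black box method, which is an assumption; the function name split_class and the seed are hypothetical. The NBOMe-phenethylamines are omitted because they were assigned by analysis day and instrument rather than at random, and the replicate spectra added to the test set are not modeled here.

```python
import random


def split_class(compounds, n_train, rng):
    """Randomly pick n_train compounds for training; the rest form the test set."""
    train = rng.sample(compounds, n_train)
    test = [c for c in compounds if c not in train]
    return train, test


rng = random.Random(2019)  # arbitrary fixed seed so the sketch is repeatable

# Compound names as listed in the tables; training set sizes as stated in the text.
classes = {
    "APB": (["4-APB", "5-APB", "6-APB", "7-APB", "4-EAPB", "4-MAPB"], 5),
    "2C": (["2C-B", "2C-C", "2C-D", "2C-E", "2C-G",
            "2C-H", "2C-I", "2C-N", "2C-P", "2C-T"], 7),
    "Tryptamine": (["α-MT", "α-ET", "DPT", "N,N-DMT", "4-OH DMT", "4-OH DET",
                    "5-methoxy DMT", "5-methoxy DiPT", "4-Me-α-ET", "5,7-DCT"], 7),
}

splits = {name: split_class(members, n_train, rng)
          for name, (members, n_train) in classes.items()}

for name, (train, test) in splits.items():
    print(name, "train:", len(train), "test:", len(test))
```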

 

LDA was performed on the training set to define the model before the introduction of the 

test set. Each test set compound was assigned to the compound class for which the distance between 

the sample and the group centroid was shortest. The scores plots are shown in Figure 4.2. 
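
A minimal sketch of this nearest-centroid assignment is given below. It assumes Euclidean distance in the space of the LD scores; the scores, the function names centroid and classify, and the two-compound groups are hypothetical and are not taken from the data in this work.

```python
import math


def centroid(points):
    """Mean position of a group of LD scores."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def classify(score, centroids):
    """Return the class whose centroid is nearest (Euclidean) to the score."""
    return min(centroids, key=lambda name: math.dist(score, centroids[name]))


# Hypothetical (LD1, LD2, LD3) scores for training compounds (not thesis data).
training_scores = {
    "NBOMe": [(38.0, 0.5, 0.1), (41.0, -0.3, 0.2)],
    "APB": [(-12.0, -4.0, -3.0), (-11.0, -5.0, -2.5)],
    "Tryptamine": [(-13.0, 1.0, 0.5), (-12.5, 0.5, 1.0)],
    "2C": [(-10.0, 2.5, -0.5), (-9.5, 3.0, 0.0)],
}
centroids = {name: centroid(points) for name, points in training_scores.items()}

print(classify((40.0, 0.0, 0.0), centroids))
print(classify((-12.0, -4.2, -2.0), centroids))
```

In this toy layout, groups that differ only slightly on one LD have centroids that lie close together, so a small shift in a test compound's score can change its assigned class.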

 

 

 

Figure 4.2 LDA scores plot showing A) LD1 vs. LD2 and B) LD1 vs. LD3 using PCA as the 

variable selection method, where the boxes to the right indicate zoomed-in regions 

 


[Figure 4.2 plot: axes LD1 (97.84%), LD2 (1.32%), and LD3 (0.84%); legend distinguishes training set, test set, and centroids for the APB, NBOMe, tryptamine, and 2C groups; labeled compounds are α-MT, 4-MAPB, 2C-C, and 4-EAPB] 

The LDA scores plots (Figure 4.2) show four distinct groups corresponding to the four 

classes or subclasses of compounds, with a few exceptions. The position of individual 

compounds can be explained using a plot of the coefficients of linear discriminants for each m/z 

value (Figure 4.3). The coefficient of the linear discriminant is the value associated with the 

weighting of each m/z variable, analogous to PC loadings. A higher magnitude coefficient means 

that the variable contributes more to the linear discriminant (LD) function. Compounds in which 

the intensity at that particular m/z value is greater than the average will score more positively or 

negatively on that particular LD, depending on the sign of the coefficient.   

The highest positive contribution to LD1 is at m/z 121, followed by m/z 91, both shown in 

dark blue in Figure 4.3. These are both ions characteristic of the NBOMe-phenethylamines, 

which explains the highly positive positioning of this class on LD1. The three remaining groups 

score similarly on LD1 due to the lower intensity of these two ions, as the relative intensity data 

are mean centered. With mean centering, if a compound contains a peak at an intensity lower 

than the group average, the intensity value becomes negative for that compound.  
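
The effect of mean centering on the scores can be illustrated with a short sketch. Here each m/z column is centered on its mean over all spectra in the set, and a score is the sum of the centered intensities weighted by the coefficients. The intensities, the coefficients, and the names mean_center and ld_score are hypothetical values chosen for illustration.

```python
def mean_center(rows):
    """Subtract each column's mean (one column per m/z value) from every row."""
    n = len(rows)
    means = [sum(column) / n for column in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def ld_score(centered_row, coefficients):
    """Score on one LD: weighted sum of mean-centered intensities."""
    return sum(x * c for x, c in zip(centered_row, coefficients))


# Hypothetical relative intensities (%) at m/z 121 and m/z 91 (not thesis data).
spectra = {
    "NBOMe-like A": [100.0, 40.0],
    "NBOMe-like B": [100.0, 30.0],
    "other A": [0.0, 5.0],
    "other B": [0.0, 5.0],
}
coefficients = [0.5, 0.25]  # assumed positive LD1 coefficients for both ions

centered = mean_center(list(spectra.values()))
scores = {name: ld_score(row, coefficients)
          for name, row in zip(spectra, centered)}
for name, score in scores.items():
    print(name, score)
```

Spectra with above-average intensity at both ions receive positive scores, while spectra lacking these ions receive similar negative scores, mirroring the grouping of the three remaining classes on LD1.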

On LD2, m/z 44 contributes negatively. While this ion is the base peak for the isomeric 

APB-phenethylamines, 4-EAPB contains this peak at a much lower intensity. As a result, the isomeric 

APB-phenethylamines position more negatively on LD2 than both the other three groups and 

4-EAPB. The isomeric APB-phenethylamines also position negatively on LD3 due to the base 

peak at m/z 44 for these four compounds and the high negative contribution of this ion to LD3.  

 

 

 


Figure 4.3 Coefficients of linear discriminants for the nine variables selected using PCA (m/z 44, 58, 91, 121, 131, 150, 198, 199, and 200)  

 

 

 

Less separation is observed between the tryptamines and 2C-phenethylamines on LD1 

and LD2 due to the lack of representative ions for these two groups. A number of the 

2C-phenethylamines are positioned more positively on LD1 and LD2 due to the presence of m/z 199, 

which contributes positively to these two LDs. However, the close positioning of tryptamines 

and 2C-phenethylamines is mostly due to the lower intensity of ions characteristic of APB- and 

NBOMe-phenethylamines in these compounds.  

With the test set introduced, four compounds are misclassified: α-MT, 4-MAPB, 2C-C, 

and 4-EAPB analyzed on a different day than the training set equivalent (Table 4.2). The first of 

these compounds, α-MT, is misclassified as an APB-phenethylamine because it is positioned more 

negatively on LD2 than the remainder of the tryptamine class. This scoring can be attributed to 

the presence of m/z 44 in high abundance as well as the base peak at m/z 131 in the spectrum for 

α-MT. Both of these ions are present in high abundance in the spectra for APB-phenethylamines 

 


as well. The remaining three misclassified compounds, 4-EAPB, 4-MAPB, and 2C-C, are 

classified as tryptamines. 4-MAPB and 4-EAPB lack the base peak at m/z 44 that the 

isomeric APB-phenethylamines exhibit. For this reason, the two compounds are positioned less 

negatively on LD2 and LD3, closer to the centroid of the tryptamine class. Finally, 2C-C is 

misclassified as a tryptamine due to the lack of several of the ions present in the 

2C-phenethylamines, such as m/z 199 and 200. Of the nine variables selected using PCA, the 

highest intensity ion for 2C-C is m/z 91 at a relative intensity of 4.9%. Because of the 

underrepresentation of ions characteristic of this compound and the lack of intensity for the two 

ions that are observed in the 2C subclass, 2C-C scores closer to the tryptamine class. APB- and 

NBOMe-phenethylamines have more characteristic ions observed across the entirety of the 

subclass, which creates more distinct grouping and explains why 2C-phenethylamines and 

tryptamines are less well-separated. 

Using PCA as the variable selection method, 26 out of 30 total compounds in the test set 

were correctly classified, yielding an 86.6% classification success rate. Several of the 

misclassifications can be attributed to the lack of a low mass ion observed across a class/subclass 

or the underrepresentation of ions characteristic of a particular class. PCA is an unsupervised 

approach with no group knowledge; thus, the variables were selected based on the presence of 

high intensity peaks. However, high intensity peaks in mass spectra are often low mass ions that 

are not characteristic of a group of compounds. As a result, lower intensity and higher molecular 

weight ions that are characteristic become masked and consequently are not selected using the 

PCA approach. For this reason, a second method of variable selection was utilized where mass 

spectra were probed for common ions that also correspond to characteristic fragments.  

 

 

 


4.3 LDA using Variable Set from the Informed Chemical Approach 
 
 

A second LDA model was defined using the same training set as in Section 4.2 but with a 

new variable set that was derived using the informed chemical approach discussed in Chapter 3. 

This variable set included 13 m/z values (m/z 44, 58, 77, 91, 121, 130, 131, 146, 150, 160, 165, 

198, and 199) that were determined to be representative of different classes or subclasses of 

compounds in the data set. The scores plot for the LDA model using the informed chemical 

approach is shown in Figure 4.4.  

 

 

 


Figure 4.4 LDA scores plot showing A) LD1 vs. LD2 and B) LD1 vs. LD3 using the informed 

chemical approach as the variable selection method 

 

 
 
 
 
 
 
 
 
 
 


 

[Figure 4.4 plot: axes LD1 (86.96%), LD2 (10.99%), and LD3 (2.05%); legend distinguishes training set, test set, and centroids for the APB, NBOMe, tryptamine, and 2C groups; labeled compounds are α-ET and N,N-DMT] 

Similar to the previous model, the NBOMe-phenethylamines create a distinct group on the 

positive LD1, while the other three groups position similarly on LD1. The tryptamines in the 

training set are separated from the 2C- and APB-phenethylamines on LD2, while the 2C- and 

APB-phenethylamines are separated from each other on LD3. The plot displaying the 

coefficients of linear discriminants is shown in Figure 4.5.  

Figure 4.5 Coefficients of linear discriminants for the 13 variables selected using the informed 

chemical approach  

 

The two highest contributors to the first linear discriminant function are m/z 91 and 121, which 

both have a positive coefficient. The positive contribution explains the positioning of the 

NBOMe-phenethylamines, as these are both characteristic ions of this class, with m/z 121 

representing the base peak for all NBOMe compounds analyzed. The remaining three groups 

score similarly on LD1 because they do not contain these peaks at substantial abundances and 

there are few additional ions contributing to this LD.  

 

 

 


The tryptamines position negatively on LD2, distinguishing these compounds from the 

2C- and APB-phenethylamines. The three ions with significant negative contributions to LD2 are 

m/z 130, 146, and 160. The first of these, m/z 130, is present in tryptamine compounds with no 

aromatic substitution, while m/z 146 and 160 are present in tryptamine compounds with hydroxy 

or methoxy substitutions, respectively, on the aromatic ring. While the 2C- and 

APB-phenethylamines position similarly on LD1 and LD2, separation is achieved on LD3. The 

two highest abundance peaks in the APB-phenethylamines are m/z 44 and 131. These ions both 

contribute negatively to LD3 and are largely responsible for the separation of 2C- and 

APB-phenethylamines.  

 

When introducing the test set into the LDA model, 28 compounds out of 30 were 

correctly classified, yielding a 93.3% classification success rate. The first of the two misclassified 

compounds was α-ET analyzed on a different day and instrument, which was classified 

as an APB-phenethylamine due to the high abundance of m/z 131, an ion that is prevalent in the 

APB-phenethylamines. Because of the presence of this ion, α-ET positioned more negatively on 

LD3 than the rest of the tryptamines and had a positive score on LD2.  

The second compound that was misclassified was N,N-DMT, which was classified as a 

2C-phenethylamine. N,N-DMT is a non-aromatically substituted tryptamine. Therefore, it does 

not contain peaks at m/z 146 or 160, which are indicative of a hydroxy or methoxy substituent on 

the aromatic ring. Because a large percentage of the training set for the tryptamines contained 

hydroxy or methoxy substituted compounds, these two ions became important contributors 

towards classifying compounds in this group. As N,N-DMT does not contain m/z 146 and 160, 

this compound was positioned less negatively on LD2 and was classified as a 

2C-phenethylamine. This misclassification could be corrected if the training set were expanded to 

include more non-aromatically substituted tryptamines that contain m/z 130 

at high abundance.  

4.4 Comparison of Variable Selection Methods 
 
 

Two LDA classification models were developed using the same training set but with 

different variables selected by different methods. A common test set was used to determine the 

performance of each model, with classification from both LDA models summarized in Table 4.2. 

The model defined by the m/z values selected by PCA resulted in an 86.6% classification success 

rate, compared to 93.3% when using m/z values derived from an informed chemical approach. 

The compounds misclassified by each model are shown in the third column of Table 4.2, 

followed by the class to which LDA classified each compound. Finally, the correct class is given 

in the final column. 

Table 4.2 Summary of LDA Classification 

Method of                     Classification   Compounds       Classification   Correct
Variable Selection            Success Rate     Misclassified   by LDA           Classification

Principal Components          86.6%            α-MT            APB              Tryptamine
Analysis (PCA)                                 4-EAPB          Tryptamine       APB
                                               4-MAPB          Tryptamine       APB
                                               2C-C            Tryptamine       2C

Informed Chemical Approach    93.3%            α-ET            APB              Tryptamine
by Probing Mass Spectra                        N,N-DMT         2C               Tryptamine
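
The success rates in Table 4.2 follow from the number of misclassified compounds and the size of the test set (30 compounds). A small sketch of that arithmetic is shown below; the dictionary layout and the helper name success_rate are illustrative choices.

```python
TEST_SET_SIZE = 30

# Misclassified test set compounds taken from Table 4.2.
misclassified = {
    "PCA": ["α-MT", "4-EAPB", "4-MAPB", "2C-C"],
    "Informed chemical approach": ["α-ET", "N,N-DMT"],
}


def success_rate(n_misclassified, n_total):
    """Percentage of compounds that were classified correctly."""
    return 100.0 * (n_total - n_misclassified) / n_total


rates = {method: success_rate(len(names), TEST_SET_SIZE)
         for method, names in misclassified.items()}
for method, rate in rates.items():
    print(f"{method}: {rate:.2f}%")
```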

 
 
 
 
 
 

 


4.5 Summary 
 
 

In Chapter 3, two different methods of variable selection were discussed. The first 

method, PCA, used an objective statistical approach based solely on the natural variance in the 

data set. The second method used a more informed approach where trends in low-resolution 

mass spectra for particular compound classes were identified. High-resolution mass spectra were 

then used to confirm the structure of the ions identified. The first method, PCA, yielded three 

variable sets corresponding to different set thresholds of relative loadings. LDA cross validation 

was performed using the full compound set for each of the three variable sets. It was determined 

that the set with the highest classification success included all m/z values over 20% relative 

loadings. The compounds were then randomly split into a training set and a test set, and LDA 

models were defined using the variable set derived from PCA and the set composed of m/z 

values selected in the informed approach. The model using the PCA variables performed at an 

86.6% classification success rate while the model using the informed approach yielded a 93.3% 

success rate. With PCA, higher intensity ions dominate. However, these ions are most often low 

mass fragments that are common across several compound groups and, therefore, are not 

sufficiently characteristic of each compound class. Incorporating lower intensity ions into the 

variable set that are characteristic of the classes of compounds but are not selected by PCA 

improves the classification when LDA is performed.  

 
 
 
 
 
 

 

 

 


 

 

APPENDIX 

 


Table A.9 Posterior Probabilities for CV with 30% Relative Loadings Threshold 
 

NBOMe 

Tryptamine 

0 
0 
0 
0 

0.907 

1.0 
0 
0 
0 
0 
0 
0 
0 
0 
0 

0 

4.36 x 10-2 

0.466 
0.548 

3.88 x 10-2 

0 
0 

7.96 x 10-2 

0 

1.27 x 10-3 
9.36 x 10-2 

0.559 

1.0 

0.482 
0.972 
0.959 
0.627 

0.972 

0.473 

0.981 
NaN 

Class 
APB 
APB 
APB 
APB 

Tryptamine* 
Tryptamine* 

NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 

NBOMe 

2C 
2C 

Tryptamine* 

2C 
2C 
2C 
2C 
2C 
2C 
2C 

Tryptamine 
Tryptamine 

2C* 

Tryptamine 
Tryptamine 
Tryptamine 

Tryptamine 

2C* 

Tryptamine 
Unreliable** 

 
4-APB 
5-APB 
6-APB 
7-APB 
4-EAPB 
4-MAPB 
25B-NBOMe 
25C-NBOMe 
25D-NBOMe 
25E-NBOMe 
25G-NBOMe 
25H-NBOMe 
25N-NBOMe 
25P-NBOMe 
25T-NBOMe 
3,4-DMA-
NBOMe 
2C-B 
2C-C 
2C-D 
2C-E 
2C-G 
2C-H 
2C-I 
2C-N 
2C-P 
2C-T 
α-MT 
α-ET 
DPT 
N,N-DMT 
4-OH DMT 
4-OH DET 
5-methoxy 
DMT 
5-methoxy 
DiPT 
4-Me-α-ET 
5,7-DCT 

2C 
0 
0 
0 
0 

APB 
1.0 
1.0 
1.0 
1.0 

9.32 x 10-2 

1.98 x 10-7 

0 
0 
0 
0 
0 
0 
0 
0 
0 
0 

0 

0 
0 
0 
0 
0 
0 
0 
0 
0 
0 

0 

0.956 
0.533 
0.450 
0.961 

1.0 
1.0 

0.920 

1.0 

0.999 
0.906 

9.03 x 10-4 

0 

0.516 

1.83 x 10-4 
1.69 x 10-4 

0.371 

1.61 x 10-4 
1.41 x 10-3 
1.33 x 10-3 
8.21 x 10-5 

0 
0 

1.75 x 10-4 

0 

2.41 x 10-6 
2.07 x 10-4 

0.440 

0 

1.71 x 10-3 
2.83 x 10-2 
4.09 x 10-2 
2.23 x 10-3 

1.78 x 10-4 

2.80 x 10-2 

0.525 

1.51 x 10-3 

1.54 x 10-2 

3.75 x 10-3 

0 
0 
0 
0 
0 
0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 

1.0 

0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 

0 

0 

0 

NaN 

NaN 

NaN 

*Entries in red indicate misclassified compounds 
**'NaN' indicates that the posterior probabilities were too small for class assignment 

 


Table A.10 Posterior Probabilities for CV with 15% Relative Loadings Threshold 
 

NBOMe 

Tryptamine 

0 
0 
1.0 
0 

1.09 x 10-2 

Class 
APB 
APB 

Tryptamine* 

APB 
APB 

 
4-APB 
5-APB 
6-APB 
7-APB 
4-EAPB 
4-MAPB 
25B-NBOMe 
25C-NBOMe 
25D-NBOMe 
25E-NBOMe 
25G-NBOMe 
25H-NBOMe 
25N-NBOMe 
25P-NBOMe 
25T-NBOMe 
3,4-DMA-
NBOMe 
2C-B 
2C-C 
2C-D 
2C-E 
2C-G 
2C-H 
2C-I 
2C-N 
2C-P 
2C-T 
α-MT 
α-ET 
DPT 
N,N-DMT 
4-OH DMT 
4-OH DET 
5-methoxy 
DMT 
5-methoxy 
DiPT 
4-Me-α-ET 
5,7-DCT 

2C 
0 
0 
0 
0 

1.22 x 10-2 
3.46 x 10-5 

APB 
1.0 
1.0 
0 
1.0 

0.977 

4.60 x 10-3 

0 
0 
0 
0 
0 
0 
0 
0 
0 

0 

0.898 
0.298 

1.0 
1.0 
1.0 
1.0 

0.871 
NaN 
1.0 
1.0 
0 
0 
0 
0 
0 
0 

0 

0 

0 

0 
0 
0 
0 
0 
0 
0 
0 
0 

0 

6.55 x 10-4 
3.32 x 10-7 

0 
0 
0 
0 

3.79 x 10-8 

NaN 

0 
0 
0 
1.0 
0 
0 
0 
0 

0 

0 

0 

0 
0 
0 
0 
0 
0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 
1.0 

1.0 

0 
0 
0 
0 
0 
0 
0 

NaN 

0 
0 
0 
0 
0 
0 
0 
0 

0 

0 

0 

0.995 

Tryptamine* 

0 
0 
0 
0 
0 
0 
0 
0 
0 

0 

NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 
NBOMe 

NBOMe 

0.101 
0.702 

2C 

Tryptamine* 

0 
0 
0 
0 

0.129 
NaN 

0 
0 
1.0 
0 
1.0 
1.0 
1.0 
1.0 

1.0 

1.0 

1.0 
NaN 

2C 
2C 
2C 
2C 
2C 

APB* 

2C 
2C 

Tryptamine 

APB* 

Tryptamine 
Tryptamine 
Tryptamine 
Tryptamine 

Tryptamine 

Tryptamine 

Tryptamine 
Unreliable** 

NaN 

NaN 

NaN 

*Entries in red indicate misclassified compounds 
**'NaN' indicates that the posterior probabilities were too small for class assignment 
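
One plausible way an undefined posterior can arise is sketched below; this is an assumption, since the software behavior is not described here. If a compound lies far from every class centroid, each class likelihood can underflow to zero in floating-point arithmetic, and normalizing by their sum then gives 0/0. The sketch assumes Gaussian classes with a shared covariance and equal priors; the squared distances and the function name posteriors are hypothetical.

```python
import math


def posteriors(squared_distances, priors):
    """Posterior probabilities from squared distances to each class centroid.

    Assumes Gaussian classes with a shared covariance, so each likelihood is
    proportional to exp(-d^2 / 2).
    """
    weighted = {name: priors[name] * math.exp(-0.5 * d2)
                for name, d2 in squared_distances.items()}
    total = sum(weighted.values())
    if total == 0.0:
        # Every likelihood underflowed to zero: 0/0 is undefined, so report NaN.
        return {name: float("nan") for name in weighted}
    return {name: w / total for name, w in weighted.items()}


priors = {"2C": 0.25, "APB": 0.25, "NBOMe": 0.25, "Tryptamine": 0.25}

# Hypothetical squared distances to each centroid (not thesis data).
near_one_class = {"2C": 40.0, "APB": 1.0, "NBOMe": 900.0, "Tryptamine": 30.0}
far_from_all = {"2C": 1600.0, "APB": 2500.0, "NBOMe": 3600.0, "Tryptamine": 1800.0}

print(posteriors(near_one_class, priors))
print(posteriors(far_from_all, priors))
```

In this sketch, a compound close to one centroid receives a posterior near 1.0 for that class, whereas a compound far from all centroids yields NaN for every class and cannot be assigned reliably.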

 


V. Conclusions and Future Work 
 
5.1 Conclusions 
 

The overall objective in this work was to create a statistical model to classify newly 

emerging synthetic designer drugs into a class or structural subclass. Due to their structural 

similarities, tryptamines and phenethylamines were selected to develop the model using linear 

discriminant analysis (LDA). Variable selection was accomplished using two different methods 

to consider the advantages and disadvantages of each. Principal components analysis (PCA) was 

selected as an objective approach for variable selection to eliminate bias in the selection process. 

However, PCA lacks group knowledge and therefore selects variables based solely on the raw 

data and the contribution of each variable to the natural variance within the data set. When using 

mass spectral data, characteristic ions can be masked by less significant but dominant ions, 

leading to a less specific model.  

The second method used a more informed approach of identifying characteristic ions 

from the mass spectra based on known structural information. While this method allows for the 

identification of ions that are more characteristic of specific classes or subclasses, there is a 

risk of bias and of overtraining the model. Ultimately, the latter method of hand-selecting the 

variables was more successful and an LDA model was established. This model performed at a 

93.3% classification success rate, misclassifying only two of the thirty compounds comprising 

the test set. Furthermore, both misclassifications could be explained and possibly overcome with a more 

robust and representative data set.  

 

The forensic implications of this work lie in the ability to classify, not identify, an 

unknown compound. Samples submitted to a forensic laboratory are commonly analyzed by GC-

MS using standard operating procedures. If the sample is not consistent with any available 

 


reference materials, the LDA model developed in this work could be used to determine a 

possible designer drug class. Rather than a strict identification that occurs with direct comparison 

to the spectrum of a reference material, the LDA model would determine the likely class or 

subclass to which the submitted sample belongs. With this information, the laboratory would 

have a clearer plan moving forward, whether that includes obtaining a set of reference materials 

that are consistent with the preliminary results or finding literature spectra for various designer 

drugs for comparison purposes. In short, the model developed in this thesis would be neither a 

first nor a final step, but would provide direction in cases where the identity or class of the 

questioned sample is completely unknown. 

 
5.2 Future Work 
 
 

The classification model developed in this work included ten compounds in each of the 

tryptamine class and 2C-phenethylamine subclass as well as ten NBOMe-phenethylamines. 

Only six compounds represented the APB-phenethylamines in the development of the LDA 

models. To create a more robust model, more samples should be acquired that represent a wider 

array of synthetic phenethylamines and tryptamines, accounting for different subclasses and 

substituents. The two compounds misclassified by LDA were α-ET and  

N,N-DMT, which are both non-aromatically substituted tryptamines. The misclassification of 

these tryptamines could be attributed to the small number of non-aromatically substituted 

compounds in the training set. With the addition of more tryptamines, this particular issue in the 

model could be resolved. Additionally, the model was only trained with two synthetic designer 

drug classes. More classes should be incorporated, leading to one model that could classify any 

possible designer drug that is submitted to the laboratory. 

 


 

For the work presented here, reference materials were analyzed and used to develop the 

classification model. The compounds were acquired as pure samples and were analyzed 

individually. Future work should explore the analysis of mixtures and impure samples where 

concentration becomes a concern. Drug samples are very rarely submitted without some form of 

contamination and are often submitted as mixtures with other controlled substances. To prove 

applicability, the procedure developed in this work needs to be validated in cases where the 

concentration of controlled substance is relatively small. Finally, the introduction of true 

unknowns in the test set could be accomplished through the analysis of street samples. For the 

work presented in this thesis, the 'questioned samples' used as test set compounds were also 

reference standards. The validity of the model could be proved with the use of true street samples 

as an additional test set.   

 

While additional steps need to be taken to enhance and validate the LDA model, it is 

evident that the ability to classify newly emerging synthetic designer drugs has been demonstrated by 

the methods developed in this thesis. Previously unidentifiable samples with no available 

reference material can now be classified by structural group, allowing for additional steps toward 

identification.  

 

 

 

 

 

 

 

 

 
