CLASSIFICATION OF SYNTHETIC PHENETHYLAMINES AND TRYPTAMINES USING MULTIVARIATE STATISTICAL PROCEDURES

By
Amanda Lynn Setser

A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Forensic Science – Master of Science
2019

ABSTRACT

Identification of newly emerging synthetic designer drugs is challenging for forensic drug analysts due to the lack of available reference materials for a visual comparison of mass spectra. The focus in this work was the classification of synthetic drugs according to class or subclass using multivariate statistical procedures, specifically linear discriminant analysis (LDA). Reference materials representative of tryptamines and phenethylamines were analyzed by gas chromatography-mass spectrometry with a single quadrupole mass analyzer. Before LDA models could be developed, variable reduction was necessary. Two methods of variable selection were used. The first method used principal components analysis (PCA) as an objective approach and the second method used an informed chemical approach where mass spectra were probed for ions characteristic of each class or subclass of compounds. Ultimately, two variable sets were compared for classification success rates: the variable set selected by PCA (including nine m/z values) and the variable set selected using the informed chemical approach (including 13 m/z values). Two models were defined using the different variable sets and a common training set. A test set was then introduced to the model for classification. The LDA model using the informed chemical approach performed better, with a 93% classification success rate as opposed to the 86% success rate observed when using the variables selected by PCA.
Overall, this research provides a classification procedure for compounds not identifiable by standard methods of comparison to a known reference material.

ACKNOWLEDGMENTS

I would like to first thank my advisor, Dr. Ruth Smith, for her guidance and support on this research project. Without her knowledge, expertise, and moral support, this project would not have been possible. She provided opportunities to grow as a scientist and gain the confidence necessary to complete this thesis. I would also like to thank Dr. Victoria McGuffin for her support and feedback throughout the development of this work. She was always willing to challenge me to take the next step and thus allow me to push the boundaries. I would also like to thank Dr. Scott Wolfe for being so enthusiastic in agreeing to serve on my committee on such short notice.

In addition to the personal support I received to complete this project, I am also grateful to the National Institute of Justice for supporting this research through grant number 2015-IJ-CX-K008. Points of view in this thesis are those of the author and do not necessarily represent the official position or policies of the U.S. Department of Justice.

Finally, I would like to thank all of my friends, family, and members of the Forensic Chemistry group that helped and supported me along the way. A special thank you to Alex Anstett for her mentorship in the beginning stages of this project and to Natasha Eklund, Emma Stuhmer, and Becca Boyea for making the lab a fun and engaging place to work. Working on this project never had a dull moment because of the three of you and I will forever be grateful for our time together. I would also like to thank my good friend Thomas Diaz for always being there when I needed a good laugh through the stressful times. He always timed his visits exactly when I needed them most. I would also like to extend a huge thank you to Todd Burkhart for being the rock on which I stand while in graduate school.
Without his support and encouragement, nothing I do would be possible. Last but certainly not least, I would like to thank my family, especially my parents, Theresa and Jeff, and my sister, Olivia. Everything I am and everything I do is a direct result of the love and support of my family. I was grateful to have been given many opportunities by my parents and I wouldn't be the person I am today without them. This thesis is dedicated to them.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
I. Introduction
  1.1 Synthetic Designer Drugs
  1.2 Single Quadrupole Mass Spectrometry
  1.3 Multivariate Statistical Procedures
    1.3.1 Principal Components Analysis
    1.3.2 Linear Discriminant Analysis
  1.4 Research Objectives
  REFERENCES
II. Materials and Methods
  2.1 Synthetic Designer Drug Reference Materials
  2.2 Gas Chromatography-Mass Spectrometry (GC-MS) Analysis
  2.3 Data Processing
  2.4 Statistical Models in R
  APPENDIX
III. Variable Selection for Linear Discriminant Analysis
  3.1 Variable Selection by Principal Components Analysis (PCA)
  3.2 Variable Selection based on Characteristic Ions
    3.2.1 Characteristic Ions for Tryptamines
      3.2.1 Non-Aromatically Substituted Tryptamines
      3.2.1 Aromatically-Substituted Tryptamines
    3.2.3 Characteristic Ions for APB-Phenethylamines
    3.2.4 Characteristic Ions for NBOMe-Phenethylamines
    3.2.2 Characteristic Ions for 2C-Phenethylamines
  3.3 Summary
  APPENDICES
    APPENDIX A: Relative Intensity Values of Characteristic Ions for each Class or Subclass
    APPENDIX B: Low-Resolution Mass Spectra of 2C-, APB-, and NBOMe-Phenethylamines and Tryptamines Investigated
IV. Linear Discriminant Analysis for the Classification of Synthetic Phenethylamines and Tryptamines
  4.1 Variable Set Selection for PCA
  4.2 LDA using Selected Variable Set from PCA
  4.3 LDA using Variable Set from the Informed Chemical Approach
  4.4 Comparison of Variable Selection Methods
  4.5 Summary
  APPENDIX
V. Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work

LIST OF TABLES

Table 2.1 Substituents for 2C-phenethylamines investigated
Table 2.2 Substituents for NBOMe-phenethylamines investigated
Table 2.3 Training Set of Reference Standards for Classification Models
Table 2.4 Test Set 1 of Reference Standards for Classification Models
Table 2.5 Test Set 2 of Reference Standards for Classification Models
Table A.1 Compound abbreviations with full chemical names
Table A.2 R Code for Inputting Data
Table A.3 R Code for PCA
Table A.4 R Code for LDA
Table 3.1 m/z Values Identified using PCA
Table A.5 Relative Intensity (%) of m/z Values Identified as Characteristic of Tryptamines
Table A.6 Relative Intensity (%) of m/z Values Identified as Characteristic of APB-Phenethylamines
Table A.7 Relative Intensity (%) of m/z Values Identified as Characteristic of NBOMe-Phenethylamines
Table A.8 Relative Intensity (%) of m/z Values Identified as Characteristic of 2C-Phenethylamines
Table 4.1 Posterior Probabilities for CV with 20% Relative Loadings Threshold
Table 4.2 Summary of LDA Classification
Table A.9 Posterior Probabilities for CV with 30% Relative Loadings Threshold
Table A.10 Posterior Probabilities for CV with 15% Relative Loadings Threshold

LIST OF FIGURES

Figure 1.1 Proportion of new psychoactive substances
Figure 1.2 Core structure of A) phenethylamines and B) tryptamines as well as the phenethylamine subclasses: C) APB-phenethylamines, D) 2C-phenethylamines, and E) NBOMe-phenethylamines where R indicates common substituent site
Figure 1.3 Diagram of an electron ionization source
Figure 1.4 Diagram of a quadrupole mass analyzer demonstrating ions with stable (blue) and unstable (red) trajectories
Figure 1.5 Diagram of a continuous dynode electron multiplier
Figure 1.6 Example scores plot where x and y are original measurement variables and the green dots are samples; PC1 is drawn to describe the most variance in the data set and PC2 is drawn orthogonally to PC1
Figure 1.7 Example loadings plot using mass spectral data
Figure 1.8 Example LDA scores plot
Figure 2.1 Structures of the APB-phenethylamine reference standards (A) 4-APB (B) 5-APB (C) 6-APB (D) 7-APB (E) 4-MAPB and (F) 4-EAPB
Figure 2.2 Core structure for 2C-phenethylamines
Figure 2.3 (A) Core structure for NBOMe-phenethylamines and (B) structure for 3,4-DMA-NBOMe
Figure 2.4 Structures of the tryptamine reference standards (A) α-MT (B) α-ET (C) N,N-DMT (D) DPT (E) 4-hydroxy DMT (F) 4-hydroxy DET (G) 4-Me-α-ET (H) 5-methoxy DMT (I) 5-methoxy DiPT and (J) 5,7-DCT
Figure 3.1 Scree plot for PCA showing proportion of variance (red) and cumulative proportion (black) described by each PC
Figure 3.2 Scores plot for A) PC1 vs. PC2 and B) PC1 vs. PC3
Figure 3.3 Loadings plot for PC1
Figure 3.4 Loadings plot for PC2
Figure 3.5 Loadings plot for PC3
Figure 3.6 A) Low-resolution and B) High-resolution spectra for 5-methoxy DiPT
Figure 3.7 Low-resolution spectra for A) α-MT and B) α-ET
Figure 3.8 Proposed fragmentation for A) α-MT and B) α-ET to produce m/z 130
Figure 3.9 Proposed fragmentation for A) α-MT and B) α-ET to produce m/z 131
Figure 3.10 Low-resolution spectra for A) 4-hydroxy DMT and B) 4-hydroxy DET
Figure 3.11 Proposed fragmentation for A) 4-hydroxy DMT and B) 4-hydroxy DET to produce m/z 146
Figure 3.12 Low-resolution spectra for A) 5-methoxy DiPT and B) 5-methoxy DMT
Figure 3.13 Proposed fragmentation for A) 5-methoxy DiPT and B) 5-methoxy DMT
Figure 3.14 Low-resolution spectrum for 5,7-DCT
Figure 3.15 Proposed fragmentation for 5,7-DCT to form m/z 199
Figure 3.16 Low-resolution spectrum of 4-APB
Figure 3.17 Proposed fragmentation of 4-APB to form m/z 131
Figure 3.18 Low-resolution spectra for A) 4-MAPB and B) 4-EAPB
Figure 3.19 Proposed fragmentation for A) 4-MAPB and B) 4-EAPB to form m/z 131
Figure 3.20 Low-resolution mass spectrum for 25T-NBOMe
Figure 3.21 Proposed fragmentation for 25T-NBOMe for ions with m/z 91, 121, and 150
Figure 3.22 Low-resolution mass spectra for A) 2C-E and B) 2C-G
Figure 3.23 Proposed fragmentation for A) 2C-E and B) 2C-G to form m/z 165
Figure 3.24 Low-resolution mass spectrum for 2C-T
Figure 3.25 Proposed fragmentation of 2C-T to form m/z 198
Figure 3.26 Venn diagram illustrating similarities and differences between m/z values selected by an informed chemical approach versus PCA
Figure A.1 Low-resolution spectrum of 4-Me-α-ET
Figure A.2 Low-resolution spectrum of DPT
Figure A.3 Low-resolution spectrum of N,N-DMT
Figure A.4 Low-resolution spectrum of 5-APB
Figure A.5 Low-resolution spectrum of 6-APB
Figure A.6 Low-resolution spectrum of 7-APB
Figure A.7 Low-resolution spectrum of 25B-NBOMe
Figure A.8 Low-resolution spectrum of 25C-NBOMe
Figure A.9 Low-resolution spectrum of 25D-NBOMe
Figure A.10 Low-resolution spectrum of 25E-NBOMe
Figure A.11 Low-resolution spectrum of 25G-NBOMe
Figure A.12 Low-resolution spectrum of 25H-NBOMe
Figure A.13 Low-resolution spectrum of 25P-NBOMe
Figure A.14 Low-resolution spectrum of 25N-NBOMe
Figure A.15 Low-resolution spectrum of 3,4-DMA-NBOMe
Figure A.16 Low-resolution spectrum of 2C-B
Figure A.17 Low-resolution spectrum of 2C-C
Figure A.18 Low-resolution spectrum of 2C-D
Figure A.19 Low-resolution spectrum of 2C-H
Figure A.20 Low-resolution spectrum of 2C-I
Figure A.21 Low-resolution spectrum of 2C-N
Figure A.22 Low-resolution spectrum of 2C-P
Figure 4.1 Mass spectra for A) 4-MAPB and B) 4-APB
Figure 4.2 LDA scores plot showing A) LD1 vs. LD2 and B) LD1 vs. LD3 using PCA as the variable selection method, where the boxes to the right indicate zoomed-in regions
Figure 4.3 Coefficients of linear discriminants for the nine variables selected using PCA
Figure 4.4 LDA scores plot showing A) LD1 vs. LD2 and B) LD1 vs. LD3 using the informed chemical approach as the variable selection method
Figure 4.5 Coefficients of linear discriminants for the 13 variables selected using the informed chemical approach

I.
Introduction

1.1 Synthetic Designer Drugs

Controlled substances contribute to one-third of the caseloads in forensic laboratories in the United States [1]. The National Forensic Laboratory Information System's 2015 Annual Report, published by the Drug Enforcement Administration (DEA), estimated that over 1.19 million cases involving controlled substances were submitted to state and local crime laboratories in 2015 [2]. Controlled substances are classified by the Controlled Substances Act into five schedules based on the potential for abuse, dependency, and accepted medical use [3]. Schedule I contains substances with no medical use and a high potential for abuse, such as heroin and 3,4-methylenedioxymethamphetamine. Schedule V consists of substances with an accepted medical use and a low abuse potential, like dextromethorphan, which is commonly used in cough syrups. Controlled substances can be naturally occurring (e.g., marijuana, mushrooms, opium poppy) or synthetic (e.g., cathinones, heroin, amphetamine).

Recently, synthetic designer drugs have become a concern in the United States. These drugs are synthesized with a molecular structure that differs slightly from that of existing scheduled compounds in order to mimic the pharmacological effects while avoiding legal ramifications. There are several different designer drug classes: phenethylamines, cannabinoids, phencyclidines, tryptamines, piperazines, pipradols, and tropane alkaloids. This research focuses on two of these classes: phenethylamines and tryptamines. The United Nations Office on Drugs and Crime reported that phenethylamines were only outnumbered by synthetic cannabinoids and cathinones in terms of the proportion of new psychoactive substances identified in 2017 (Figure 1.1) [4]. Of the total number of new compounds identified, 18% were phenethylamines and 6% were tryptamines.
The core structures for phenethylamines and tryptamines, as well as the subclasses for phenethylamines, are shown in Figure 1.2.

Figure 1.1 Proportion of new psychoactive substances [4]. Chart labels: aminoindanes, 1%; phencyclidine-type substances, 2%; other substances, 16%; phenethylamines, 18%; piperazines, 3%; plant-based substances, 3%; synthetic cannabinoids, 32%; synthetic cathinones, 19%; tryptamines, 6%.

Figure 1.2 Core structure of A) phenethylamines and B) tryptamines as well as the phenethylamine subclasses: C) APB-phenethylamines, D) 2C-phenethylamines, and E) NBOMe-phenethylamines where R indicates common substituent site

The Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) sets standards for forensic analysis of controlled substances based on the specificity of the data obtained from the analytical method [5]. SWGDRUG provides recommendations for identification that categorize analysis methods into three groups: A, B, and C. Category A includes methods that provide structural information, such as infrared (IR) spectroscopy and mass spectrometry (MS). When a Category A technique is used, only one other technique from Category B or C is required for identification. If a Category A technique is not used, three techniques, two of which must be from Category B, are required. Because MS, a Category A technique, couples efficiently with gas chromatography (GC), a Category B technique, analysis by GC-MS fulfills SWGDRUG requirements. Consequently, GC-MS with electron ionization and a single quadrupole mass analyzer is most commonly used for the analysis and identification of controlled substances in forensic laboratories.

In GC-MS analysis, a submitted sample is dissolved in an organic solvent and injected into the GC. The sample components are then ionized using electron ionization, separated by mass-to-charge (m/z) ratio using a single quadrupole mass analyzer, and detected. This approach yields a chromatogram with retention times and a mass spectrum with nominal, or integer, mass information.
The mass spectrum of the analyte is compared to a reference library of standards and identified via visual comparison of the spectra. While this method of visual comparison works well for already scheduled drugs, it fails with emerging analogs of synthetic designer drugs for which reference materials are not yet readily available. This problem necessitates a new method for the classification of emerging designer drugs that uses mass spectral characteristics to assign an unknown to a particular class and/or subclass. Although methods have been developed for the analysis of designer drugs, most of this work is not practical because the instrumentation used [e.g., gas chromatography-time-of-flight mass spectrometry (GC-TOFMS) and liquid chromatography-mass spectrometry (LC-MS)] is not available in most forensic laboratories [6-8]. Thus, research needs to be focused on the development of methods using a GC-MS with a single quadrupole mass analyzer, as this is the instrument already available and in use for controlled substances analysis.

1.2 Single Quadrupole Mass Spectrometry

Although GC-MS was used in this work to analyze samples, all data analysis was performed on the mass spectral data alone. The most common mass spectrometer employed in forensic laboratories for the analysis of controlled substances uses electron ionization followed by a single quadrupole mass analyzer, in which the ions are separated by m/z value.

Electron ionization ionizes gas-phase molecules by bombarding them with a beam of electrons originating from a tungsten filament (Figure 1.3) [9]. The electrons are accelerated to 70 eV and directed perpendicularly toward the analyte molecules, where ionization is induced by the fluctuating electric field created by the close proximity of the electrons at low pressure. Placed directly opposite the filament, which acts as the cathode, the electron trap serves as the anode, with a slightly positive charge to attract the electrons emitted from the filament.
The magnets create a weak magnetic field parallel to the direction of the electrons, which increases ionization efficiency by causing the electrons to follow a spiral path. A repeller plate is also placed opposite the ion focusing lenses to repel the ions toward the lenses and, subsequently, the mass analyzer.

Figure 1.3 Diagram of an electron ionization source

When using electron ionization, excess energy following initial ionization to form the molecular ion causes extensive fragmentation, resulting in a spectrum with many peaks. For this reason, electron ionization is considered a hard ionization technique. Under a given set of conditions, it produces a spectrum that is characteristic of the analyte, which allows for identification. Mostly singly charged positive ions are formed; these exit the ionization source and reach the mass analyzer to be sorted by m/z ratio.

The mass analyzer used in forensic laboratories consists of a single quadrupole, which provides nominal, or integer, mass information. A quadrupole mass analyzer consists of four cylindrical rods placed parallel to each other (Figure 1.4) [9]. Opposing rods are electrically connected as a pair, and each pair is supplied with a constant direct current (DC) potential, one pair positive and the other negative. In conjunction with the DC potential, a radio frequency (RF) voltage is also applied in the form of an alternating potential. Under the combined DC and RF potentials, ideally only ions with a single m/z ratio will have a stable trajectory through the rods, allowing them to reach the detector. Ions of other m/z ratios will not follow a stable trajectory and will strike one of the rods, where they are neutralized and pumped away without reaching the detector.
The applied DC and RF potentials are then scanned at a constant rate across the m/z range of interest, creating a mass spectrum with peaks at a multitude of m/z values.

Figure 1.4 Diagram of a quadrupole mass analyzer demonstrating ions with stable (blue) and unstable (red) trajectories

The most common detector used in mass spectrometry is the continuous dynode electron multiplier due to the high signal gain with low noise (Figure 1.5) [10]. A high negative voltage (about -1.5 kV) is applied to the opening of the dynode to attract the positively charged ions from the mass analyzer. The inside of the dynode contains a resistive conductive surface mounted on glass to create a gradual potential drop between the opening of the dynode and the back, which is grounded. When a positively charged ion strikes the inner wall of the dynode, secondary electrons are emitted and accelerate down the dynode. The number of electrons increases exponentially as the emitted secondary electrons continue to strike the walls while they are accelerated toward the end of the dynode. The gain of electrons in a continuous dynode electron multiplier is on the order of 10^5.

Figure 1.5 Diagram of a continuous dynode electron multiplier

The output from the detector is a plot of intensity as a function of m/z. Because electron ionization is used, the mass spectrum will contain many peaks corresponding to individual fragment ions. This extensive fragmentation allows for identification because each compound yields a characteristic spectrum under a given set of conditions. However, rapid identification in forensic laboratories is often done by visual comparison of the unknown spectrum with a spectrum of a known reference material. For newly emerging synthetic designer drugs with no available reference standard, drug analysts have no reference spectrum for comparison, and identification becomes more challenging.
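As a minimal sketch of how the detector output described above can be handled as data, the Python snippet below stores a nominal-mass spectrum as a mapping from integer m/z to intensity and scales it so that the most intense peak (the base peak) is 100%. The function name `to_relative_intensity` and the abundance values are hypothetical and chosen only for illustration; they are not measured data. Base-peak normalization is assumed here as a common convention and is not taken from the text.

```python
def to_relative_intensity(spectrum):
    """Scale a nominal-mass spectrum so that the base peak equals 100%.

    `spectrum` maps integer m/z values to raw intensities.
    """
    if not spectrum:
        raise ValueError("spectrum is empty")
    base_peak = max(spectrum.values())
    if base_peak <= 0:
        raise ValueError("spectrum has no positive intensities")
    # Sort by m/z so the result reads like a spectrum from low to high mass.
    return {
        mz: 100.0 * intensity / base_peak
        for mz, intensity in sorted(spectrum.items())
    }


# Hypothetical raw abundances keyed by integer m/z (illustrative only).
raw_spectrum = {131: 26000, 44: 52000, 130: 13000, 77: 6500}

relative = to_relative_intensity(raw_spectrum)
for mz, percent in relative.items():
    print(f"m/z {mz}: {percent:.1f}%")
```

Expressing each spectrum on the same 0 to 100% scale makes intensities at a given m/z comparable between compounds, which is the form of data a multivariate procedure would typically take as input.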
1.3 Multivariate Statistical Procedures Multivariate statistical procedures such as principal components analysis (PCA) and linear discriminant analysis (LDA) have been applied forensically in various fields such as source tracing MDMA tablets and in fire debris analysis to identify ignitable liquids in complex matrices.11-12 Bonetti recently applied LDA to a set of fluoromethcathinone and fluorofentanyl isomers, using PCA as a variable selection method.13 While success was shown for differentiating isomers in Bonetti's work, the focus of this thesis was on the classification of structural subclasses of phenethylamines and tryptamines. 7 -1500 VGround++++++Resistive conductivesurface 1.3.1 Principal Components Analysis PCA is a dimensionality-reducing procedure in which linear combinations of the original variables are created to describe natural variance within a data set.14 PCA is an unsupervised method, meaning that the method has no group knowledge and identifies natural groups based on variance. Based on patterns within the data set, new axes are created called principal components (PCs). When the number of variables outnumbers the number of samples, the maximum number of calculated PCs is N-1 where N is equal to the number of variables. Scree plots indicate the variance described by each PC as well as the cumulative variance, which should equal 100% when all calculated PCs are included. Each PC is a linear combination (Equation 1) of uncorrelated original variables (x1 and x2) with a weighting coefficient (a1 and a2) that depends on the extent to which that particular variable contributes to the variance. In this equation, p represents the total number of variables. A score is calculated for a sample on each PC using the mean-centered data incorporated into the y = a1x1 + a2x2 +… apxp (1) linear combination for that PC. 
For example, assume the linear combination in Equation 2 applies to PC1 and that of Equation 3 applies to PC2 where the numerical values are the weighting coefficients and Vn indicates a single variable.

\[ y = 0.56\,V_1 + 0.31\,V_2 + 0.19\,V_3 \tag{2} \]

\[ y = 0.41\,V_1 + 0.12\,V_2 + 0.05\,V_3 \tag{3} \]

If the mean-centered data indicated that a sample exhibited the values in Equations 4 and 5, the score on PC1 would be -0.57 and the score on PC2 would be -0.97.

\[ y = 0.56\,(-4.0) + 0.31\,(6.0) + 0.19\,(-1.0) = -0.57 \tag{4} \]

\[ y = 0.41\,(-4.0) + 0.12\,(6.0) + 0.05\,(-1.0) = -0.97 \tag{5} \]

These scores are plotted as coordinates for each sample on a two-dimensional plane to create a scores plot. Natural grouping is observed to demonstrate variance between different sets of samples with regard to the variables. PC1 always describes the most variance within a data set while PC2 is drawn orthogonally to PC1, as seen in Figure 1.6.

Figure 1.6 Example scores plot where x and y are original measurement variables and the green dots are samples; PC1 is drawn to describe the most variance in the data set and PC2 is drawn orthogonally to PC1

Loadings values provide a way to determine the contribution of individual variables to each PC. Loadings values can range from -1 to +1 where -1 is the maximum negative contribution of a variable to a PC, zero indicates no contribution, and +1 is the maximum positive contribution. From these data, loadings plots can be generated to visualize which variables contribute most heavily to the positioning of the samples on the scores plot. For mass spectral data with hundreds of variables (m/z values), loadings plots are best represented as a plot of PC loadings versus m/z. In the example loadings plot in Figure 1.7, m/z 154 contributes most positively to PC1 while m/z 152 contributes most negatively. In this example, m/z 151 will not be seen in the linear combination for PC1 as it has no contribution.
With relation to the scores plot, samples that contain m/z 150 or 154 will be positioned more positively on PC1, while samples containing m/z 152 or 153 will be positioned more negatively. These contributions establish natural groups based on similarities in the mass spectral data.

Figure 1.7 Example loadings plot using mass spectral data

1.3.2 Linear Discriminant Analysis

LDA is similar to PCA in that it is a multivariate statistical procedure with the aim of reducing dimensionality. However, LDA is a supervised technique, which means that the group membership of each sample is known.14 Because of this, the weighting coefficients used to formulate the linear combination are selected to maximize between-class variance while minimizing within-class variance. Samples can then be assigned a score based on mean-centered raw data with respect to the variables, and a scores plot can be generated. The axes are now called linear discriminants (LDs) where the total number of LDs is equal to M-1, where M is the number of groups. LDA requires that the total number of samples is greater than the number of variables.

The general equation for a linear discriminant function is the same as PCA (Equation 1) where a represents the weighting coefficient, x is the original measurement variable, and p is the number of variables represented by a particular LD. A centroid is calculated for each group based on the average of each LD for the members of that group. A score can then be calculated for a new sample, which is classified to the group whose centroid is the shortest distance away. For LDA, Mahalanobis distance is used as the measurement (Equation 6) where x is the sample measurement, µ is the centroid, and C^{-1} is the inverse of the sample covariance matrix.

\[ D = \sqrt{(x - \mu)^{T} \, C^{-1} \, (x - \mu)} \tag{6} \]

In the example in Figure 1.8, the new sample would be classified as a member of Group 3 based on the defined LDA model.
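The centroid-and-distance classification step described above can be sketched in R. This is a minimal illustration rather than the code used in this work: the scores, group labels, and new sample are hypothetical values generated for the example, and the use of a pooled within-group covariance matrix for C is an assumption. Only base R functions (`tapply`, `crossprod`, `mahalanobis`) are used.

```r
# Hypothetical LD scores for three groups (not data from this work)
set.seed(1)
scores <- rbind(
  cbind(rnorm(8, -3, 0.5), rnorm(8,  0, 0.5)),
  cbind(rnorm(8,  3, 0.5), rnorm(8,  2, 0.5)),
  cbind(rnorm(8,  0, 0.5), rnorm(8, -3, 0.5))
)
group <- factor(rep(c("Group 1", "Group 2", "Group 3"), each = 8))

# Centroid of each group: mean score on each discriminant axis
centroids <- apply(scores, 2, function(col) tapply(col, group, mean))

# Pooled within-group covariance matrix (assumed choice for C)
centered <- scores - centroids[as.character(group), ]
pooled_cov <- crossprod(centered) / (nrow(scores) - nlevels(group))

# Mahalanobis distance (Equation 6) from a hypothetical new sample to each
# centroid; mahalanobis() returns squared distances, hence the square root
new_sample <- c(0.2, -2.6)
d <- sqrt(mahalanobis(centroids, center = new_sample, cov = pooled_cov))

# The new sample is assigned to the group with the nearest centroid
assigned <- names(which.min(d))
```

Here `assigned` holds the label of the closest centroid, mirroring how the new sample in Figure 1.8 is assigned to the nearest group.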
Figure 1.8 Example LDA scores plot

1.4 Research Objectives

The overall goal in this research was to create a statistical model to classify unknown synthetic drugs according to structural class or subclass. For classification, LDA was selected because it is an objective technique with the ability to use raw mass spectral data to define models that can then be applied to new samples for classification. However, the total number of variables must be less than the total number of samples. With mass spectral data, there are hundreds of variables as each m/z value represents a single variable. Therefore, before LDA could be utilized, the variable set first needed to be reduced.

For variable reduction, two methods were used. The first, PCA, was chosen as an unsupervised method to identify the variables that described natural variance in the data set. The second method used an informed chemical approach where mass spectra were probed for ions characteristic of each class or subclass of drugs. Each method for variable selection is discussed in Chapter 3.

Following variable selection, LDA was used to create a model in which new compounds could be introduced and classified. Three subclasses of phenethylamines as well as various tryptamines were used for model development. Classification success rates were then calculated to determine the optimal method for variable selection as well as the overall classification success of the models. Chapter 4 presents the results for the optimization and model development of LDA.

REFERENCES

1. Bureau of Justice Statistics (BJS). Publicly Funded Forensic Crime Laboratories: Resources and Services, 2014. https://www.bjs.gov/content/pub/pdf/pffclrs14_sum.pdf
2. U.S. Drug Enforcement Administration (DEA) Diversion Control Division. National Forensic Laboratory Information System 2015 Annual Report.
https://www.deadiversion.usdoj.gov/nflis/2015_annual_rpt.pdf
3. U.S. Drug Enforcement Administration (DEA). Drug Scheduling. https://www.dea.gov/druginfo/ds.shtml
4. United Nations Office on Drugs and Crime. World Drug Report 2017: Market Analysis of Synthetic Drugs, Booklet 4. http://www.unodc.org/wdr2017/field/Booklet_4_ATSNPS.pdf
5. Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG). Recommendations. http://www.swgdrug.org/Documents/SWGDRUG%20Recommendations%20Version%207-0.pdf
6. Zuba, D., Sekula, K. Identification and characterization of 2,5-dimethoxy-3,4-dimethyl-β-phenethylamine (2C-G) - A new designer drug. Drug Testing and Analysis 2012, 5, 549-559.
7. Sekula, K., Zuba, D. Structural elucidation and identification of a new derivative of phenethylamine using quadrupole time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry 2013, 27, 2081-2090.
8. Pasin, D., Cawley, A., Bidny, S., Fu, S. Characterization of hallucinogenic phenethylamines using high-resolution mass spectrometry for non-targeted screening purposes. Drug Testing and Analysis 2017, 9, 1620-1629.
9. Watson, J.T., Sparkman, O.D. Introduction to Mass Spectrometry, 4th edition; John Wiley & Sons, LTD: West Sussex, England, 2007.
10. De Hoffman, E., Stroobant, V. Mass Spectrometry: Principles and Applications, 3rd edition; John Wiley & Sons, LTD: West Sussex, England, 2007.
11. Weyermann, C., et al. Drug intelligence based on MDMA tablets data: I. Organic impurities profiling. Forensic Science International 2008, 177, 11-16.
12. Waddell, E. E., et al. Progress toward the determination of correct classification rates in fire debris analysis. Journal of Forensic Sciences 2013, 58 (4), 887-896.
13. Bonetti, J. Mass spectral differentiation of positional isomers using multivariate statistics. Forensic Chemistry 2018, 9, 50-61.
14. Smith, R. Chemometrics.
In Forensic Chemistry: Fundamentals and Applications; Siegel, J., Ed; John Wiley & Sons, Ltd: West Sussex, UK, 2016; pp 469-503.

II. Materials and Methods

2.1 Synthetic Designer Drug Reference Materials

Reference materials representative of the phenethylamine and tryptamine classes were purchased from Cayman Chemical (Ann Arbor, MI). For the phenethylamine class, six aminopropyl benzofuran phenethylamine (APB), ten 2,5-dimethoxyphenethylamine (2C), and ten 2,5-dimethoxy-N-(2-methoxybenzyl) phenethylamine (NBOMe) compounds were acquired. Ten compounds from the tryptamine class were also purchased for testing. The full chemical names for each compound in the data set can be found in the Chapter 2 Appendix. Structures for the compounds are seen in Figures 2.1-2.4. Reference materials were prepared by dissolving 1 mg in 1 mL of methanol (ACS grade, Sigma Aldrich, St. Louis, MO) for analysis by GC-MS.

Figure 2.1 Structures of the APB-phenethylamine reference standards (A) 4-APB (B) 5-APB (C) 6-APB (D) 7-APB (E) 4-MAPB and (F) 4-EAPB

Figure 2.2 Core structure for 2C-phenethylamines

Table 2.1 Substituents for 2C-phenethylamines investigated

Compound    R1      R2
2C-B        -H      -Br
2C-C        -H      -Cl
2C-D        -H      -CH3
2C-E        -H      -CH2CH3
2C-G        -CH3    -CH3
2C-H        -H      -H
2C-I        -H      -I
2C-N        -H      -NO2
2C-P        -H      -CH2CH2CH3
2C-T        -H      -SCH3

Figure 2.3 (A) Core structure for NBOMe-phenethylamines and (B) structure for 3,4-DMA-NBOMe

Table 2.2 Substituents for NBOMe-phenethylamines investigated

Compound      R1      R2
25B-NBOMe     -H      -Br
25C-NBOMe     -H      -Cl
25D-NBOMe     -H      -CH3
25E-NBOMe     -H      -CH2CH3
25G-NBOMe     -CH3    -CH3
25H-NBOMe     -H      -H
25I-NBOMe     -H      -I
25N-NBOMe     -H      -NO2
25P-NBOMe     -H      -CH2CH2CH3
25T-NBOMe     -H      -SCH3

Figure 2.4 Structures of the tryptamine reference standards (A) α-MT (B) α-ET (C) N,N-DMT (D) DPT (E) 4-hydroxy DMT (F) 4-hydroxy DET (G) 4-Me-α-ET (H) 5-methoxy DMT (I) 5-methoxy DiPT and (J) 5,7 DCT

2.2 Gas Chromatography-Mass Spectrometry (GC-MS) Analysis

The ten tryptamine reference standards were analyzed by low-resolution (GC-QMS) and high-resolution instruments (GC-TOFMS). The six APB-phenethylamines (4-APB, 5-APB, 6-APB, 7-APB, 4-MAPB, and 4-EAPB) and six of the NBOMe-phenethylamines (25B-, 25E-, 25H-, 25P-, 25T-, and 3,4-DMA-NBOMe) were analyzed on the same GC-QMS instrument. Previously collected low-resolution and high-resolution data were used for the 2C-phenethylamines (2C-B, C, D, E, G, H, I, N, P, and T) and for 25C-, 25D-, 25G-, and 25N-NBOMe. Low-resolution data were used for testing in the multivariate statistical procedures and high-resolution data were used for elemental formula confirmation.

The GC-QMS contained an Agilent 7890 gas chromatograph coupled to an Agilent 5975 mass spectrometer with an Agilent 7693A injector (Agilent Technologies, Santa Clara, CA). A DB-5 column was used with a 5%-diphenyl-95%-dimethyl polysiloxane stationary phase and dimensions of 30 m x 0.25 mm internal diameter x 0.25 µm film thickness (DB-5, Restek, Bellefonte, PA). The injection temperature was 250 °C with 1 µL of sample injected with a 50:1 split. For the carrier gas, ultra-high purity helium (Airgas, Radnor Township, PA) was used at a nominal flow rate of 1 mL/min. The oven temperature was held at 40 °C for 1 min and then increased at a rate of 20 °C/min until the oven reached 280 °C, with a final hold of 2 min. The transfer line was heated at 280 °C. Electron ionization was employed at 70 eV. The scan range was set to m/z 35-450, with a scan rate of 2.83 scans/s, to encompass a wide range of ions. The temperature of the ion source was 230 °C while the temperature of the mass analyzer was 150 °C.

The GC-TOFMS that was used to analyze 3,4-DMA-NBOMe, 4-MAPB, 4-EAPB and the tryptamines was a Waters GCT Premier (Waters, Milford, MA).
This instrument contained an Agilent 6890N gas chromatograph coupled to a Waters GCT mass spectrometer and an Agilent 7683B autosampler. A DB-5 column with the same dimensions and stationary phase as the GC-QMS was used. The GC parameters and oven program were identical to those used to collect the low-resolution spectra. Electron ionization at 70 eV was used with a scan range of m/z 35-450 at a scan rate of 5.00 scans/s. The temperature of the ion source was 180 °C and the temperature of the mass analyzer was 130 °C. Perfluorotributylamine (PFTBA) was selected to calibrate the instrument for good mass accuracy. The resolution of this GC-TOF was up to 7,000 full width at half maximum (FWHM).

The high-resolution data for the remaining NBOMe-, 2C-, and APB- reference materials were from data collected previously on a LECO Pegasus GC-HRT (LECO Corporation, St. Joseph, MI). This instrument contained an Agilent 7890N gas chromatograph with a LECO Pegasus HRT mass spectrometer and a Gerstel MPS2 (GERSTEL, Inc., Linthicum Heights, MD) autosampler. The stationary phase for the GC column was 1,4-bis(dimethylsiloxy)phenylene dimethyl polysiloxane (Rxi-5Sil MS), with dimensions 20 m x 0.18 mm x 0.18 µm (Restek, Bellefonte, PA). For each reference material, 1 µL was injected at a temperature of 250 °C with a 100:1 split ratio. Ultra-high purity helium was used as the carrier gas with a nominal flow rate of 0.85 mL/min. The oven temperature started at 60 °C for 0.5 min and was ramped to 340 °C at 36 °C/min, with a final hold of 4 min. The transfer line temperature was 300 °C. Electron ionization at 70 eV was used with a scan range of m/z 35-510 at a scan rate of 10 scans/s. The temperature of the ion source was 250 °C. PFTBA was used as a calibrant for each analysis and the resolution of this instrument was up to 50,000 FWHM.

2.3 Data Processing

Low-resolution mass spectra were obtained from GC-QMS by taking a single scan at the apex of the peak in the chromatogram.
The intensity values were normalized to the base peak and imported into Origin (version 9.0 OriginLab Corporation, Northampton, MA), where a spectrum was generated.

High-resolution mass spectra were obtained from GC-TOFMS using MassLynx (version 4.1, Waters) by averaging ten scans where the intensity was on the order of 10^4 but no greater than 5 x 10^5. A peak separation value of 0.05 was used to generate all mass spectra. Mass accuracy, in ppm, was determined for all ions in the spectrum above a given threshold using an algorithm in MassLynx. The known elemental formula for each reference standard was used to indicate the maximum number of each element to minimize the total number of possible elemental formulae for each peak. For each spectrum, the accurate mass, intensity, elemental formula, and corresponding mass accuracy were exported to Microsoft Excel (Microsoft Corporation, Redmond, WA). The intensity values were normalized to the base peak and then plotted in Origin.

2.4 Statistical Models in R

The ultimate goal in this work was to use linear discriminant analysis (LDA) as a classification tool for synthetic designer drugs. LDA requires the presence of more samples than variables, which is difficult with mass spectral data sets. For this reason, two methods were used in this work for variable selection to obtain variable data sets small enough for LDA.

The first method, principal components analysis (PCA), was used on the full data set, meaning that the full spectrum was used for all reference standards. Scores plots and loadings plots were generated based on results from R code, which is listed in the Chapter 2 Appendix. Loadings values for each m/z value were normalized to the highest loadings value in the first three principal components (PCs). Three variable sets were determined based on m/z values with normalized loadings greater than 15%, 20%, and 30%.
The second method of variable selection involved probing mass spectra for characteristic ions of each class or subclass of drugs. From this, a variable set was created that could be compared with that from the PCA method.

Following variable selection, the spectra from the reference materials were split into a training set and a test set. The black box method was used to randomly select the tryptamines and APB- and 2C-phenethylamines that would constitute the test set. For the tryptamines and 2C-phenethylamines, three of the ten compounds were selected for the test set and the remaining seven stayed in the training set. One APB-phenethylamine was selected for the test set and the remaining five became the training set compounds. All of the NBOMe-phenethylamines analyzed via GC-QMS for this work were used for the training set, with data collected previously used for the test set. In addition, all NBOMe- and APB-phenethylamines as well as tryptamines in the training set were analyzed on a different day and those data were used as additional test set compounds to increase the robustness of the statistical models. Tables 2.3-2.5 show the compounds that comprised the training and test sets.

LDA models were defined using the training set for each of the three variable sets resulting from PCA. Leave-one-out cross validation was performed and the classification success of each variable set was assessed. The compounds in the test set were classified based on the LDA model developed using the most successful variable set, and the classification success rate was calculated based on the percentage of correctly classified compounds. A separate LDA model was generated using the variables selected by probing the mass spectra for characteristic ions. The test set was then incorporated and the classification success rate was calculated.
The two variable selection methods were compared in terms of their performance in classifying known reference materials from a test set into the correct class or subclass of designer drug.

Table 2.3 Training Set of Reference Standards for Classification Models

2C-Phenethylamines    APB-Phenethylamines    NBOMe-Phenethylamines    Tryptamines
2C-B                  4-APB                  25B-NBOMe                5,7-DCT
2C-D                  5-APB                  25E-NBOMe                α-ET
2C-E                  6-APB                  25H-NBOMe                4-Me-α-ET
2C-H                  7-APB                  25P-NBOMe                4-hydroxy DMT
2C-I                  4-EAPB                 25T-NBOMe                4-hydroxy DET
2C-N                                         3,4-DMA-NBOMe            5-methoxy DMT
2C-T                                                                  DPT

Table 2.4 Test Set 1 of Reference Standards for Classification Models

2C-Phenethylamines    APB-Phenethylamines    NBOMe-Phenethylamines    Tryptamines
2C-C                  4-MAPB                 25G-NBOMe                N,N-DMT
2C-G                                         25C-NBOMe                α-MT
2C-P                                         25D-NBOMe                5-methoxy DiPT
                                             25N-NBOMe
                                             25I-NBOMe

Table 2.5 Test Set 2 of Reference Standards for Classification Models

APB-Phenethylamines    NBOMe-Phenethylamines    Tryptamines
4-APB                  25B-NBOMe                5,7-DCT
5-APB                  25E-NBOMe                α-ET
6-APB                  25H-NBOMe                4-Me-α-ET
7-APB                  25P-NBOMe                4-hydroxy DMT
4-EAPB                 25T-NBOMe                4-hydroxy DET
                       3,4-DMA-NBOMe            5-methoxy DMT
                                                DPT

APPENDIX

Table A.1 Compound abbreviations with full chemical names

Compound Abbreviation    Full Chemical Name
4-APB                    4-(2-aminopropyl)benzofuran
5-APB                    5-(2-aminopropyl)benzofuran
6-APB                    6-(2-aminopropyl)benzofuran
7-APB                    7-(2-aminopropyl)benzofuran
4-MAPB                   4-(2-methylaminopropyl)benzofuran
4-EAPB                   4-(2-ethylaminopropyl)benzofuran
2C-B                     2,5-dimethoxy-4-bromophenethylamine
2C-C                     2,5-dimethoxy-4-chlorophenethylamine
2C-D                     2,5-dimethoxy-4-methylphenethylamine
2C-E                     2,5-dimethoxy-4-ethylphenethylamine
2C-G                     3,4-dimethyl-2,5-dimethoxyphenethylamine
2C-H                     2,5-dimethoxyphenethylamine
2C-I                     2,5-dimethoxy-4-iodophenethylamine
2C-N                     2,5-dimethoxy-4-nitrophenethylamine
2C-P                     2,5-dimethoxy-4-propylphenethylamine
2C-T                     2,5-dimethoxy-4-methylthiophenethylamine
α-MT                     α-methyl-1H-indole-3-ethanamine
α-ET                     α-ethyl-1H-indole-3-ethanamine
N,N-DMT                  N,N-dimethyl-1H-indole-3-ethanamine
DPT                      N,N-dipropyl-1H-indole-3-ethanamine
4-hydroxy DMT            3-[2-(dimethylamino)ethyl]-1H-indol-4-ol
4-hydroxy DET            3-[2-(diethylamino)ethyl]-1H-indol-4-ol
4-Me-α-ET                α-ethyl-4-methyl-1H-indole-3-ethanamine
5-methoxy DMT            5-methoxy-N,N-dimethyl-1H-indole-3-ethanamine
5-methoxy DiPT           5-methoxy-N,N-bis(1-methylethyl)-1H-indole-3-ethanamine
5,7-DCT                  5,7-dichloro-1H-indole-3-ethanamine
25B-NBOMe                4-bromo-2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-benzeneethanamine
25C-NBOMe                2-(4-chloro-2,5-dimethoxyphenyl)-N-(2-methoxybenzyl)ethanamine
25D-NBOMe                2-(2,5-dimethoxy-4-methylphenyl)-N-(2-methoxybenzyl)ethanamine
25E-NBOMe                2-(4-ethyl-2,5-dimethoxyphenyl)-N-(2-methoxybenzyl)ethanamine
25G-NBOMe                2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-3,4-dimethyl-benzeneethanamine
25H-NBOMe                2-(2,5-dimethoxyphenyl)-N-(2-methoxybenzyl)ethanamine
25N-NBOMe                2-(2,5-dimethoxy-4-nitrophenyl)-N-(2-methoxybenzyl)ethanamine
25P-NBOMe                2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-4-propyl-benzeneethanamine
25T-NBOMe                2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-4-(methylthio)-benzeneethanamine
3,4-DMA-NBOMe            3,4-dimethoxy-N-[(2-methoxyphenyl)methyl]-α-methyl-benzeneethanamine

Table A.2 R Code for Inputting Data

R Code                                                        Action
getwd()                                                       Identifies current working directory
setwd("C:/Users/Amanda/Documents/Forensic_Research/Data")    Sets new working directory
data=read.table("RData.txt",header=TRUE)                      Imports and names data set (header=TRUE if first row/column is variable/sample name)
names(data)=c("Mass44","Mass91"…"Type")                       Identifies header names
attach(data)                                                  Attaches data set for use

Table A.3 R Code for PCA

R Code                           Action
pca<-prcomp(data,scale=FALSE)    Code to perform PCA
print(pca)                       Output for loadings
summary(pca)                     Output for scree
pca$x                            Output for scores

Table A.4 R Code for LDA

R Code                                             Action
library(MASS)                                      Loads R package that contains LDA code
train<-data[1:25,]                                 Selects data for training set
test<-data[26:55,]                                 Selects data for test set
data.lda=lda(Type~Mass44+Mass91+…,data=train)      Performs LDA on training set with selected variables
data.lda                                           Displays LDA results
data.lda.values<-predict(data.lda,data[1:25,])     Code to obtain scores for samples in training set
data.lda.values$x                                  Displays scores for training set samples
lda.pred<-predict(data.lda,test)                   Code to incorporate test set
lda.pred$posterior                                 Output for probability that a sample in test set belongs to each of the groups defined for LDA
lda.pred$x                                         Displays scores for test set samples

III. Variable Selection for Linear Discriminant Analysis

In forensic laboratories, controlled substances that are submitted as powders are often analyzed by gas chromatography-single quadrupole mass spectrometry (GC-QMS) followed by a comparison to spectra of known reference materials. However, when a novel designer drug is submitted, there is often no available reference material and thus no spectrum for comparison. The objective in this work was to develop a classification model using linear discriminant analysis (LDA) in which new samples could be introduced and subsequently classified into a class or subclass of designer drugs. However, LDA requires a greater number of samples than variables, and this criterion is not met when all m/z values from mass spectra are used as variables. Therefore, appropriate selection of m/z values is critical for classification success, and part of this work investigated two different selection methods. The first method used principal components analysis (PCA) to identify m/z values that described the greatest variance among the compounds. The second method was based on an informed chemical approach, using mass spectral interpretation to identify ions characteristic of each class or subclass.
3.1 Variable Selection by Principal Components Analysis (PCA)

The first method of variable selection used the unsupervised approach of PCA to determine the m/z values that accounted for the most variance among the compounds in the data set. The entire set of reference standards, found in the Chapter 2 Appendix, for tryptamines as well as APB-, 2C-, and NBOMe-phenethylamines was used for PCA. Using the R code given in the Chapter 2 Appendix, PCA was performed on the full mass spectrum, ranging from m/z 40-440, for each reference material. A scree plot (Figure 3.1) was first generated to determine the contribution of each principal component (PC) to the total variance. By PC35, 100% of the variance was described. However, only PC1, 2, and 3 were used for variable selection. Further PCs were explored but were not considered in the analysis because the variance described by each was that of within-class variance rather than between-class variance. The goal of defining variables is to use them in LDA to create distinct groups that allow for classification of new samples. Using m/z values beyond PC3 would weaken the subsequent LDA models. The first three PCs accounted for 44.65% of the total variance with individual contributions of 23.40%, 10.95%, and 10.29%, respectively.

Figure 3.1 Scree plot for PCA showing proportion of variance (red) and cumulative proportion (black) described by each PC

PC scores were then calculated for each compound and plotted for PC1 versus PC2 and for PC1 versus PC3 (Figure 3.2). From the scores plot of PC1 (23.40%) versus PC2 (10.95%), the NBOMe-phenethylamines are positioned positively on PC1, while the remaining compounds are positioned negatively. The 2C-phenethylamines and tryptamines overlap on PC2, but gain some additional separation on PC1 and PC3.
The APB-phenethylamines are separated from the 2C-phenethylamines on both PC1 and PC3 but score similarly to the tryptamines on all three PCs.

Figure 3.2 Scores plot for A) PC1 vs. PC2 and B) PC1 vs. PC3

The loadings for each PC were plotted as a function of m/z (Figures 3.3, 3.4, and 3.5) to demonstrate the variables contributing positively and negatively to each PC. The loadings value for each m/z value is a measurement of the extent of contribution, where +1 is the maximum positive contribution of a variable to a PC and -1 is the maximum negative contribution to a PC. For example, in the loadings plot for PC1 (Figure 3.3), m/z 121 contributes most positively and m/z 58 contributes most negatively. Therefore, it would be expected that compounds with a high intensity of m/z 121 would be positioned more positively on PC1 in the scores plot and compounds with a high intensity of m/z 58 would be positioned more negatively. More specifically, because the data are mean-centered, compounds with an intensity of m/z 121 greater than the average will be positioned positively and compounds with an intensity of m/z 58 greater than the average will be positioned negatively.
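The relationship between mean-centering, loadings, and score sign described above can be checked with a small R sketch. The intensities below are hypothetical relative intensities chosen to resemble the pattern discussed (they are not spectra from this work), and the row names are illustrative labels only. The sketch uses `prcomp` as in the Chapter 2 Appendix and then recomputes the PC1 scores by hand as the loading-weighted sum of the mean-centered intensities.

```r
# Hypothetical relative intensities (%) at three m/z values for four spectra
spectra <- rbind(
  nbome_like_1 = c(mz58 =   2, mz91 = 25, mz121 = 100),
  nbome_like_2 = c(mz58 =   1, mz91 = 27, mz121 = 100),
  amine_like_1 = c(mz58 = 100, mz91 =  5, mz121 =   3),
  amine_like_2 = c(mz58 = 100, mz91 =  8, mz121 =   1)
)

pca <- prcomp(spectra, center = TRUE, scale. = FALSE)

# The sign of a PC is arbitrary, so orient PC1 with m/z 121 loading positively
if (pca$rotation["mz121", "PC1"] < 0) {
  pca$rotation[, "PC1"] <- -pca$rotation[, "PC1"]
  pca$x[, "PC1"] <- -pca$x[, "PC1"]
}

# A score is the loading-weighted sum of the mean-centered intensities
centered <- sweep(spectra, 2, colMeans(spectra))
manual_pc1 <- as.vector(centered %*% pca$rotation[, "PC1"])

# Spectra with above-average m/z 121 score positively on PC1;
# spectra with above-average m/z 58 score negatively
pc1_scores <- pca$x[, "PC1"]
```

In this toy case `manual_pc1` matches `pc1_scores`, and the two spectra dominated by m/z 121 fall on the positive side of PC1 while the two dominated by m/z 58 fall on the negative side.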
Figure 3.3 Loadings plot for PC1

Figure 3.4 Loadings plot for PC2

Figure 3.5 Loadings plot for PC3

Based on the scores plots in Figure 3.2, the NBOMe-phenethylamines are well separated from the other three groups on PC1. Compounds in this class all contain a base peak at m/z 121 with high intensity peaks at m/z 91 (~25% relative intensity) and m/z 150 (~40-60% relative intensity), as demonstrated by example spectra of NBOMe-phenethylamines in the appendix of this chapter. The loadings plot for PC1 shows that the positive contributions are dominated by these three ions, which account for the separation of compounds in the NBOMe subclass.

Two ions that contribute negatively to PC1 are m/z 44 and 58. The isomeric APB-phenethylamines (4-, 5-, 6-, and 7-APB) contain a base peak at m/z 44 while 4-MAPB has a base peak at m/z 58. Several of the compounds in the tryptamine class also contain m/z 58 in high intensity such as 4-hydroxy DMT, 5-methoxy DMT, 4-Me-α-ET, N,N-DMT, and α-ET. The similarities in these two ions for the APB-phenethylamines and tryptamines account for the overlap of these two classes on PC1.

Based on the loadings plot for PC2 (Figure 3.4), m/z 44 and m/z 131 both contribute negatively. The isomeric APB-phenethylamines contain a base peak at m/z 44 as well as a high intensity (~30%) peak at m/z 131, positioning 4-, 5-, 6-, and 7-APB negatively on PC2. The two tryptamines positioned negatively on PC2 that are grouped closer to the isomeric APB-phenethylamines are α-MT and α-ET.
Both of these tryptamines have a base peak of m/z 131 as well as high intensity (~80%) peaks at m/z 130, another significant negative contributor to PC2. A small negative contribution from m/z 121 is observed for PC2, positioning the NBOMe-phenethylamines slightly negatively on this PC.

Three ions, m/z 58, 198, and 199, contribute positively to PC2. Several tryptamines contain m/z 58 at high intensity or as the base peak (Figures A.1 and A.3) and 5,7-DCT contains m/z 198 and 199 at high intensity (100 and 99%, respectively). Because of these contributions, the tryptamines are positioned positively on PC2. A number of 2C-phenethylamines (i.e., 2C-B, N, and T) also contain m/z 198 and 199 which, when coupled with the absence of ions contributing negatively to PC2, cause the 2C-phenethylamines to position positively on this PC.

The separation between the 2C-phenethylamines and tryptamines on PC3 is due primarily to the high number of ions present in 2C compounds that contribute positively to PC3. These ions include m/z 65, 180, 197, 198, and 201. Further separation is achieved through m/z 58 which contributes negatively to PC3 and is present in a large percentage of the tryptamines. Again, overlap occurs between the APB-phenethylamines and tryptamines because of the similarities of the compounds with regard to m/z 44, 58, and 131.

Loadings from the first three PCs were normalized to the m/z value with the highest loadings value across all three PCs. Three variable data sets were then defined containing m/z values with greater than 30%, 20%, and 15% relative loadings. Thresholds were selected based on the number of allowed variables for LDA in relation to the total number of compounds that would constitute the training set. Any threshold lower than 15% would yield more variables than the 25 compounds in the training set. The three variable data sets are shown in Table 3.1. These sets of variables will be used to define LDA models in Chapter 4.
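The normalization and thresholding of loadings described above can be sketched in R as follows. This is an illustrative sketch only: the loadings matrix contains hypothetical values, `select_variables` is a helper name introduced for this example, and the use of the absolute value of each loading (so that strongly negative contributors are retained) is an assumption about how the relative loadings were computed.

```r
# Hypothetical loadings for a few m/z values on the first three PCs
loadings <- rbind(
  mz44  = c(PC1 = -0.30, PC2 = -0.50, PC3 = -0.10),
  mz58  = c(PC1 = -0.45, PC2 =  0.40, PC3 = -0.60),
  mz91  = c(PC1 =  0.20, PC2 = -0.02, PC3 =  0.01),
  mz121 = c(PC1 =  0.80, PC2 = -0.10, PC3 =  0.05),
  mz130 = c(PC1 = -0.05, PC2 = -0.13, PC3 = -0.04),
  mz77  = c(PC1 =  0.02, PC2 =  0.03, PC3 = -0.01)
)

# Reference value: largest loading magnitude across all three PCs (assumed)
reference <- max(abs(loadings))

# Relative loading (%) of each m/z value: its largest magnitude on any PC
relative <- 100 * apply(abs(loadings), 1, max) / reference

# Hypothetical helper: keep variables above a relative-loading threshold
select_variables <- function(threshold_percent) {
  names(relative)[relative > threshold_percent]
}

set_30 <- select_variables(30)
set_20 <- select_variables(20)
set_15 <- select_variables(15)
```

With these toy values the three sets are nested, as are the variable sets in this work: lowering the threshold only adds m/z values, so the number of variables must be checked against the number of training compounds before the set is used for LDA.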
Table 3.1 m/z Values Identified using PCA

>30% Relative Loadings    >20% Relative Loadings    >15% Relative Loadings
44                        44                        44
58                        58                        58
121                       91                        91
131                       121                       121
150                       131                       130
198                       150                       131
199                       198                       132
                          199                       150
                          200                       165
                                                    180
                                                    197
                                                    198
                                                    199
                                                    200
                                                    201

PCA is a valuable technique used to identify natural groups in a data set based on variance. A scores plot creates a visualization of those natural groups while a loadings plot indicates which variables describe the most variance. However, with mass spectral data, the most intense peaks are not always representative of ions characteristic of a compound or compound class. Base peaks are often low molecular weight ions that are simply the most stable ions following fragmentation. For this reason, separation is not always achieved when the spectra of compounds from different classes are dominated by low mass ions. Therefore, an approach was needed that used informed chemical information to identify ions that were characteristic of the different classes under investigation.

3.2 Variable Selection based on Characteristic Ions

The second method of variable selection involved probing the low-resolution mass spectra for ions considered to be characteristic of the phenethylamines and tryptamines. Common ions within classes or subclasses were identified as integer m/z values in the low-resolution spectra. Using high-resolution spectra acquired from the gas chromatography-time-of-flight mass spectrometry (GC-TOFMS) analyses, the accurate masses were used to confirm the chemical formulae of those ions, to a particular accuracy calculated in parts per million (ppm). Formulae were then used to predict the structure of each fragment ion and ions that were characteristic of each class were selected based on fragmentation patterns. Given that electron ionization was used in both the GC-QMS and GC-TOFMS instruments, the fragmentation of compounds was similar in both low-resolution and high-resolution spectra (Figure 3.6).
For that reason, low-resolution spectra were used for the statistical procedures and high-resolution spectra were only used to confirm molecular formulae of ions commonly observed within the designer drug classes.

Figure 3.6 A) Low-resolution and B) High-resolution spectra for 5-methoxy DiPT
[Spectra: x-axis m/z, y-axis Relative Intensity (%). Labeled peaks in A): m/z 72, 114, 160, 174. Labeled peaks in B): 72.0812 (C4H10N, -1.4 ppm), 114.1267 (C7H16N, -14 ppm), 160.0766 (C10H10NO, -0.6 ppm), 174.0915 (C11H12N, -2.3 ppm)]

3.2.1 Characteristic Ions for Tryptamines

Low-resolution mass spectra were first probed for the tryptamine class of synthetic designer drugs. Ten compounds were used: 4-hydroxy DMT, 4-hydroxy DET, 5-methoxy DMT, 5-methoxy DiPT, DPT, N,N-DMT, 4-Me-α-ET, 5,7-DCT, α-ET, and α-MT. The tryptamine class can be divided into three groups based on substitution: non-aromatically substituted, hydroxy-substituted, and methoxy-substituted tryptamines. Full chemical names for these compounds are given in the Chapter 2 Appendix.

3.2.1 Non-Aromatically Substituted Tryptamines

All compounds in the tryptamine class contained m/z 130, which ranged in relative intensity from less than 1% to 85%. Compounds with no substitution on the aromatic ring, such as α-MT and α-ET, contained m/z 130 in higher abundance (Figure 3.7) compared to aromatically-substituted tryptamines. The accurate mass for this peak using high-resolution data was 130.0667 Da (7.7 ppm mass accuracy) in α-MT and 130.0658 Da (0.8 ppm) in α-ET. In both cases, this accurate mass corresponded to a chemical formula of C9H8N+, which is consistent with the non-aromatically substituted core structure of the tryptamines (Figure 3.8). In α-MT and α-ET, the loss of 44 Da and 58 Da, respectively, from the molecular ion leads to the ion with m/z 130. Fragment ions at m/z 44 [M - C2H6N]+ and 58 [M - C3H8N]+ are a result of α-β bond cleavage, separating the aromatic ring from the amine chain.
If the amine is tertiary, the resulting ion appears as m/z 58 and if the amine is secondary, the fragment ion appears as m/z 44.

Figure 3.7 Low-resolution spectra for A) α-MT and B) α-ET
[Spectra: x-axis m/z, y-axis Relative Intensity (%). Labeled peaks: m/z 44, 58, 130, and 131]

Figure 3.8 Proposed fragmentation for A) α-MT and B) α-ET to produce m/z 130
[Scheme: A) 174 Da, loss of 44 Da; B) 188 Da, loss of 58 Da; both yield the 130 Da ion]

Another peak observed in the mass spectra of α-MT and α-ET is m/z 131. While most non-aromatically substituted tryptamines contain the m/z 130 ion, α-MT and α-ET also contain a base peak at m/z 131. The proposed fragmentation involves the loss of 43 Da and 57 Da, respectively, with rearrangement of the bonds to form a radical on the terminal ethyl group, moving the charge into the ring (Figure 3.9).

The final characteristic peak in the spectra for α-MT and α-ET that was selected as a variable is m/z 77. Based on the high-resolution data, this peak corresponds to a molecular formula of C6H5+, which is consistent with a six-membered aromatic ring. Although this ion is a common fragment ion present in various classes of aromatic compounds, m/z 77 was selected due to the similarities in relative intensity within specific groups (see Chapter 3 Appendix).

The other two non-aromatically substituted compounds in the data set were N,N-DMT and DPT. Both of these compounds contained m/z 130 and m/z 58, previously identified. However, m/z 130 was observed at a much lower abundance (~12% relative intensity, see Chapter 3 Appendix). N,N-DMT contains the m/z 58 peak; however, DPT exhibits a peak at m/z 114 due to the dipropyl-substituted amine that is left as the fragment ion [C7H16N]+ after the ion corresponding to the m/z 130 [C9H8N]+ peak is formed.
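The mass accuracies quoted in this section for the m/z 130 ion can be checked with a short calculation. The sketch below assumes a theoretical mass of 130.0657 Da for C9H8N; this value is not stated in the text and is an assumption chosen because it reproduces the reported accuracies. The function name `ppm_error` is likewise illustrative.

```python
def ppm_error(measured_mass, theoretical_mass):
    """Mass error in parts per million (ppm) relative to the theoretical mass."""
    return (measured_mass - theoretical_mass) / theoretical_mass * 1e6


# Assumed theoretical mass for C9H8N (not given in the text).
THEORETICAL_C9H8N = 130.0657

# Accurate masses reported for the m/z 130 ion.
alpha_mt_ppm = ppm_error(130.0667, THEORETICAL_C9H8N)
alpha_et_ppm = ppm_error(130.0658, THEORETICAL_C9H8N)

print(f"alpha-MT: {alpha_mt_ppm:.1f} ppm")
print(f"alpha-ET: {alpha_et_ppm:.1f} ppm")
```

Under this assumption, the two results round to the 7.7 ppm and 0.8 ppm values given above.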
Figure 3.9 Proposed fragmentation for A) α-MT and B) α-ET to produce m/z 131
[Scheme: A) 174 Da, loss of 43 Da; B) 188 Da, loss of 57 Da; both yield the 131 Da ion]

3.2.1 Aromatically-Substituted Tryptamines

For tryptamines that contain a substituent on the aromatic ring, such as a methoxy or hydroxy group, the relative intensity of the m/z 130 peak was observed at only 2-3%. However, the ion corresponding to the substituted core tryptamine structure was identified for each of those compounds and incorporated into the list of variables. The two compounds containing a hydroxy group, 4-hydroxy DMT and 4-hydroxy DET, shared a common peak at m/z 146 (Figure 3.10). The accurate mass for this peak from the high-resolution spectra in these two compounds confirmed the molecular formula as C9H8NO, which is consistent with a hydroxy-substituted tryptamine compound (Figure 3.11).

Figure 3.10 Low-resolution spectra for A) 4-hydroxy DMT and B) 4-hydroxy DET
[Spectra: x-axis m/z, y-axis Relative Intensity (%). Labeled peaks: m/z 58, 86, and 146]

Figure 3.11 Proposed fragmentation for A) 4-hydroxy DMT and B) 4-hydroxy DET to produce m/z 146

The second class of aromatically substituted tryptamines, besides the hydroxy-substituted compounds, contains a methoxy group on the core ring structure. The two compounds, 5-methoxy DMT and 5-methoxy DiPT, also shared a common ion in the low-resolution spectra at m/z 160 (Figure 3.12). Upon analysis with the GC-TOFMS, the molecular formula for this ion was confirmed as C10H10NO. A loss of the amine chain with an α-β bond cleavage would result in the core ring structure with a methoxy substitution, a fragment ion with the molecular formula of C10H10NO and m/z of 160 (Figure 3.13).
[Figure 3.11 scheme: A) 204 Da, loss of 58 Da; B) 232 Da, loss of 86 Da; both yield the 146 Da ion]

Figure 3.12 Low-resolution spectra for A) 5-methoxy DiPT and B) 5-methoxy DMT
[Spectra: x-axis m/z, y-axis Relative Intensity (%). Labeled peaks: m/z 58, 114, and 160]

Figure 3.13 Proposed fragmentation for A) 5-methoxy DiPT and B) 5-methoxy DMT
[Scheme: A) 274 Da, loss of 114 Da; B) 218 Da, loss of 58 Da; both yield the 160 Da ion]

To this point, ions had been selected for tryptamines with no aromatic substitution and for tryptamines with either a hydroxy- or methoxy-substituted aromatic ring. However, another reference standard, 4-Me-α-ET, contains a methyl group on the ring. Due to the methyl group, the relative intensity of the m/z 130 ion is smaller because an additional loss of a CH3 group is required to produce the ion with a mass of 130 Da. However, 4-Me-α-ET contains m/z 58 and m/z 146 in relatively high abundance. The m/z 58 ion occurs as a result of the α-β bond cleavage, which forms the high abundance ion at m/z 144. The second characteristic ion occurs at m/z 146, which has the same nominal mass as the ion produced in the hydroxy-substituted tryptamines, but with a different molecular formula. This ion in 4-Me-α-ET corresponds to a formula of C10H12N+, which is consistent with the core structure of a methyl-substituted tryptamine.

The final tryptamine reference material included in the variable selection process was 5,7-DCT, which contains the core tryptamine structure with two chlorine substituents on the aromatic ring. Because the ring is substituted and the amine on the alkyl chain is primary, 5,7-DCT does not contain m/z 44, 58, or 130 in high abundance. This compound also does not have a hydroxy-, methoxy-, or methyl-substituted aromatic ring, meaning that m/z 146 and m/z 160 are not present. The base peak for 5,7-DCT occurs at m/z 198 with m/z 199 at a relative intensity of 99% (Figure 3.14).
The peak at m/z 199 occurs as a result of the α-β bond cleavage resulting in a loss of CH4N, which has a mass of 30 Da (Figure 3.15).

Figure 3.14 Low-resolution spectrum for 5,7-DCT
[Spectrum: x-axis m/z, y-axis Relative Intensity (%). Labeled peaks: m/z 198 and 199]

Figure 3.15 Proposed fragmentation for 5,7-DCT to form m/z 199
[Scheme: 229 Da, loss of 30 Da, yields the 199 Da ion]

Based on analysis of mass spectra and elemental formulae of fragment ions, confirmed through high-resolution data, eight ions were selected as diagnostic variables for LDA for the tryptamine class (m/z 44, 58, 77, 130, 131, 146, 160, 199).

3.2.3 Characteristic Ions for APB-Phenethylamines

Of the six APB-phenethylamine reference materials, four (4-, 5-, 6-, and 7-APB) are isomeric compounds with a base peak at m/z 44 and a high relative intensity (~30%) peak at m/z 131 [C9H7O]+ (Figure 3.16). These two fragment ions are a result of the α-β bond cleavage also observed in the tryptamine class (Figure 3.17).

Figure 3.16 Low-resolution spectrum of 4-APB
[Spectrum: x-axis m/z, y-axis Relative Intensity (%). Labeled peaks: m/z 44 and 131]

Figure 3.17 Proposed fragmentation of 4-APB to form m/z 131
[Scheme: 175 Da, loss of 44 Da, yields the 131 Da ion]

The remaining two APB-phenethylamines, 4-MAPB and 4-EAPB, have the same core APB structure but contain an additional methyl or ethyl group, respectively, on the amine. When these two compounds fragment, [C9H7O]+ (m/z 131) is formed by the loss of either 58 Da from 4-MAPB or 72 Da from 4-EAPB (Figures 3.18 and 3.19). The final ion present at high relative abundance (~7-18%) in the APB-phenethylamines is m/z 77, whose molecular formula (C6H5+) corresponds to an aromatic ring.
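Each elemental formula quoted in this chapter can be checked against the integer m/z value observed in the low-resolution spectra by summing nominal atomic masses. The sketch below is an illustration only; the helper name `nominal_mass` and the small element table are assumptions, and the formulae are those given in the text for the tryptamine and APB-phenethylamine fragment ions.

```python
import re

# Nominal (integer) masses of the most abundant isotope of each element.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32, "Cl": 35}


def nominal_mass(formula):
    """Sum nominal atomic masses for a simple formula such as 'C9H8N'."""
    total = 0
    # Each match is an element symbol followed by an optional count.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += NOMINAL_MASS[element] * (int(count) if count else 1)
    return total


# Fragment-ion formulae and the integer m/z values reported for them.
fragments = {
    "C9H8N": 130,     # unsubstituted tryptamine core
    "C9H8NO": 146,    # hydroxy-substituted core
    "C10H12N": 146,   # methyl-substituted core (same nominal mass)
    "C10H10NO": 160,  # methoxy-substituted core
    "C7H16N": 114,    # dipropyl-substituted amine fragment of DPT
    "C9H7O": 131,     # APB core
    "C6H5": 77,       # aromatic ring
}

for formula, reported_mz in fragments.items():
    print(formula, nominal_mass(formula), reported_mz)
```

The two formulae that share m/z 146 illustrate why the high-resolution data were needed: the ions are indistinguishable at integer resolution.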
Figure 3.18 Low-resolution spectra for A) 4-MAPB and B) 4-EAPB
[Spectra: x-axis m/z, y-axis Relative Intensity (%). Labeled peaks: m/z 58, 72, and 131]

Figure 3.19 Proposed fragmentation for A) 4-MAPB and B) 4-EAPB to form m/z 131
[Scheme: A) 189 Da, loss of 58 Da; B) 203 Da, loss of 72 Da; both yield the 131 Da ion]

After probing the spectra for the APB-phenethylamines, four ions were selected as characteristic of this class (m/z 44, 58, 77, and 131).

3.2.4 Characteristic Ions for NBOMe-Phenethylamines

The NBOMe-phenethylamines have very distinguishable spectra with characteristic peaks at m/z 91 [C7H7]+, 121 [C8H9O]+, and 150 [C9H12NO]+, with m/z 121 consistently present as the base peak (Figure 3.20). Mass spectra for NBOMe-phenethylamines all contain these three peaks, which are fragment ions representative of the methoxybenzyl group attached to the amine chain on the core phenethylamine structure (Figure 3.21). Because these ions dominate the spectra, substitutions on the core aromatic ring do not affect the prevailing fragmentation patterns observed, making them excellent ions for use as diagnostic markers.

Figure 3.20 Low-resolution mass spectrum for 25T-NBOMe
[Spectrum: x-axis m/z, y-axis Relative Intensity (%). Labeled peaks: m/z 91, 121, and 150]

Figure 3.21 Proposed fragmentation for 25T-NBOMe for ions with m/z 91, 121, and 150

Based on the dominance of m/z 91, 121, and 150 in spectra for NBOMe-phenethylamines, these ions were selected as characteristic for this phenethylamine subclass.

3.2.2 Characteristic Ions for 2C-Phenethylamines

Unlike the APB- and NBOMe-phenethylamines, the 2C-phenethylamines do not contain ions that are common to the whole class. Therefore, several m/z values were selected based on the common presence in subsets of these compounds, such as alkyl substituted versus non-alkyl substituted.
The first ion selected as characteristic of the alkyl-substituted 2C-phenethylamines was m/z 165 [C10H13O2]+, which is the base peak for 2C-G and also present in high abundance in 2C-D, 2C-E, and 2C-P (Figure 3.22). The ion with a m/z value of 165 occurs when the amine chain cleaves from the ring, leaving a carbocation on the aromatic ring (Figure 3.23).

[Figure 3.21 scheme: 347 Da molecular ion; fragment ions at 150 Da, 121 Da, and 91 Da]

Figure 3.22 Low-resolution mass spectra for A) 2C-E and B) 2C-G
[Spectra: x-axis m/z, y-axis Relative Intensity (%). Labeled peaks: m/z 165 and 209]

Figure 3.23 Proposed fragmentation for A) 2C-E and B) 2C-G to form m/z 165
[Scheme: A) 209 Da yields the 165 Da ion; B) 209 Da yields the 165 Da ion]

When fragmentation occurs due to an α-β bond cleavage in the amine chain in a 2C-phenethylamine with a sulfur substituent, the result is a fragment ion at m/z 198 [C10H14O2S]+, such as in the case of 2C-T (Figures 3.24 and 3.25).

Figure 3.24 Low-resolution mass spectrum for 2C-T
[Spectrum: x-axis m/z, y-axis Relative Intensity (%). Labeled peaks: m/z 198 and 227]

Figure 3.25 Proposed fragmentation of 2C-T to form m/z 198

The final ion that appears in the 2C-phenethylamine class is m/z 77, which is indicative of an aromatic ring. This ion is present in the 2C class at relative intensities ranging from 4 to 14%. The halogenated and nitro-containing 2C-phenethylamines contained m/z 77 at higher relative intensity than the remaining compounds. However, no other common characteristic ions were observed for incorporation into the variable set. Therefore, the three m/z values selected as characteristic for the 2C-phenethylamines were m/z 77, 165, and 198.

3.3 Summary

Prior to the application of linear discriminant analysis (LDA), a method of variable reduction needed to be employed for this data set. The first method used PCA, an unsupervised multivariate statistical procedure, to identify m/z values that described the most variance within the data set.
Using PCA, three variable sets were formed corresponding to the percentage of overall contribution of particular m/z values across the first three PCs. The second method of variable selection utilized an informed chemical approach where mass spectra of compounds from three phenethylamine subclasses, as well as a set of tryptamines, were probed to identify m/z values that represented ions characteristic of the mass spectra of that class. A comparison of the variable sets defined by both methods is shown in Figure 3.26.

Figure 3.26 Venn diagram illustrating similarities and differences between m/z values selected by an informed chemical approach versus PCA
[Venn diagram: selected by both approaches: m/z 44, 58, 91, 121, 130, 131, 150, 165, 198, 199; PCA only: m/z 132, 180, 197, 200, 201; informed chemical approach only: m/z 77, 146, 160]

Overlap occurred for several m/z values that were present in high abundance in the set of reference materials. However, the major distinction with the ions selected exclusively from the informed chemical approach was the presence of ions characteristic of the tryptamines (i.e., m/z 146 and 160). Additionally, m/z 77 was selected by the informed chemical approach due to the similarities in intensity between compounds in the same class or subclass. While the presence of m/z 77 is not inherently characteristic, the intensity of this ion within a class is a characteristic property. The ions selected by PCA exclusively are present in individual compounds, rather than being diagnostic of a whole class or subclass. Consequently, ions selected only by PCA describe variance within classes rather than between classes. For example, m/z 200 is observed in high abundance in 5,7-DCT and creates separation of this compound from the remainder of the tryptamines. 6-APB becomes separated from the other isomeric APB-phenethylamines due to the incorporation of m/z 132 selected by PCA because that particular compound contains that ion at higher intensity than the remaining compounds.
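The comparison of the two variable sets can be reproduced with simple set operations. This is a minimal sketch; it assumes that the PCA set in the comparison is the 15-variable set obtained at the 15% relative loadings threshold, which is the set that contains m/z 132 and 200.

```python
# PCA-selected variables (assumed: >15% relative loadings threshold).
pca_set = {44, 58, 91, 121, 130, 131, 132, 150, 165, 180, 197, 198, 199, 200, 201}

# Variables selected using the informed chemical approach.
informed_set = {44, 58, 77, 91, 121, 130, 131, 146, 150, 160, 165, 198, 199}

shared = sorted(pca_set & informed_set)          # selected by both methods
pca_only = sorted(pca_set - informed_set)        # selected only by PCA
informed_only = sorted(informed_set - pca_set)   # selected only by the chemical approach

print("Shared:", shared)
print("PCA only:", pca_only)
print("Informed chemical approach only:", informed_only)
```

The three ions unique to the informed chemical approach are the two substituted tryptamine core ions and m/z 77.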
Because the objective of this work is to classify new compounds by class or subclass, incorporating more characteristic ions is essential. Variable sets derived from both methods were optimized and compared in Chapter 4.

APPENDICES

APPENDIX A Relative Intensity Values of Characteristic Ions for each Class or Subclass

Table A.5 Relative Intensity (%) of m/z Values Identified as Characteristic of Tryptamines
4-hydroxy DMT 4-hydroxy DET 5-methoxy DMT 5-methoxy DiPT α-MT α-ET 4-Me-α-ET DPT N,N-DMT 5,7-DCT 44 1.82 58 100 77 2.44 130 3.84 131 1.38 146 6.89 160 3.16 0.73 5.22 1.21 2.33 0.79 4.93 2.76 0.26 100 1.97 3.96 1.46 0.97 9.27 0.36 0.25 0.79 2.39 1.17 1.00 8.91 38.82 0 12.39 85.20 100 3.33 58.61 9.07 74.29 100 0 0 0 0 199 0 0 0 0 0 0 0.12 41.94 2.40 6.76 2.00 12.29 0.07 0.02 0.94 0.17 2.42 11.79 1.45 0.07 0 0.24 100 5.87 11.56 1.31 0.03 0.03 0 0 0.39 0 2.03 0.83 0.34 0 0.45 99.17

Table A.6 Relative Intensity (%) of m/z Values Identified as Characteristic of APB-Phenethylamines
Compound    m/z 44    m/z 58    m/z 77    m/z 131
4-APB       100       0.17      18.19     27.36
5-APB       100       0         17.27     29.85
6-APB       100       0         17.72     33.42
7-APB       100       0.20      17.45     29.51
4-MAPB      0.40      100       9.34      14.51
4-EAPB      12.54     0.08      7.32      15.22

Table A.7 Relative Intensity (%) of m/z Values Identified as Characteristic of NBOMe-Phenethylamines
Compound         m/z 91    m/z 121   m/z 150
25B-NBOMe        24.00     100       63.01
25C-NBOMe        24.65     100       55.93
25D-NBOMe        24.76     100       39.75
25E-NBOMe        25.42     100       41.83
25G-NBOMe        23.26     100       47.30
25H-NBOMe        26.19     100       48.12
25N-NBOMe        28.54     100       51.20
25P-NBOMe        23.08     100       46.41
25T-NBOMe        23.52     100       36.31
3,4-DMA-NBOMe    31.31     100       0.17

Table A.8 Relative Intensity (%) of m/z Values Identified as Characteristic of 2C-Phenethylamines
Compound    m/z 77    m/z 165   m/z 198
2C-B        14.46     0.49      0
2C-C        10.14     0         0.93
2C-D        7.80      18.24     0
2C-E        5.81      49.76     0
2C-G        7.94      100       0
2C-H        8.84      0.94      0
2C-I        6.30      0.38      0
2C-N        9.38      0.98      10.33
2C-P        4.76      49.79     0
2C-T        4.09      1.22      100

APPENDIX B Low-Resolution Mass Spectra of 2C-, APB-, and NBOMe-Phenethylamines and Tryptamines Investigated

Figure A.1 Low-resolution spectrum of
4-Me-α-ET
Figure A.2 Low-resolution spectrum of DPT
Figure A.3 Low-resolution spectrum of N,N-DMT
Figure A.4 Low-resolution spectrum of 5-APB
Figure A.5 Low-resolution spectrum of 6-APB
Figure A.6 Low-resolution spectrum of 7-APB
Figure A.7 Low-resolution spectrum of 25B-NBOMe
Figure A.8 Low-resolution spectrum of 25C-NBOMe
Figure A.9 Low-resolution spectrum of 25D-NBOMe
Figure A.10 Low-resolution spectrum of 25E-NBOMe
Figure A.11 Low-resolution spectrum of 25G-NBOMe
Figure A.12 Low-resolution spectrum of 25H-NBOMe
Figure A.13 Low-resolution spectrum of 25P-NBOMe
Figure A.14 Low-resolution spectrum of 25N-NBOMe
Figure A.15 Low-resolution spectrum of 3,4-DMA-NBOMe
Figure A.16 Low-resolution spectrum of 2C-B
Figure A.17 Low-resolution spectrum of 2C-C
Figure A.18 Low-resolution spectrum of 2C-D
Figure A.19 Low-resolution spectrum of 2C-H
Figure A.20 Low-resolution spectrum of 2C-I
Figure A.21 Low-resolution spectrum of 2C-N
Figure A.22 Low-resolution spectrum of 2C-P
[All spectra: x-axis m/z, y-axis Relative Intensity (%)]

IV. Linear Discriminant Analysis for the Classification of Synthetic Phenethylamines and Tryptamines

The overall objective in the work was to create classification models by linear discriminant analysis (LDA) but first, the mass spectral variable set needed to be reduced. The first method used principal components analysis (PCA) as an approach to find the m/z variables that described the variance in the data set, with no group knowledge of the compounds. PCA groups the samples based on similarities and differences in the raw mass spectral data without prior knowledge of the group to which the compounds belong. Using PCA, three sets of variables were identified corresponding to varying levels of contribution to the variance; the thresholds were set at 30%, 20%, and 15% relative loadings. The second method used an informed chemical approach to identify m/z values that are characteristic of the classes or subclasses of compounds comprising the data set. Ions were selected based on common fragmentation patterns that yielded similar peaks across the spectra of a particular compound class or subclass. Using the informed chemical approach, a set of variables containing 13 ions was identified. First, the variable set from the PCA method with the best classification performance was determined for comparison with the variable set defined by the informed chemical approach.

4.1 Variable Set Selection for PCA

To determine the variable set from PCA with the highest classification rate, leave-one-out cross validation (CV) was used.
The CV procedure involves training the model with all compounds in the data set, removing one compound, retraining the model with the remainder of the training set, and evaluating the retrained model by reclassification of the removed compound. The CV method was performed on each variable set and the classification results were obtained through a table of posterior probabilities, which represent the probability that a compound belongs to a particular class or subclass. A posterior probability of 1.0 indicates the highest probability that a compound belongs to that particular group. Likewise, a probability of 0 means that there is no probability that a compound belongs to that group. If the probability value is between 0 and 1, the compound will be classified to the group for which there is the largest posterior probability.

For Variable Set 1 (30% relative loadings), seven m/z values (44, 58, 121, 131, 150, 198, and 199) were identified. Using LDA with validation by CV, 30 out of 36 compounds were classified correctly, yielding an 83% classification success rate. Misclassified compounds are indicated in red in Table A.1. Of the six misclassified compounds, two were 4-MAPB and 4-EAPB, which were misclassified as tryptamines. The other four APB-phenethylamines are structural isomers with a base peak at m/z 44 and a dominant peak at m/z 131. Both 4-EAPB and 4-MAPB contain m/z 131 at approximately half the abundance of the isomeric APB-phenethylamines and contain m/z 44 in very low abundance. Moreover, 4-MAPB has a base peak at m/z 58, an ion commonly observed in tryptamines with tertiary amine chains (Figure 4.1). These major differences account for the misclassification of 4-MAPB and 4-EAPB. Additionally, 2C-D was misclassified as a tryptamine and DPT and 5-methoxy DiPT were misclassified as 2C-phenethylamines. Less separation was achieved between the 2C-phenethylamines and tryptamines in this set because of the small number of variables, accounting for the misclassifications.
The table with the posterior probabilities and classifications for Variable Set 1 can be found in the Chapter 4 Appendix.

Figure 4.1 Mass spectra for A) 4-MAPB and B) 4-APB
[Spectra: x-axis m/z, y-axis Relative Intensity (%). Labeled peaks: A) m/z 58 and 131; B) m/z 44 and 131]

Table 4.1 depicts the posterior probabilities for Variable Set 2 (20% relative loadings) where nine variables were selected (m/z 44, 58, 91, 121, 131, 150, 198, 199, and 200). With this variable set, 31 out of 36 compounds were correctly classified when CV was performed, yielding an 86% classification success rate. Misclassified compounds are indicated in red in Table 4.1. Both 4-EAPB and 4-MAPB were misclassified as tryptamines, similar to the results found when using Variable Set 1. Two 2C-phenethylamines (2C-C and 2C-I) were also misclassified as tryptamines. Variable Set 2 contains few ions characteristic of 2C-phenethylamines. The two misclassified compounds do not contain any of the m/z values in this variable set at high abundance, resulting in misclassification. The classification as tryptamines is attributed to the lack of high abundance ions that are observed in NBOMe- and APB-phenethylamines, such as m/z 121 or 131. One tryptamine, 5,7-DCT, was misclassified with a posterior probability output of NaN ('not a number'), which indicates that the posterior probabilities were too small for each class to calculate a number. When the same CV method was performed numerous times, the resultant classification was different each time for 5,7-DCT, leading to an unreliable class assignment.
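The assignment rule described in this section, in which a compound is classified to the group with the largest posterior probability and a NaN output is treated as unreliable, can be sketched as follows. The function names and the example posterior probabilities are hypothetical and are used for illustration only; the success rates use the counts reported above.

```python
import math


def assign_class(posteriors):
    """Assign a class from a dict of posterior probabilities (class -> value).

    Returns 'Unreliable' if any posterior is NaN; otherwise returns the class
    with the largest posterior probability.
    """
    if any(math.isnan(p) for p in posteriors.values()):
        return "Unreliable"
    return max(posteriors, key=posteriors.get)


def success_rate(n_correct, n_total):
    """Classification success rate as a percentage."""
    return 100.0 * n_correct / n_total


# Hypothetical posterior probabilities for two compounds.
example = {"2C": 0.029, "APB": 0.0, "NBOMe": 0.0, "Tryptamine": 0.971}
nan_case = {"2C": float("nan"), "APB": float("nan"),
            "NBOMe": float("nan"), "Tryptamine": float("nan")}

print(assign_class(example))
print(assign_class(nan_case))
print(round(success_rate(30, 36)), round(success_rate(31, 36)))
```

Comparing the assigned class with the known class for each left-out compound gives the counts of correct classifications used in the success rates.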
Table 4.1 Posterior Probabilities for CV with 20% Relative Loadings Threshold
NBOMe Tryptamine 4-APB 5-APB 6-APB 7-APB 4-EAPB 4-MAPB 25B-NBOMe 25C-NBOMe 25D-NBOMe 25E-NBOMe 25G-NBOMe 25H-NBOMe 25N-NBOMe 25P-NBOMe 25T-NBOMe 3,4-DMA-NBOMe 2C-B 2C-C 2C-D 2C-E 2C-G 2C-H 2C-I 2C-N 2C-P 2C-T α-MT α-ET DPT N,N-DMT 4-OH DMT 4-OH DET 5-methoxy DMT 5-methoxy DiPT 4-Me-α-ET 5,7-DCT 2C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0 APB 1.0 1.0 1.0 1.0 0 0 0 0 0 0 0 0 0 0 0 0 0 2.67 x 10-2 2.67 x 10-3 1.0 1.0 1.0 1.0 0 0 0 0 0.234 1.65 x 10-3 1.0 1.0 1.0 1.15 x 10-7 0 6.25 x 10-5 6.72 x 10-9 1.53 x 10-7 1.09 x 10-4 0 0 0 0.438 0 3.50 x 10-3 2.85 x 10-2 4.25 x 10-2 3.45 x 10-3 3.51 x 10-9 2.86 x 10-2 4.82 x 10-5 3.15 x 10-3 5.91 x 10-5 4.24 x 10-3 0 0 0 0 0 0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1.0 1.0 0 0 0 0 0 0 0 0 0 0 0 Class APB APB APB APB Tryptamine* Tryptamine* NBOMe NBOMe NBOMe NBOMe NBOMe NBOMe NBOMe NBOMe NBOMe NBOMe 2C 0.971 Tryptamine* 0 0 0 0 2C 2C 2C 2C 0.765 Tryptamine* 0 0 0 0.562 1.0 0.996 0.971 0.957 0.996 0.971 0.997 0.996 NaN 2C 2C 2C Tryptamine Tryptamine Tryptamine Tryptamine Tryptamine Tryptamine Tryptamine Tryptamine Tryptamine Unreliable** NaN NaN NaN
*Entries in red indicate misclassified compounds
**'NaN' indicates that the posterior probabilities were too small for class assignment

Variable Set 3 (15% relative loadings) included 15 m/z values (44, 58, 91, 121, 130, 131, 132, 150, 165, 180, 197, 198, 199, 200, and 201). This variable set yielded an 83% classification success rate, with 30 of the 36 compounds correctly classified in CV. Misclassified compounds are indicated in red in Table A.3. With Variable Set 3, 4-MAPB was still misclassified as a tryptamine, and 6-APB was also misclassified as a tryptamine. The third variable set incorporated m/z 132, which is found at a higher intensity in 6-APB than in the remaining isomeric APB-phenethylamines, leading to the misclassification.
The inclusion of m/z 132 created less separation for the APB-phenethylamines from both tryptamines and 2C-phenethylamines. As a result, 2C-N and α-ET were misclassified as APB-phenethylamines. Similar to Variable Set 2, 2C-C was misclassified as a tryptamine using Variable Set 3. The table with the posterior probabilities and classifications for Variable Set 3 can be found in the Chapter 4 Appendix.

Based on the performance of each of these variable sets during CV, the set with the threshold set at 20% relative loadings was selected to move forward in the analysis due to the lower number of misclassifications.

4.2 LDA using Selected Variable Set from PCA

Following selection of the best performing variable set from PCA, the compounds were split into a training and test set. The compounds in the training set were randomly selected via the black box method to contain seven out of the ten compounds in each of the tryptamine and 2C-phenethylamine groups. For the APB-phenethylamines, five of the six compounds were selected by the black box method to comprise the training set. For the NBOMe-phenethylamines, only six compounds were analyzed on the same day and instrument as the other compound classes in the training set, so all six were included as training set compounds. Four unique NBOMe-phenethylamines that were analyzed on a different day and instrument were selected for use in the test set. A majority of the compounds were also analyzed on a different day. These spectra were placed into the test set to make the model more robust. A table of the training and test set compounds can be found in Chapter 2.

LDA was performed on the training set to define the model before the introduction of the test set. Each test set compound was classified to the compound class for which the distance between the sample and the group centroid was shortest. The scores plots are shown in Figure 4.2.

Figure 4.2 LDA scores plot showing A) LD1 vs. LD2 and B) LD1 vs.
LD3 using PCA as the variable selection method, where the boxes to the right indicate zoomed-in regions
[Scores plots: A) x-axis LD1 (97.84%), y-axis LD2 (1.32%); B) x-axis LD1 (97.84%), y-axis LD3 (0.84%). Legend: training and test sets for the APB, NBOMe, tryptamine, and 2C groups, and group centroids. Labeled points: α-MT, 4-MAPB, 2C-C, and 4-EAPB]

The LDA scores plots (Figure 4.2) show four distinct groups corresponding to the four classes or subclasses of compounds, with a few exceptions. The position of individual compounds can be explained using a plot of the coefficients of linear discriminants for each m/z value (Figure 4.3). The coefficient of the linear discriminant is the value associated with the weighting of each m/z variable, analogous to PC loadings. A higher magnitude coefficient means that the variable contributes more to the linear discriminant (LD) function. Compounds containing that particular m/z value at an intensity greater than the average will score more positively or negatively on that particular LD. The highest positive contribution to LD1 is at m/z 121, followed by m/z 91, both shown in dark blue in Figure 4.3. These are both ions characteristic of the NBOMe-phenethylamines, which explains the highly positive positioning of this class on LD1. The three remaining groups score similarly on LD1 due to the lower intensity of these two ions, as the relative intensity data are mean centered.
With mean centering, if a compound contains a peak at an intensity lower than the group average, the intensity value becomes negative for that compound. On LD2, m/z 44 contributes negatively. While this ion is the base peak for the isomeric APB-phenethylamines, 4-EAPB contains this peak at a much lower intensity. The isomeric APB-phenethylamines are positioned more negatively on LD2 than the other three groups as well as 4-EAPB. The isomeric APB-phenethylamines also position negatively on LD3 due to the base peak at m/z 44 for these four compounds and the high negative contribution of this ion to LD3.

Figure 4.3 Coefficients of linear discriminants for the nine variables selected using PCA
[Bar chart: x-axis m/z (44, 58, 91, 121, 131, 150, 198, 199, 200); y-axis Coefficients of Linear Discriminants (-0.5 to 0.5); series: LD1, LD2, LD3]

Less separation is observed between the tryptamines and 2C-phenethylamines on LD1 and LD2 due to the lack of representative ions for these two groups. A number of the 2C-phenethylamines are positioned more positively on LD1 and LD2 due to the presence of m/z 199, which contributes positively on these two LDs. However, the close positioning of tryptamines and 2C-phenethylamines is mostly due to the lower intensity of ions characteristic of APB- and NBOMe-phenethylamines in these compounds.

With the test set introduced, four compounds are misclassified: α-MT, 4-MAPB, 2C-C, and 4-EAPB analyzed on a different day than the training set equivalent (Table 4.2). The first of these compounds, α-MT, is misclassified as an APB-phenethylamine because it is positioned more negatively on LD2 than the remainder of the tryptamine class. This scoring can be attributed to the presence of m/z 44 in high abundance as well as the base peak at m/z 131 in the spectrum for α-MT. Both of these ions are present in high abundance in the spectra for APB-phenethylamines as well. The remaining three misclassified compounds, 4-EAPB, 4-MAPB, and 2C-C, are classified as tryptamines.
4-MAPB and 4-EAPB lack the base peak at m/z 44 that the other isomeric APB-phenethylamines exhibit. For this reason, the two compounds are positioned less negatively on LD2 and LD3 closer to the centroid of the tryptamine class. Finally, 2C-C is misclassified as a tryptamine due to the lack of several of the ions present in the 2C-phenethylamines, such as m/z 199 and 200. Of the nine variables selected using PCA, the highest intensity ion for 2C-C is m/z 91 at a relative intensity of 4.9%. Because of the underrepresentation of ions characteristic of this compound and the lack of intensity for the two ions that are observed in the 2C subclass, 2C-C scores closer to the tryptamine class. APB- and NBOMe-phenethylamines have more characteristic ions observed across the entirety of the subclass which creates more distinct grouping, explaining why 2C-phenethylamines and tryptamines are less well-separated. Using PCA as the variable selection method, 26 out of 30 total compounds in the test set were correctly classified, yielding an 86.6% classification success rate. Several of the misclassifications can be attributed to the lack of a low mass ion observed across a class/subclass or the underrepresentation of ions characteristic of a particular class. PCA is an unsupervised approach with no group knowledge, thus the variables were selected based on the presence of high intensity peaks. However, high intensity peaks in mass spectra are often low mass ions that are not characteristic of a group of compounds. As a result, lower intensity and higher molecular weight ions that are characteristic become masked and consequently are not selected using the PCA approach. For this reason, a second method of variable selection was utilized where mass spectra were probed for common ions that also correspond to characteristic fragments. 
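The role of mean centering and of the coefficients of linear discriminants in positioning a compound on an LD can be illustrated with a minimal sketch. The relative intensities and coefficients below are hypothetical values chosen only for illustration (they are not data from this work), and centering on the overall mean of each m/z variable is an assumption about how the centering was applied.

```python
from statistics import mean


def mean_center(spectra):
    """Subtract the mean of each m/z variable (across all spectra) from every spectrum."""
    mz_values = sorted(spectra[0])
    column_means = {mz: mean(s[mz] for s in spectra) for mz in mz_values}
    return [{mz: s[mz] - column_means[mz] for mz in mz_values} for s in spectra]


def ld_score(centered_spectrum, coefficients):
    """Score on one LD: weighted sum of mean-centered intensities."""
    return sum(coefficients[mz] * centered_spectrum[mz] for mz in coefficients)


# Hypothetical relative intensities (%) at three m/z values.
spectra = [
    {44: 5.0, 91: 40.0, 121: 100.0},  # NBOMe-like: strong m/z 91 and 121
    {44: 100.0, 91: 5.0, 121: 0.0},   # APB-like: base peak at m/z 44
    {44: 10.0, 91: 5.0, 121: 2.0},    # neither set of ions is prominent
]

# Hypothetical LD1 coefficients: large positive weights on m/z 91 and 121.
coefficients_ld1 = {44: -0.05, 91: 0.30, 121: 0.45}

centered = mean_center(spectra)
scores = [ld_score(c, coefficients_ld1) for c in centered]
print(scores)
```

In this sketch the NBOMe-like spectrum scores positively, while the other two spectra receive negative scores because their intensities at m/z 91 and 121 fall below the mean and become negative after centering.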
4.3 LDA using Variable Set from the Informed Chemical Approach

A second LDA model was defined using the same training set as in section 4.2 but with a new variable set that was derived using the informed chemical approach discussed in Chapter 3. This variable set included 13 m/z values (m/z 44, 58, 77, 91, 121, 130, 131, 146, 150, 160, 165, 198, and 199) that were determined to be representative of different classes or subclasses of compounds in the data set. The scores plot for the LDA model using the informed chemical approach is shown in Figure 4.4.

Figure 4.4 LDA scores plot showing A) LD1 vs. LD2 and B) LD1 vs. LD3 using the informed chemical approach as the variable selection method

[Figure 4.4 plot residue: panel A) LD2 (10.99%) vs. LD1 (86.96%) and panel B) LD3 (2.05%) vs. LD1 (86.96%). Legend: training and test sets for the APB, NBOMe, tryptamine, and 2C groups, plus group centroids. Labeled points: α-ET and N,N-DMT.]

Similar to the previous model, the NBOMe-phenethylamines create a distinct group on the positive side of LD1, while the other three groups position similarly on LD1. The tryptamines in the training set are separated from the 2C- and APB-phenethylamines on LD2, while the 2C- and APB-phenethylamines are separated from each other on LD3. The plot displaying the coefficients of linear discriminants is shown in Figure 4.5.

Figure 4.5 Coefficients of linear discriminants for the 13 variables selected using the informed chemical approach

The two highest contributors to the first linear discriminant function are m/z 91 and 121, which both have a positive coefficient. The positive contribution explains the positioning of the NBOMe-phenethylamines, as these are both characteristic ions of this class, with m/z 121 representing the base peak for all NBOMe compounds analyzed.
The remaining three groups score similarly on LD1 because they do not contain these peaks at substantial abundances and there are few additional ions contributing to this LD.

[Figure 4.5 plot residue: coefficients of linear discriminants (y-axis, -1.2 to 0.6) for LD1, LD2, and LD3 at m/z 44, 58, 77, 91, 121, 130, 131, 146, 150, 160, 165, 198, and 199 (x-axis).]

The tryptamines position negatively on LD2, distinguishing these compounds from the 2C- and APB-phenethylamines. The three ions with significant negative contributions to LD2 are m/z 130, 146, and 160. The first of these, m/z 130, is present in tryptamine compounds with no aromatic substitution, while m/z 146 and 160 are present in tryptamine compounds with hydroxy or methoxy substitutions, respectively, on the aromatic ring. While the 2C- and APB-phenethylamines position similarly on LD1 and LD2, separation is achieved on LD3. The two highest abundance peaks in the APB-phenethylamines are m/z 44 and 131. These ions both contribute negatively to LD3 and are largely responsible for the separation of 2C- and APB-phenethylamines.

When introducing the test set into the LDA model, 28 compounds out of 30 were correctly classified, yielding a 93.3% classification success rate. The first of the two misclassified compounds was α-ET analyzed on a different day and instrument, which was classified as an APB-phenethylamine due to the high abundance of m/z 131, an ion that is prevalent in the APB-phenethylamines. Because of the presence of this ion, α-ET positioned more negatively on LD3 than the rest of the tryptamines and had a positive score on LD2. The second compound that was misclassified was N,N-DMT, which was classified as a 2C-phenethylamine. N,N-DMT is a non-aromatically substituted tryptamine. Therefore, it does not contain peaks at m/z 146 or 160, which are indicative of a hydroxy or methoxy substituent on the aromatic ring.
Because a large percentage of the training set for the tryptamines contained hydroxy or methoxy substituted compounds, these two ions became important contributors towards classifying compounds in this group. As N,N-DMT does not contain m/z 146 and 160, this compound was positioned less negatively on LD2 and was classified as a 2C-phenethylamine. This misclassification could be corrected if the training set were expanded to include more non-aromatically substituted tryptamines that contain m/z 130 at high abundance.

4.4 Comparison of Variable Selection Methods

Two LDA classification models were developed using the same training set but with different variables selected by different methods. A common test set was used to determine the performance of each model, with classification from both LDA models summarized in Table 4.2. The model defined by the m/z values selected by PCA resulted in an 86.6% classification success rate, compared to 93.3% when using m/z values derived from an informed chemical approach. The compounds misclassified by each model are shown in the third column of Table 4.2, followed by the class to which LDA classified each compound. Finally, the correct class is given in the final column.

Table 4.2 Summary of LDA Classification

Method of Variable Selection                         Classification Success Rate   Compounds Misclassified   Classification by LDA   Correct Classification
Principal Components Analysis (PCA)                  86.6%                         α-MT                      APB                     Tryptamine
                                                                                   4-EAPB                    Tryptamine              APB
                                                                                   4-MAPB                    Tryptamine              APB
                                                                                   2C-C                      Tryptamine              2C
Informed Chemical Approach by Probing Mass Spectra   93.3%                         α-ET                      APB                     Tryptamine
                                                                                   N,N-DMT                   2C                      Tryptamine

4.5 Summary

In Chapter 3, two different methods of variable selection were discussed. The first method, PCA, used an objective statistical approach based solely on the natural variance in the data set. The second method used a more informed approach where trends in low-resolution mass spectra for particular compound classes were identified.
High-resolution mass spectra were then used to confirm the structure of the ions identified. The first method, PCA, yielded three variable sets corresponding to different set thresholds of relative loadings. LDA cross validation was performed using the full compound set for each of the three variable sets. It was determined that the set with the highest classification success included all m/z values over 20% relative loadings. The compounds were then randomly split into a training set and a test set and LDA models were defined using the variable set derived from PCA and the set comprised of m/z values selected in the informed approach. The model using the PCA variables performed at an 86.6% classification success rate while the model using the informed approach yielded a 93.3% success rate. With PCA, higher intensity ions dominate. However, these ions are most often low mass fragments that are common across several compound groups and, therefore, are not sufficiently characteristic of each compound class. Incorporating lower intensity ions into the variable set that are characteristic of the classes of compounds but are not selected by PCA improves the classification when LDA is performed. 
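The threshold-based variable selection and the success rates summarized above can be sketched as follows. This is only an illustration: the loadings are hypothetical, and the definition of relative loading used here (absolute loading as a percentage of the largest absolute loading) is an assumption, since the exact definition is given in Chapter 3 rather than in this section.

```python
def select_by_relative_loading(loadings, threshold_percent):
    """Return the m/z values whose relative loading exceeds the threshold.

    Relative loading is assumed here to be the absolute loading expressed
    as a percentage of the largest absolute loading.
    """
    max_abs = max(abs(value) for value in loadings.values())
    if max_abs == 0:
        raise ValueError("all loadings are zero")
    return sorted(
        mz for mz, value in loadings.items()
        if 100 * abs(value) / max_abs > threshold_percent
    )


def success_rate(n_correct, n_total):
    """Classification success rate as a percentage."""
    return 100 * n_correct / n_total


# Hypothetical loadings on one principal component, keyed by m/z.
loadings = {44: 0.80, 58: 0.30, 77: 0.14, 91: 0.20, 105: -0.05, 121: 0.60}

for threshold in (15, 20, 30):
    print(threshold, select_by_relative_loading(loadings, threshold))

# Test set results reported in this chapter: 26 of 30 and 28 of 30 correct.
print(success_rate(26, 30), success_rate(28, 30))
```

Raising the threshold shrinks the variable set, which mirrors the trade-off examined with the three PCA-derived variable sets. The two computed rates are 86.67% and 93.33% to two decimal places; the text reports them as 86.6% and 93.3%.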
APPENDIX

Table A.9 Posterior Probabilities for CV with 30% Relative Loadings Threshold

Compound          2C               APB              NBOMe   Tryptamine       Class
4-APB             0                1.0              0       0                APB
5-APB             0                1.0              0       0                APB
6-APB             0                1.0              0       0                APB
7-APB             0                1.0              0       0                APB
4-EAPB            0                9.32 x 10^-2     0       0.907            Tryptamine*
4-MAPB            0                1.98 x 10^-7     0       1.0              Tryptamine*
25B-NBOMe         0                0                1.0     0                NBOMe
25C-NBOMe         0                0                1.0     0                NBOMe
25D-NBOMe         0                0                1.0     0                NBOMe
25E-NBOMe         0                0                1.0     0                NBOMe
25G-NBOMe         0                0                1.0     0                NBOMe
25H-NBOMe         0                0                1.0     0                NBOMe
25N-NBOMe         0                0                1.0     0                NBOMe
25P-NBOMe         0                0                1.0     0                NBOMe
25T-NBOMe         0                0                1.0     0                NBOMe
3,4-DMA-NBOMe     0                0                1.0     0                NBOMe
2C-B              0.956            1.61 x 10^-4     0       4.36 x 10^-2     2C
2C-C              0.533            1.41 x 10^-3     0       0.466            2C
2C-D              0.450            1.33 x 10^-3     0       0.548            Tryptamine*
2C-E              0.961            8.21 x 10^-5     0       3.88 x 10^-2     2C
2C-G              1.0              0                0       0                2C
2C-H              1.0              0                0       0                2C
2C-I              0.920            1.75 x 10^-4     0       7.96 x 10^-2     2C
2C-N              1.0              0                0       0                2C
2C-P              0.999            2.41 x 10^-6     0       1.27 x 10^-3     2C
2C-T              0.906            2.07 x 10^-4     0       9.36 x 10^-2     2C
α-MT              9.03 x 10^-4     0.440            0       0.559            Tryptamine
α-ET              0                0                0       1.0              Tryptamine
DPT               0.516            1.71 x 10^-3     0       0.482            2C*
N,N-DMT           1.83 x 10^-4     2.83 x 10^-2     0       0.972            Tryptamine
4-OH DMT          1.69 x 10^-4     4.09 x 10^-2     0       0.959            Tryptamine
4-OH DET          0.371            2.23 x 10^-3     0       0.627            Tryptamine
5-methoxy DMT     1.78 x 10^-4     2.80 x 10^-2     0       0.972            Tryptamine
5-methoxy DiPT    0.525            1.51 x 10^-3     0       0.473            2C*
4-Me-α-ET         1.54 x 10^-2     3.75 x 10^-3     0       0.981            Tryptamine
5,7-DCT           NaN              NaN              NaN     NaN              Unreliable**

*Entries marked with an asterisk (shown in red in the original) indicate misclassified compounds
**'NaN' indicates that the posterior probabilities were too small for class assignment

Table A.10 Posterior Probabilities for CV with 15% Relative Loadings Threshold

Compound          2C               APB              NBOMe   Tryptamine       Class
4-APB             0                1.0              0       0                APB
5-APB             0                1.0              0       0                APB
6-APB             0                0                0       1.0              Tryptamine*
7-APB             0                1.0              0       0                APB
4-EAPB            1.22 x 10^-2     0.977            0       1.09 x 10^-2     APB
4-MAPB            3.46 x 10^-5     4.60 x 10^-3     0       0.995            Tryptamine*
25B-NBOMe         0                0                1.0     0                NBOMe
25C-NBOMe         0                0                1.0     0                NBOMe
25D-NBOMe         0                0                1.0     0                NBOMe
25E-NBOMe         0                0                1.0     0                NBOMe
25G-NBOMe         0                0                1.0     0                NBOMe
25H-NBOMe         0                0                1.0     0                NBOMe
25N-NBOMe         0                0                1.0     0                NBOMe
25P-NBOMe         0                0                1.0     0                NBOMe
25T-NBOMe         0                0                1.0     0                NBOMe
3,4-DMA-NBOMe     0                0                1.0     0                NBOMe
2C-B              0.898            6.55 x 10^-4     0       0.101            2C
2C-C              0.298            3.32 x 10^-7     0       0.702            Tryptamine*
2C-D              1.0              0                0       0                2C
2C-E              1.0              0                0       0                2C
2C-G              1.0              0                0       0                2C
2C-H              1.0              0                0       0                2C
2C-I              0.871            3.79 x 10^-8     0       0.129            2C
2C-N              NaN              NaN              NaN     NaN              APB*
2C-P              1.0              0                0       0                2C
2C-T              1.0              0                0       0                2C
α-MT              0                0                0       1.0              Tryptamine
α-ET              0                1.0              0       0                APB*
DPT               0                0                0       1.0              Tryptamine
N,N-DMT           0                0                0       1.0              Tryptamine
4-OH DMT          0                0                0       1.0              Tryptamine
4-OH DET          0                0                0       1.0              Tryptamine
5-methoxy DMT     0                0                0       1.0              Tryptamine
5-methoxy DiPT    0                0                0       1.0              Tryptamine
4-Me-α-ET         0                0                0       1.0              Tryptamine
5,7-DCT           NaN              NaN              NaN     NaN              Unreliable**

*Entries marked with an asterisk (shown in red in the original) indicate misclassified compounds
**'NaN' indicates that the posterior probabilities were too small for class assignment

V. Conclusions and Future Work

5.1 Conclusions

The overall objective in this work was to create a statistical model to classify newly emerging synthetic designer drugs into a class or structural subclass. Due to their structural similarities, tryptamines and phenethylamines were selected to develop the model using linear discriminant analysis (LDA). Variable selection was accomplished using two different methods to consider the advantages and disadvantages of each. Principal components analysis (PCA) was selected as an objective approach for variable selection to eliminate bias in the selection process. However, PCA lacks group knowledge and therefore selects variables based solely on the raw data and the contribution of each variable to the natural variance within the data set. When using mass spectral data, characteristic ions can be masked by less significant but dominant ions, leading to a less specific model. The second method used a more informed approach of identifying characteristic ions from the mass spectra based on known structural information. While this method allows for the identification of ions that are more characteristic of specific classes or subclasses, there exists the risk for bias and overtraining the model.
Ultimately, the latter method of hand-selecting the variables was more successful and an LDA model was established. This model performed at a 93.3% classification success rate, misclassifying only two of the thirty compounds comprising the test set. However, misclassifications could be explained and possibly overcome by a more robust and representative data set.

The forensic implications of this work lie in the ability to classify, not identify, an unknown compound. Samples submitted to a forensic laboratory are commonly analyzed by GC-MS using standard operating procedures. If the sample is not consistent with any available reference materials, the LDA model developed in this work could be used to determine a possible designer drug class. Rather than a strict identification that occurs with direct comparison to the spectrum of a reference material, the LDA model would determine the likely class or subclass to which the submitted sample belongs. With this information, the laboratory would have a clearer plan moving forward, whether that includes obtaining a set of reference materials that are consistent with the preliminary results or finding literature spectra for various designer drugs for comparison purposes. In short, the model developed in this thesis would neither be a first nor a final step, but would provide direction in cases where the identity or class of the questioned sample is completely unknown.

5.2 Future Work

The classification model developed in this work included ten compounds in the tryptamine class and 2C-phenethylamine subclass as well as eleven NBOMe-phenethylamines. Only six compounds represented the APB-phenethylamines in the development of the LDA models. To create a more robust model, more samples should be acquired that represent a wider array of synthetic phenethylamines and tryptamines, accounting for different subclasses and substituents.
The two compounds misclassified by LDA were α-ET and N,N-DMT, which are both non-aromatically substituted tryptamines. The misclassification of these tryptamines could be attributed to the small number of non-aromatically substituted compounds in the training set. With the addition of more tryptamines, this particular issue in the model could be resolved. Additionally, the model was only trained with two synthetic designer drug classes. More classes should be incorporated, leading to a single model that could classify any possible designer drug that is submitted to the laboratory.

For the work presented here, reference materials were analyzed and used to develop the classification model. The compounds were acquired as pure samples and were analyzed individually. Future work should explore the analysis of mixtures and impure samples where concentration becomes a concern. Drug samples are very rarely submitted without some form of contamination and are often submitted as mixtures with other controlled substances. To prove applicability, the procedure developed in this work needs to be validated in cases where the concentration of controlled substance is relatively small.

Finally, the introduction of true unknowns in the test set could be accomplished through the analysis of street samples. For the work presented in this thesis, the 'questioned samples' used as test set compounds were also reference standards. The validity of the model could be proved with the use of true street samples as an additional test set.

While additional steps need to be taken to enhance and validate the LDA model, it is evident that the ability to classify newly emerging synthetic designer drugs has been proven by the methods developed in this thesis. Previously unidentifiable samples with no available reference material can now be classified by structural group, allowing for additional steps toward identification.