COMPARISON OF MULTIVARIATE STATISTICAL MODELS FOR CLASSIFICATION OF FENTANYL ANALOGS

By

Amber Gerheart

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Forensic Science – Master of Science

2020

ABSTRACT

Novel psychoactive substances (NPS) pose a challenge for forensic laboratories in the United States. Controlled substances are typically analyzed by gas chromatography-mass spectrometry (GC-MS). The GC retention time and mass spectrum of an analyte are compared to those of a reference standard to make an identification. However, reference standards for newly emerging NPS may not be readily available. Multivariate statistical methods have therefore been investigated to classify NPS.

This work explored linear discriminant analysis (LDA) and soft independent modelling of class analogies (SIMCA) as methods to classify fentanyl analogs according to structural subclass. Four fentanyl subclasses were investigated, defined by the location of the substituent on the core fentanyl structure. Three factors were examined to improve the robustness of the LDA and SIMCA models: variation within a chromatographic peak, instrument variation, and the use of neutral loss data.

Overall, the LDA models achieved a 100% successful classification rate for both mass spectral data and neutral loss data. The SIMCA models achieved a 91% successful classification rate for mass spectral data and an 87% successful classification rate for neutral loss data. The two classification methods were compared to highlight the benefits and limitations of each. This work supports the application of multivariate statistical models in forensic laboratories to obtain structural information when reference materials are not available.
ACKNOWLEDGEMENTS

I would first like to acknowledge my advisor, Dr. Ruth Smith, for guiding me through graduate school. I appreciate all of the advice and knowledge I have gained from working under her over the last two years. Thanks to the knowledge she has shared during my time at Michigan State University, I am confident that I have been prepared for a life-long career in forensic science.

I would like to acknowledge the Michigan State University College of Social Science Faculty Initiatives Fund and the Michigan State Forensic Science Program for funding this work, as well as funding to present this research. Additionally, I would like to acknowledge the MSU RTSF Mass Spectrometry & Metabolomics Core for instrument access and Hannah Clause for assistance in sample collection. Supplemental data for this work were provided by Amanda Setser, Emma Stuhmer, and Kimberly Venuk. I would like to acknowledge Sergey Kucheryavskiy for assistance with data analysis. I would also like to acknowledge my thesis committee, Dr. Ruth Smith, Dr. Victoria McGuffin, Dr. Charles Corley, and Kimberly Venuk, for helping me achieve one of my greatest goals.

I would also like to thank my family, friends, and other forensic science colleagues: Otyllia Abraham, Rebecca Boyea, and Briana Capistran. Above all, I want to thank my mom. She has been my constant support, and I can never express how grateful I am to have a mom/friend/therapist/advocate/idol like her.

I have truly never understood the meaning of this quote more than I do now: “If you aren’t in over your head, how do you know how tall you are?” – T.S. Eliot

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
1.1 FENTANYL ANALOGS
1.2 FORENSIC ANALYSIS OF SEIZED DRUGS
1.2.1 SWGDRUG Recommendations for Analysis
1.2.2 Gas Chromatography-Mass Spectrometry
1.2.3 Current Challenges in Seized Drug Analysis
1.3 ADDRESSING NPS IDENTIFICATION CHALLENGES IN FORENSIC LABORATORIES
1.3.1 Instrumental Methods for NPS Identification and Differentiation
1.3.2 Multivariate Statistical Methods for NPS Identification and Differentiation
1.3.2.1 Principal Components Analysis
1.3.2.2 Linear Discriminant Analysis
1.3.2.3 Soft Independent Modelling of Class Analogies
1.3.3 Neutral Losses as an Alternative to Mass Spectra
1.4 RESEARCH OBJECTIVE
REFERENCES
2. MATERIALS AND METHODS
2.1 FENTANYL ANALOG REFERENCE MATERIALS
2.2 GAS CHROMATOGRAPHY-MASS SPECTROMETRY (GC-MS) ANALYSIS
2.3 DATA ANALYSIS
2.3.1 Neutral Loss Spectra Development
2.5 STATISTICAL MODELLING
2.5.1 Principal Components Analysis (PCA)
2.5.2 Linear Discriminant Analysis (LDA)
2.5.3 Soft Independent Modelling of Class Analogies (SIMCA)
APPENDIX
REFERENCES
3. LINEAR DISCRIMINANT ANALYSIS (LDA) FOR CLASSIFICATION OF FENTANYL ANALOGS ACCORDING TO STRUCTURAL SUBCLASS
3.1 MASS SPECTRAL ANALYSIS OF FENTANYL ANALOGS
3.2 INITIAL LINEAR DISCRIMINANT ANALYSIS (LDA) MODELS TO ASSESS VARIATION WITHIN A CHROMATOGRAPHIC PEAK
3.2.1 Principal Components Analysis (PCA) for Variable Selection
3.2.2 Linear Discriminant Analysis (LDA) Models
3.3 REFINED LINEAR DISCRIMINANT ANALYSIS (LDA) MODEL TO INCORPORATE INSTRUMENT VARIATION
3.3.1 Refined LDA Model for Classification of Fentanyl Analogs
3.3.2 Additional Test Sets to Validate the Linear Discriminant Analysis (LDA) Model
3.4 APPLICATION OF NEUTRAL LOSS SPECTRA TO REFINE THE LINEAR DISCRIMINANT ANALYSIS (LDA) MODEL
3.4.1 Neutral Loss Spectra of Fentanyl Analogs
3.4.2 Application of Linear Discriminant Analysis (LDA) to Neutral Loss Spectra for Classification of Fentanyl Analogs
3.5 SUMMARY OF LINEAR DISCRIMINANT ANALYSIS (LDA) MODELS
APPENDIX
REFERENCES
4. SOFT-INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) FOR CLASSIFICATION OF FENTANYL ANALOGS ACCORDING TO STRUCTURAL SUBCLASS
4.1 INITIAL SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) MODELS TO ASSESS VARIATION WITHIN A CHROMATOGRAPHIC PEAK
4.2 REFINED SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) MODEL TO INCORPORATE INSTRUMENT VARIATION
4.2.1 Additional Test Sets to Validate the Classification Models
4.3 APPLICATION OF NEUTRAL LOSS SPECTRA FOR CLASSIFICATION OF FENTANYL ANALOGS
4.4 SUMMARY OF SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) MODELS
APPENDIX
REFERENCES
5. CONCLUSIONS AND FUTURE WORK
5.1 CONCLUSIONS
5.2 FUTURE WORK
REFERENCES

LIST OF TABLES
Table 2.1 Fentanyl analogs used in this work, separated by structural subclass
Table 2.2 Training set for the initial models (all analog spectra in n = 2)
Table 2.3 Test set for the initial models, replicate spectra indicated
Table 2.4 Training set for the refined models and neutral loss models (all analog spectra in
Table 2.5 Test set for the refined models and neutral loss models (all analog spectra in n = 4)
Table A2.1 PCA R Code4
Table A2.2 LDA R Code4
Table A2.3 SIMCA R Code
Table 3.1 Variables retained for LDA based on a relative loadings threshold of 2%
Table 3.2 Variables retained for the refined LDA model, as determined by PCA
Table 3.3 List of non-fentanyl NPS compounds in the external test set
Table 3.4 Variables retained for neutral loss LDA model, as determined by the 3.5% threshold of the PCA data
Table A3.1 Chemical names of non-fentanyl NPS compounds
Table 4.1 Conditions for each subclass in SIMCA for both apex and average models
Table 4.2 Variables contributing most to the AG subclass
Table 4.3 Conditions for each subclass in refined SIMCA model
Table 4.4 Comparison of variables contributing most to the initial and refined AG subclass SIMCA models
Table 4.5 Conditions for each subclass in the neutral loss SIMCA model
Table A4.1 Variables contributing most to the AA subclass in the apex initial SIMCA model
Table A4.2 Variables contributing most to the AA subclass in the apex initial SIMCA model
Table A4.3 Variables contributing most to the AN subclass in the apex initial SIMCA model

LIST OF FIGURES
Figure 1.1 Core structure of fentanyl with substitution sites indicated
Figure 1.2 Intensity plot of m/z 189 versus m/z 146
Figure 1.3 Example of a PCA scores plot of PC1 vs PC2
Figure 1.4 Example of a loadings plot
Figure 1.5 Example LDA scores plot of LD1 vs LD2
Figure 1.6 Example of A) a SIMCA class with only one PC retained, and B) residuals plot for the class
Figure 1.7 Example Cooman’s plot from a SIMCA model
Figure A2.1 Structures of all fentanyl analogs used in this work
Figure 3.1 Initial cleavage sites of fentanyl analogs A) cleavage of the amide group, B) cleavage on the piperidine ring, C) cleavage of the n-alkyl chain
Figure 3.2 Mass spectra and chemical structures of selected fentanyl analogs A) thiofentanyl representing the AN subclass, B) ortho-methylfentanyl representing the AR subclass, C) cyclopropyl fentanyl representing the AG subclass, and D) para-fluorobutyrylfentanyl representing the AA subclass
Figure 3.3 PCA scores plots of A) principal component 1 (PC1) vs principal component 2 (PC2), B) PC1 vs principal component 3 (PC3), and C) PC1 vs principal component 4 (PC4)
Figure 3.4 Loadings plot for A) principal component 1 (PC1), B) principal component 2 (PC2), and C) principal component 4 (PC4)
Figure 3.5 Predicted structure of fragment ion at m/z 207
Figure 3.6 Scores plots for the apex data A) linear discriminant 1 (LD1) vs linear discriminant 2 (LD2), B) LD1 vs linear discriminant 3 (LD3), and scores plots for the average data C) LD1 vs LD2, D) LD1 vs LD3
Figure 3.7 Coefficients of A) linear discriminant 1 (LD1), B) linear discriminant 2 (LD2), and C) linear discriminant 3 (LD3)
Figure 3.8 Predicted fragments of A) para-fluorofentanyl from the AR subclass and B) para-fluorobutyryl fentanyl from the AA subclass
Figure 3.9 Principal components analysis scores plot of A) PC1 vs PC2, B) PC1 vs PC3, C) PC1 vs PC4
Figure 3.10 Scores plot for the refined LDA model A) LD1 vs LD2, B) LD1 vs LD3
Figure 3.11 Scores plot for the refined LDA model A) LD1 vs LD2, B) enlarged LD1 vs LD2, C) LD1 vs LD3, D) enlarged LD1 vs LD3
Figure 3.12 Representative spectrum of A) 2-EMC and B) 2-FMA
Figure 3.13 Structures and spectra of case samples for A) carfentanil, B) methoxy acetyl fentanyl, C) furanyl fentanyl, D) valeryl fentanyl, E) acetyl fentanyl, F) 3’-methylfentanyl
Figure 3.14 Mass spectrum with common neutral losses highlighted for A) ortho-methylfentanyl and B) para-methoxy fentanyl
Figure 3.15 Neutral loss spectra and chemical structures of selected fentanyls A) thiofentanyl representing the AN subclass, B) ortho-methylfentanyl representing the AR subclass, C) cyclopropyl fentanyl representing the AG subclass, and D) para-fluorobutyrylfentanyl representing the AA subclass
Figure 3.16 PCA scores plot for neutral loss LDA model A) PC1 vs PC2, B) PC1 vs PC3, C) PC1 vs PC4
Figure 3.17 Neutral loss PCA loadings plots for A) PC1, B) PC2, C) PC3, and D) PC4
Figure 3.18 Scores plot for neutral loss LDA model A) LD1 vs LD2 and B) LD1 vs LD3
Figure 3.19 Coefficients for neutral loss LDA model A) LD1, B) LD2, and C) LD3
Figure A3.1 Mass spectra of all fentanyl analogs
Figure A3.2 Initial average model PCA scores plots for A) PC1 vs PC2, B) PC1 vs PC3, and C) PC1 vs PC4
Figure A3.3 Initial average model PCA loadings plots for A) PC1, B) PC2, C) PC3, and D) PC4
Figure A3.4 Initial average LDA model coefficients of A) LD1 and B) LD3
Figure A3.5 Refined PCA loadings plots for A) PC1, B) PC2, C) PC3, and D) PC4
Figure A3.6 Refined LDA model coefficients of A) LD1 and B) LD3
Figure A3.7 Neutral loss spectra of all fentanyl analogs
Figure 4.1 Residuals plot for the AR subclass from the apex model
Figure 4.2 Cooman’s plots for the apex model A) amide and aniline ring (AA) subclass vs amide group (AG) subclass, B) AA vs aniline ring (AR) subclass, C) AA vs n-alkyl chain (AN) subclass, and Cooman’s plots for the average model D) AA vs AG, E) AA vs AR, F) AA vs AN
Figure 4.3 Modelling power plot for the AG subclass from the apex model
Figure 4.4 Modelling power plots for the AG subclass SIMCA model A) with instrument variation incorporated, B) without instrument variation incorporated
Figure 4.5 Residuals plot for the AR subclass in the refined SIMCA model
Figure 4.6 Residuals plot for the AG subclass in the refined SIMCA model
Figure 4.7 Cooman’s plots for the A) AA subclass vs AG subclass, B) AA subclass vs AR subclass, and C) AA subclass vs AN subclass
Figure 4.8 Residuals plot for the AG subclass in the neutral loss SIMCA model
Figure A4.1 Modelling power plot for the AA subclass from the apex initial SIMCA model
Figure A4.2 Modelling power plot for the AR subclass from the apex initial SIMCA model
Figure A4.3 Modelling power plot for the AN subclass from the initial apex SIMCA model
Figure A4.4 Modelling power plots from the refined SIMCA model for the A) AA subclass, B) AR subclass, and C) AN subclass
Figure A4.5 Modelling power plots from the neutral loss SIMCA models for the A) AA subclass, B) AR subclass, and C) AN subclass

1. INTRODUCTION

1.1 FENTANYL ANALOGS

Novel psychoactive substances (NPS) are drugs synthesized to circumvent existing drug control legislation.1 Synthetic opioids are a class of NPS: man-made drugs developed to mimic the pharmacological and analgesic effects of opiates. Overdose deaths in the United States continue to rise, and synthetic opioids accounted for 67% (>31,000) of opioid deaths in 2018. According to the Centers for Disease Control and Prevention (CDC), fentanyl was involved in the majority of overdose cases.2

Fentanyl is a synthetic opioid that is 50 to 100 times more potent than morphine and is used medicinally to treat severe pain.3 Fentanyl analogs are synthesized to increase the potency and pharmacological effects of fentanyl. An analog is a substance that shares a core structure with an existing drug but differs from it by one or more structural substitutions.4 Figure 1.1 shows the core structure of fentanyl and indicates the sites at which structural substitutions can be made.5

[Figure 1.1 Core structure of fentanyl with substitution sites indicated: n-alkyl chain, piperidine ring, amide group, and aniline ring]

The Controlled Substances Act (CSA), enforced by the U.S. Drug Enforcement Administration (DEA), categorizes controlled substances into five schedules (Schedules I-V). Schedule I includes substances with no accepted medical use and a high potential for abuse. Schedule V includes substances with accepted medical use and a low potential for abuse.6 Because it has medicinal applications and a high potential for abuse, fentanyl is listed as a Schedule II substance in the CSA.
In February 2018, as a measure to reduce overdose rates in the United States, the DEA issued an emergency scheduling order. The order placed all illicit fentanyl analogs in Schedule I because they have no approved medical use.7 Because their structures are continually changing, fentanyl analogs could not be listed by name in the order. Instead, they were referred to as ‘any fentanyl-related substance’.

1.2 FORENSIC ANALYSIS OF SEIZED DRUGS

1.2.1 SWGDRUG Recommendations for Analysis

The Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) is an international working group that aims to improve the quality of seized drug analysis and to establish standards for forensic laboratories to follow.8 In 2019, SWGDRUG published its most recent guidelines, which detail minimum recommendations for the forensic identification of seized drugs. These recommendations require the use of multiple uncorrelated techniques within an analytical scheme. Techniques are ranked into three categories (A, B, and C) according to their discriminating power and the structural information they provide:
- Category A techniques are the most discriminatory and provide selectivity through structural information.
- Category B techniques provide a lower level of selectivity through chemical or physical characteristics.
- Category C techniques are the least discriminatory and provide only general or class information.

When a Category A technique is used, at least one additional technique (from Category A, B, or C) must be used to confirm the identification.8

The gold standard for the analysis of seized drugs in forensic laboratories is gas chromatography-mass spectrometry (GC-MS). Analysis by GC-MS fulfills the SWGDRUG recommendations for identification, because mass spectrometry is a Category A technique and gas chromatography is a Category B technique.8

1.2.2 Gas Chromatography-Mass Spectrometry

Gas chromatography-mass spectrometry is a two-part technique that provides both separation and identification of organic molecules.
Gas chromatography (GC) is a separation technique that separates the components of a complex mixture. It uses a temperature program together with the partitioning of analytes between the mobile and stationary phases. Because separation occurs in the gas phase, all analytes must be sufficiently volatile; they are commonly in the liquid phase at room temperature. Once a mixture is injected, a carrier gas (e.g., He, H2, N2, or Ar) carries the analytes through the column. The analytes are separated based on their volatility and their interaction with the stationary phase. They therefore reach the detector at different times: smaller, more volatile analytes generally move through the column more quickly than larger, less volatile analytes. The time it takes for an analyte to move through the column and reach the detector is called the retention time.9

The analyte is transferred from the GC to the mass spectrometer (MS) through a heated transfer line. The mass spectrometer functions as the detector in GC-MS and consists of three main parts: an ion source, a mass analyzer, and an ion detector.
- Ion source: in electron ionization (EI), high-energy electrons (typically 70 eV) collide with gas-phase analyte molecules, ionizing them and causing reproducible fragmentation.
- Mass analyzer: a quadrupole mass analyzer consists of four parallel metal rods, to which an oscillating radio frequency (RF) voltage and a direct current (DC) voltage are applied. It separates ions according to their mass-to-charge (m/z) ratio, allowing only ions of a given m/z value to reach the detector at any one time. A range of m/z values can be scanned by varying the applied voltages.
- Ion detector: ions are commonly detected with an electron multiplier, producing a mass spectrum that shows each detected m/z value and its intensity.9

Because EI fragmentation is highly reproducible, spectral libraries are widely available to aid in analyte identification.
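The idea of searching a spectral library can be illustrated with a short sketch. It makes three assumptions:
- Each spectrum is stored as a mapping from nominal m/z to abundance.
- Each spectrum is scaled to its base peak (the most intense ion, set to 100), as library spectra are usually displayed.
- Similarity is scored as the cosine of the angle between the two intensity vectors.

The cosine score, the function names, and the toy spectra below are illustrative assumptions. They are not the search algorithm of any particular library software, and they are not real reference data.

```python
import math


def normalize_to_base_peak(spectrum):
    """Scale intensities so that the most intense ion (base peak) equals 100."""
    base = max(spectrum.values())
    return {mz: 100.0 * intensity / base for mz, intensity in spectrum.items()}


def cosine_similarity(spec_a, spec_b):
    """Cosine of the angle between two spectra treated as vectors over m/z.

    An m/z value missing from one spectrum is treated as zero intensity there.
    For non-negative intensities the result lies between 0 and 1.
    """
    mz_values = set(spec_a) | set(spec_b)
    dot = sum(spec_a.get(mz, 0.0) * spec_b.get(mz, 0.0) for mz in mz_values)
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    return dot / (norm_a * norm_b)


def best_library_match(query, library):
    """Return (name, score) for the library entry most similar to the query."""
    query_norm = normalize_to_base_peak(query)
    scores = {
        name: cosine_similarity(query_norm, normalize_to_base_peak(reference))
        for name, reference in library.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy spectra (m/z -> raw abundance); not real reference data.
library = {
    "compound_X": {57: 300, 105: 150, 146: 800, 189: 450, 245: 1000},
    "compound_Y": {77: 200, 91: 1000, 119: 400, 160: 250},
}
query = {57: 31, 105: 14, 146: 82, 189: 44, 245: 100}

name, score = best_library_match(query, library)
print(name, round(score, 3))
```

The cosine score is unaffected by uniform scaling, so base-peak normalization does not change the score. It only puts the query and the reference on the same relative-intensity scale. Closely related analogs can give near-identical spectra and therefore near-identical scores. This is one reason a spectral match alone is not sufficient and the retention time of a reference standard is also compared.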
1.2.3 Current Challenges in Seized Drug Analysis

To identify a controlled substance, the GC retention time and mass spectrum of the analyte are compared to those of a reference standard analyzed under equivalent conditions. The analyte is identified as the reference compound if two conditions are met: the retention times are similar, and the mass spectra show the same ions in a similar pattern of relative intensities. However, NPS such as fentanyl analogs create challenges for forensic laboratories because a reference standard may not be available for a specific analog. A laboratory may not yet have purchased a standard for a newly synthesized analog, or no reference standard may yet exist for that analog. Many laboratories must differentiate between structural analogs to make an identification. Approaches to address this challenge, such as new instrumental techniques and multivariate statistical methods, have therefore been explored.

1.3 ADDRESSING NPS IDENTIFICATION CHALLENGES IN FORENSIC LABORATORIES

1.3.1 Instrumental Methods for NPS Identification and Differentiation

While GC-MS is the typical method of analysis, other instrumental techniques that can provide further discrimination between NPS compounds have been investigated. Nuclear magnetic resonance (NMR) spectroscopy, a Category A technique, has been investigated for NPS identification. Duffy et al. used low-field NMR spectroscopy to successfully differentiate 65 fentanyl analogs. Bogun and Moore used NMR to identify organic precursors in methamphetamine production.10,11 Both studies used smaller, benchtop NMR spectrometers, which are more practical for forensic applications. However, both studies analyzed pure standards rather than case samples, which are often mixtures of controlled substances, cutting agents, and other additives. Because mixtures are common in forensic casework, a separation step would be necessary before NMR analysis.
Gas chromatography-vacuum ultraviolet (GC-VUV) spectroscopy has been explored as an alternative to traditional GC-MS for NPS identification. Kranenburg et al. used GC-VUV to differentiate six sets of amphetamine isomers. Amphetamines have a high degree of conjugation, which makes them well suited to VUV detection because highly conjugated compounds are the most UV-active.12 Of the compounds investigated, only 3,4-methylenedioxymethamphetamine (MDMA) and 3,4-methylenedioxyamphetamine (MDA) were not differentiated using GC-VUV. However, these compounds were differentiated by GC-MS. Kranenburg et al. therefore proposed that GC-VUV be used in conjunction with GC-MS as a tool for NPS and isomer identification.12 Roberson and Goodpaster also used GC-VUV to differentiate eight structurally similar phenethylamines. They similarly recommended that GC-VUV be used as a complementary technique to GC-MS.13

Kranenburg et al. also explored modifying GC-MS to provide more specific structural information for cathinone and fluoroamphetamine isomers. Low-energy EI (15 eV) GC-MS provided more discriminating mass spectra for the isomers.14 The disadvantage of this method was that low-energy EI could not be performed on a conventional GC-MS instrument with a single quadrupole; instead, a GC-time-of-flight (TOF)-MS had to be used. Most forensic laboratories do not have high-resolution mass spectrometers, so low-energy EI is not currently a viable option for all laboratories.
1.3.2 Multivariate Statistical Methods for NPS Identification and Differentiation

In addition to different instrumentation, multivariate statistical methods have been explored to differentiate structurally similar NPS compounds.15-20 Principal components analysis (PCA) and linear discriminant analysis (LDA) are two of the methods that have been investigated for classification of NPS compounds.17-19 The theory of these methods is discussed in Sections 1.3.2.1 and 1.3.2.2, respectively.

Setser and Waddell Smith used EI mass spectral data to classify phenethylamines and tryptamines using LDA.18 The mass spectra were collected at the apex of the chromatographic peak, and a single collection of the training set compounds was used to develop the LDA model. Two approaches were investigated to determine which variables (in this case, m/z values) should be retained for the LDA model:
- In the first approach, chemically significant m/z values in the training set spectra were identified manually.
- In the second approach, PCA was applied to identify the m/z values describing the most variance in the training set data.

The chemically informed LDA model had a 93% successful classification rate. The LDA model developed using PCA for variable selection had an 86% successful classification rate. Setser and Waddell Smith concluded that both methods of variable selection produced comparable results, but PCA was a more efficient way to identify chemically significant m/z values for use in LDA.18

Other researchers have investigated compounds that are even more structurally similar.13,19,20 Bonetti used EI-MS data collected at the apex of the chromatographic peak to develop LDA models that classify fluoromethcathinone (FMC) isomers and fluorofentanyl isomers.19 These isomers cannot be differentiated by visual comparison of mass spectra alone, because their structures, and therefore their EI mass spectra, are very similar.
Bonetti analyzed the FMC and fluorofentanyl isomers on six different GC-MS instruments twice a day over a five-day period to incorporate instrument variation. Separate LDA models were developed for each set of isomers, and PCA was used to select the variables for LDA model development. The models were tested with blind samples, case samples, and diluted samples. The LDA models successfully classified all test samples except some of the diluted samples, which were too dilute to produce representative spectra. This work demonstrated the application of PCA and LDA to the differentiation of FMC and fluorofentanyl isomers based on mass spectral data. Bonetti concluded that, when instrument variation was incorporated into the development of the LDA models, case samples and unknowns did not need to be analyzed on the same instrument used to develop the model.19

Davidson and Jackson differentiated 2,5-dimethoxy-N-(N-methoxybenzyl)phenethylamine (NBOMe) isomers by applying canonical discriminant analysis (CDA) to mass spectral data.20 Like LDA, CDA classifies unknown samples into one of the available classes. Davidson and Jackson explored instrument variation using data collected on two different GC-MS instruments with three different GC columns. The analogs were prepared at three different concentrations and analyzed twice a week for one or two months. All data were obtained from the apex of the chromatographic peak. When data from all three instrument collections and all concentrations were used, the model had a 99.5% successful classification rate.
When high-abundance spectra collected on the same instrument were used to develop the model, a 99.9% successful classification rate was reported.20 Bonetti, and Davidson and Jackson, highlighted the need to incorporate variation in the development of multivariate statistical classification models.19,20 These authors investigated the classification of isomers using LDA or the closely related CDA. Isomers share the same core structure, with the same substituent at varying positions. Other analogs of the same drug class, which share a core structure but carry different substituents, have not been investigated as closely.

In other areas of forensic science, additional multivariate statistical methods have been explored. One method that has been used to differentiate and classify forensic samples is soft independent modelling of class analogies (SIMCA). This method has been applied in gunshot residue analysis, soil analysis, blood stain analysis, and fire debris analysis.21-24 Successful classification using SIMCA in other forensic disciplines suggests that it could also be used to differentiate NPS compounds. See Section 1.3.2.3 for more information about SIMCA.

1.3.2.1 Principal Components Analysis

Principal components analysis (PCA) is used to reduce the number of variables, and thus the dimensionality, of a data set. This method is not used to classify new samples, but rather to visualize the variance within a data set. Because PCA is an unsupervised technique, any separation of samples is a result of natural variance among the samples. Principal components (PCs) are linear combinations of the original variables; for a mean-centered data set of n samples and p variables, at most min(n − 1, p) PCs can be defined. Each variable has a weighting coefficient (loading) between -1 and +1 on each PC, which indicates the sign and extent of the variable's contribution to that PC.
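The calculation behind the worked example that follows (mean-centering, rotating onto new axes, and projecting each sample to obtain scores) can be sketched in Python with NumPy. This is an illustrative sketch, not the software used in this work (which used R): it obtains the PCs by singular value decomposition, and the six intensity pairs for m/z 146 and m/z 189 are hypothetical values chosen to form two groups of three samples.

```python
import numpy as np

def pca(X):
    """PCA by singular value decomposition of the mean-centered data.

    X: (n_samples, n_variables) array of intensities.
    Returns scores (n_samples, k), loadings (n_variables, k), and the fraction
    of total variance on each PC, with k = min(n_samples - 1, n_variables).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # mean-center each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(X.shape[0] - 1, X.shape[1])     # number of meaningful PCs
    loadings = Vt[:k].T                     # unit-length columns, so entries lie in [-1, 1]
    scores = Xc @ loadings                  # projection of every sample onto every PC
    explained = s[:k] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained

# Hypothetical normalized intensities (m/z 146, m/z 189) for six samples.
X = np.array([
    [90.0, 10.0],
    [85.0, 15.0],
    [80.0, 12.0],
    [12.0, 88.0],
    [15.0, 80.0],
    [10.0, 95.0],
])
scores, loadings, explained = pca(X)
print("PC1 loadings (m/z 146, m/z 189):", np.round(loadings[:, 0], 3))
print("PC1 scores:", np.round(scores[:, 0], 1))
print("Fraction of variance per PC:", np.round(explained, 3))
```

Because the sign of each PC is arbitrary, the samples dominated by m/z 146 and those dominated by m/z 189 always fall on opposite sides of PC1, but which group is negative may differ from the figures.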
As an example, the intensities of two m/z variables, m/z 146 and m/z 189, were plotted against each other for six samples (Figure 1.2). The first PC is shown in red in Figure 1.2. For the purposes of this discussion, the focus will be on PC1 only.

Figure 1.2 Intensity plot of m/z 189 versus m/z 146

A score is calculated for each sample on the new set of axes: each sample has a score on each PC, based on the variables contributing to that PC. The dotted line in Figure 1.2 shows the projection of one sample onto PC1; only one projection is drawn, but the projection is made for every sample in the data set. Scores on different PCs can be plotted against one another to generate a scores plot, which shows the separation among samples. The scores determine where samples are positioned on a scores plot, allowing similarities and differences among samples to be observed. Figure 1.3 is an example of a PCA scores plot of PC1 versus PC2 for the six samples shown in Figure 1.2.

Figure 1.3 Example of a PCA scores plot of PC1 vs PC2

Three samples are positioned negatively on PC1, while the other three samples are positioned positively. Loadings plots show which variables contribute to a particular PC and, therefore, explain the positioning of samples on that PC. Figure 1.4 shows the loadings plot for PC1 and PC2. Three samples are positioned negatively on PC1 because of the negative loading of m/z 146 and the high intensity of m/z 146 in those samples (Figure 1.4). Conversely, three samples are positioned positively on PC1 because of the positive loading of m/z 189 and the high intensity of m/z 189 in those samples (Figure 1.4). More detailed information about this method can be found in reference 25.

Figure 1.4 Example of a loadings plot (PC2 vs PC1)

1.3.2.2 Linear Discriminant Analysis

Linear discriminant analysis (LDA) is a hard classification multivariate statistical method.
This method is supervised, which means it uses known class membership and, as a hard classification method, must classify any new sample into one of the available classes. Because LDA is a classification method, the model must be developed with a training set to identify which variables differentiate the classes. Cross validation is then performed to assess the validity of the model. Leave-one-out cross validation is a common approach in which each training set sample is removed in turn, the model is rebuilt from the remaining samples, and the removed sample is classified by that model. The proportion of samples correctly classified is the cross-validation classification success rate of the model. Once a model has been developed from a training set with successful cross validation, a test set is applied, and the test samples are classified by the model into the available classes.

The objective of LDA is to minimize within-class variance and maximize between-class variance. To do so, linear discriminants (LDs) are calculated as linear combinations of the variables, similar to PCs. The number of LDs is one fewer than the number of classes (provided there are at least that many variables); for example, with three classes there are two LDs. The coefficients of the linear discriminants are analogous to PC loadings and show the contributions of variables to separation along an LD. As with PCA, the LDs can be plotted against each other to form LDA scores plots, in which the separation between the classes in the training set can be seen. LDA scores plots can then be used to visualize the positioning of new samples in relation to the training set. For each class, a centroid is calculated, which is the center (average) of the scores of the training set samples in that class. Classification is determined by calculating the distance from an unknown sample to the centroid of each class; the sample is classified to the class with the shortest distance.
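The training, leave-one-out cross validation, and nearest-centroid classification steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the R code used in this work: it measures distance with the Mahalanobis distance based on the pooled within-class covariance, computes posterior probabilities assuming equal prior probabilities for all classes, and uses hypothetical two-dimensional data and class names. Retaining all LDs should give the same class assignments as this rule applied to the original variables.

```python
import numpy as np

def fit_lda(X, y):
    """Store the class centroids and the inverse pooled within-class covariance."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    # Pool the within-class deviations, using n - g degrees of freedom.
    resid = np.vstack([X[y == c] - X[y == c].mean(axis=0) for c in classes])
    pooled_cov = resid.T @ resid / (len(X) - len(classes))
    return classes, centroids, np.linalg.inv(pooled_cov)

def predict_lda(model, x):
    """Assign x to the nearest centroid (Mahalanobis) and return posteriors."""
    classes, centroids, inv_cov = model
    diffs = centroids - np.asarray(x, dtype=float)
    # Squared Mahalanobis distance from x to each class centroid.
    d2 = np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs)
    # Equal priors: posterior proportional to exp(-d2 / 2); shift for stability.
    weights = np.exp(-0.5 * (d2 - d2.min()))
    posterior = weights / weights.sum()
    return classes[np.argmin(d2)], posterior

def loo_success_rate(X, y):
    """Leave-one-out cross validation: refit without each sample, then classify it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    correct = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        label, _ = predict_lda(fit_lda(X[keep], y[keep]), X[i])
        correct += int(label == y[i])
    return correct / len(X)

# Hypothetical LD-like scores for three classes (illustrative values only).
X = np.array([
    [0.0, 0.0], [0.5, 0.2], [0.2, 0.6], [0.6, 0.5],   # red
    [5.0, 5.0], [5.4, 5.2], [5.1, 5.7], [5.6, 5.3],   # blue
    [0.0, 5.0], [0.4, 5.3], [0.2, 5.8], [0.7, 5.1],   # yellow
])
y = np.array(["red"] * 4 + ["blue"] * 4 + ["yellow"] * 4)

print("LOO success rate:", loo_success_rate(X, y))
label, posterior = predict_lda(fit_lda(X, y), [4.8, 4.9])
print(label, dict(zip(np.unique(y), np.round(posterior, 3))))
```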
The distance used is the Mahalanobis distance, a standardized Euclidean distance: the scores on each LD are scaled to the same variance before the Euclidean distance is calculated. Figure 1.5 shows an example of an LDA scores plot with three classes. The triangle represents an unknown sample applied to the model, and the dashed lines show the Mahalanobis distance to the centroid of each class, shown as a diamond. In this hypothetical example, the unknown sample would be classified as a member of the blue class, because the Mahalanobis distance to this class is the shortest. For every unknown sample, a posterior probability (the probability of class membership) is also calculated for each class.

Figure 1.5 Example LDA scores plot of LD1 vs LD2 (yellow, blue, and red classes; class centroids; unknown sample; distance to centroid)

1.3.2.3 Soft Independent Modelling of Class Analogies

Like LDA, soft independent modelling of class analogies (SIMCA) is a supervised classification method; however, SIMCA is a soft, rather than hard, classification method. This means SIMCA can classify a new sample to one class, more than one class, or none of the classes. To develop a SIMCA model, PCA is performed on each class individually, retaining a specific number of PCs that represent the variance within the class. Residuals plots are generated for each class by plotting Hotelling’s T² distance versus the squared residual distance (Q). Figure 1.6A shows an example of a class with only one PC retained. The Q and T² distances for the sample positioned lowest on the y-axis are highlighted by blue arrows. The Q distance is defined as the squared orthogonal Euclidean distance from a sample to the PCA model; in this case, the distance from the sample to its projection on the PC.
The Q distance describes lack of fit to the model: a small Q distance indicates better fit to the model than a large Q distance. The T² distance is defined as the squared Mahalanobis distance between the projection of the sample and the origin of the PCA space, which is the centroid of the mean-centered data. The T² distance describes how far from the origin a sample is (that is, how extreme it is) relative to the other samples in the training set; a high T² distance indicates that a sample is more extreme than the samples represented by the training set. Both distances are calculated for every training set sample and test sample applied to the PCA space of this class.

A critical limit for both the Q and T² parameters is determined for each class from a significance level (α) defined by the user. If a training set sample or test set sample falls outside the critical limit for a particular class, it is not classified to that class. The cylinder in Figure 1.6A represents the critical limit for this example class. The model is optimized by adjusting the significance level to include as many of the training set compounds as possible. Leave-one-out cross validation of each class can also help to determine which significance level is optimal. Figure 1.6B shows an example of a residuals plot for the class represented in Figure 1.6A. The one sample that falls outside the critical limit (dashed line) would not be recognized as a member of the class unless the significance level were changed to expand the critical limit. When multiple classes are used in SIMCA, this process is repeated for each class individually.

Figure 1.6 Example of A) a SIMCA class with only one PC retained, and B) the residuals plot for the class

Coomans’ plots are used in multi-class SIMCA models to examine classes in relation to each other.
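The per-class calculations described above (PCA of a single class, Q and T² distances, and critical limits) and the resulting soft assignment can be sketched in Python with NumPy. This is a simplified illustration, not the R implementation used in this work: the critical limits here are empirical (1 − α) quantiles of the training-set distances, whereas dedicated chemometrics software usually derives limits from theoretical distributions, and both classes are synthetic data. Because each class is modelled independently, a sample can be accepted by one class, by several, or by none; for two classes these outcomes correspond to the regions of a Coomans’ plot.

```python
import numpy as np

class SimcaClass:
    """PCA model of a single class with critical limits on Q and Hotelling's T².

    Limits are empirical (1 - alpha) quantiles of the training-set distances,
    a simplified stand-in for distribution-based limits.
    """

    def __init__(self, X, n_pcs, alpha=0.05):
        X = np.asarray(X, dtype=float)
        self.mean = X.mean(axis=0)
        _, s, Vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.P = Vt[:n_pcs].T                      # retained loadings
        self.var = s[:n_pcs] ** 2 / (len(X) - 1)   # variance of the scores on each PC
        q, t2 = self.distances(X)
        self.q_lim = np.quantile(q, 1 - alpha)
        self.t2_lim = np.quantile(t2, 1 - alpha)

    def distances(self, X):
        """Return (Q, T²) for each row of X."""
        Xc = np.atleast_2d(np.asarray(X, dtype=float)) - self.mean
        T = Xc @ self.P                            # scores in the class PCA space
        E = Xc - T @ self.P.T                      # part of each sample the model misses
        q = np.sum(E ** 2, axis=1)                 # squared orthogonal distance (Q)
        t2 = np.sum(T ** 2 / self.var, axis=1)     # squared Mahalanobis distance (T²)
        return q, t2

    def accepts(self, X):
        q, t2 = self.distances(X)
        return (q <= self.q_lim) & (t2 <= self.t2_lim)

def simca_classify(models, x):
    """Soft classification: every class that accepts x, or ['none']."""
    hits = [name for name, model in models.items() if model.accepts(x)[0]]
    return hits or ["none"]

# Hypothetical three-variable data: each class varies mainly along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=(20, 1))
red = 10 + t * np.array([1.0, 2.0, 0.0]) + rng.normal(scale=0.05, size=(20, 3))
t = rng.normal(size=(20, 1))
blue = t * np.array([0.0, 1.0, -1.0]) + rng.normal(scale=0.05, size=(20, 3))

models = {"red": SimcaClass(red, n_pcs=1), "blue": SimcaClass(blue, n_pcs=1)}
print(simca_classify(models, models["red"].mean))   # centre of the red class
print(simca_classify(models, [50.0, -50.0, 50.0]))  # far from both classes
```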
As described above, the Q distance measures lack of fit to a particular class; a Coomans’ plot shows the Q distances calculated to two classes plotted against each other. This comparison requires further optimization, or adjustment, of the critical limits to enhance separation between classes while ensuring that all samples within a class remain below the critical limit. Figure 1.7 shows an example of a Coomans’ plot for two classes, the red class and the blue class. The triangle represents an unknown sample applied to the SIMCA model, positioned by its Q distance to the blue class and its Q distance to the red class. Because the sample falls outside the critical limits for both the red class and the blue class, it would be classified as ‘none’, meaning it does not belong to either class.

Figure 1.7 Example Coomans’ plot from a SIMCA model (axes: squared residual distance (Q) to the red class and to the blue class; critical limits shown for each class)

1.3.3 Neutral Losses as an Alternative to Mass Spectra

The work of Davidson and Jackson, as well as that of Bonetti, used mass spectral data to differentiate isomers.19,20 Analogs have the same core structure but, rather than having the same substituent at different positions (isomers), they have different substituents. This results in differences in the m/z values of fragment ions, which may limit the ability to classify analogs based on common fragments. Neutral loss spectra have been used as an alternative to overcome the challenges of analogs with structural differences based on the location of a substituent on the core structure.26,27 A neutral loss is defined as the loss of an uncharged species from an ion during rearrangement or direct dissociation.9 Fowble et al.
differentiated synthetic cathinones into seven classes by applying PCA and hierarchical clustering analysis (HCA) to neutral loss spectra.26 The neutral loss spectra were derived from mass spectra obtained by collision-induced dissociation (CID) direct analysis in real time high-resolution mass spectrometry (DART-MS) and required the presence of a molecular ion.26 Although this method was rapid and required minimal sample preparation, it did not include a separation step prior to analysis, which is problematic in forensic laboratories where mixtures are common. Additionally, DART-MS is a soft ionization method which, unlike EI (a hard ionization method), produces fewer fragment ions. DART-MS also makes neutral losses easier to identify because accurate mass data are available.

Moorthy et al. investigated incorporating neutral loss matching into the National Institute of Standards and Technology (NIST) library search algorithm.27 The authors proposed comparing not only the fragment ions between mass spectra, but also the neutral losses, in an approach they called the Hybrid Similarity Search (HSS). This search algorithm requires the molecular mass of both the library compound and the unknown compound in order to align the spectra and compare neutral losses. Applied to fentanyl-related compounds, the HSS was successful in obtaining further structural information about an analog: namely, where the substitution occurred on the core fentanyl structure (n-alkyl chain, amide group, aniline ring, or piperidine ring, Figure 1.1).27 The greatest limitation of this work was that the molecular mass must be known or easily obtained. Fentanyl analogs typically do not produce a molecular ion when analyzed by EI, so this information is unlikely to be available for an unknown.
1.4 RESEARCH OBJECTIVE

The objective of this work was to develop, validate, and compare multivariate statistical models for classification of fentanyl analogs according to structural subclass. The first goal was to develop LDA models for this classification, and the second goal was to develop SIMCA models for the same classification. In the development of each model, three factors were considered to optimize classification. The first factor was the effect of mass spectral variation within a chromatographic peak on classification success, investigated by developing models from spectra collected at the apex of the peak and from spectra averaged across the peak. The second factor was the effect of spectral variation over time on classification success, investigated by developing models from a training set collected within two months and from one collected across four months. The third factor was the potential of using neutral losses as variables to improve classification success. Overall, this work aimed to contribute to the forensic science community by investigating methods, using conventional GC-MS instrumentation already employed in laboratories, to enhance the characterization of structurally similar fentanyl analogs for which reference spectra are not readily available.

REFERENCES

(1) Novel Psychoactive Substances. https://www.nmslabs.com/forensic-testing/novel-psychoactive-substances. (accessed June 2020)
(2) Synthetic Opioid Overdose Data. https://www.cdc.gov/drugoverdose/data/fentanyl.html. (accessed June 2020)
(3) National Institute on Drug Abuse. Fentanyl DrugFacts. https://www.drugabuse.gov/publications/drugfacts/fentanyl. (accessed June 2020)
(4) Hartney, E. The Fentanyl Crisis-The Drug's Analogs and Derivatives. https://www.verywellmind.com/fentanyl-analogs-and-derivatives-4165882. (accessed June 2020)
(5) Cayman Chemical. Fentanyl Identification. Cayman Currents. 28, Ann Arbor (2017).
(6) Drug Enforcement Administration. Drug Scheduling. https://www.deadiversion.usdoj.gov/synthetic_drugs/about_sd.html. (accessed June 2020)
(7) U.S. Drug Enforcement Administration Emergency Schedules All Illicit Fentanyls in an Effort to Reduce Overdose Deaths. https://www.dea.gov/press-releases/2018/02/07/us-drug-enforcement-administration-emergency-schedules-all-illicit. (accessed June 2020)
(8) Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) Recommendations, 2019. http://swgdrug.org/Documents/SWGDRUG%20Recommendations%20Version%208_FINAL_ForPosting_092919.pdf. (accessed June 2020)
(9) Watson, J. T.; Sparkman, O. D. Introduction to Mass Spectrometry: Instrumentation, Applications and Strategies for Data Interpretation; Wiley: Chichester, 2011.
(10) Duffy, J.; Urbas, A.; Niemitz, M.; Lippa, K.; Marginean, I. Differentiation of Fentanyl Analogues by Low-Field NMR Spectroscopy. Analytica Chimica Acta. 2019, 1049, 161–169.
(11) Bogun, B.; Moore, S. 1H and 31P Benchtop NMR of Liquids and Solids Used in and/or Produced during the Manufacture of Methamphetamine by the HI Reduction of Pseudoephedrine/Ephedrine. Forensic Science International. 2017, 278, 68–77.
(12) Kranenburg, R. F.; García-Cicourel, A. R.; Kukurin, C.; Janssen, H.-G.; Schoenmakers, P. J.; Asten, A. C. V. Distinguishing Drug Isomers in the Forensic Laboratory: GC–VUV in Addition to GC–MS for Orthogonal Selectivity and the Use of Library Match Scores as a New Source of Information. Forensic Science International. 2019, 302, 109900.
(13) Roberson, Z. R.; Goodpaster, J. V. Differentiation of Structurally Similar Phenethylamines via Gas Chromatography–Vacuum Ultraviolet Spectroscopy (GC–VUV). Forensic Chemistry. 2019, 15, 100172.
(14) Kranenburg, R. F.; Peroni, D.; Affourtit, S.; Westerhuis, J. A.; Smilde, A. K.; Asten, A. C. V. Revealing Hidden Information in GC–MS Spectra from Isomeric Drugs: Chemometrics Based Identification from 15 eV and 70 eV EI Mass Spectra. Forensic Chemistry. 2020, 18, 100225.
(15) Bodnar Willard, M. A.; McGuffin, V. L.; Waddell Smith, R. Statistical Comparison of Mass Spectra for Identification of Amphetamine-Type Stimulants. Forensic Science International. 2017, 270, 111–120. https://doi.org/10.1016/j.forsciint.2016.11.013.
(16) Stuhmer, E. L.; McGuffin, V. L.; Waddell Smith, R. Discrimination of Seized Drug Positional Isomers based on Statistical Comparison of Electron-Ionization Mass Spectra. Forensic Chemistry. 2020, 20, 100261.
(17) Quinn, M.; Brettell, T.; Joshi, M.; Bonetti, J.; Quarino, L. Identifying PCP and Four PCP Analogs Using the Gold Chloride Microcrystalline Test Followed by Raman Microspectroscopy and Chemometrics. Forensic Science International. 2020, 307, 110135.
(18) Setser, A. L.; Waddell Smith, R. Comparison of Variable Selection Methods Prior to Linear Discriminant Analysis Classification of Synthetic Phenethylamines and Tryptamines. Forensic Chemistry. 2018, 11, 77–86.
(19) Bonetti, J. Mass Spectral Differentiation of Positional Isomers using Multivariate Statistics. Forensic Chemistry. 2018, 9, 50–61.
(20) Davidson, J. T.; Jackson, G. P. The Differentiation of 2,5-Dimethoxy-N-(N-methoxybenzyl)phenethylamine (NBOMe) Isomers using GC Retention Indices and Multivariate Analysis of Ion Abundances in Electron Ionization Mass Spectra. Forensic Chemistry. 2019, 14, 100160.
(21) Álvarez, Á.; Yáñez, J.; Contreras, D.; Saavedra, R.; Sáez, P.; Amarasiriwardena, D. Propellant’s Differentiation Using FTIR-Photoacoustic Detection for Forensic Studies of Improvised Explosive Devices. Forensic Science International. 2017, 280, 169–175.
(22) Kaniu, M.; Angeyo, K. Challenges in Rapid Soil Quality Assessment and Opportunities Presented by Multivariate Chemometric Energy Dispersive X-Ray Fluorescence and Scattering Spectroscopy. Geoderma. 2015, 241-242, 32–40.
(23) Pereira, J. F.; Silva, C. S.; Vieira, M. J. L.; Pimentel, M. F.; Braz, A.; Honorato, R. S. Evaluation and Identification of Blood Stains with Handheld NIR Spectrometer. Microchemical Journal. 2017, 133, 561–566.
(24) Waddell, E. E.; Williams, M. R.; Sigman, M. E. Progress Toward the Determination of Correct Classification Rates in Fire Debris Analysis II: Utilizing Soft Independent Modelling of Class Analogy (SIMCA). Journal of Forensic Sciences. 2014, 59 (4), 927–935.
(25) Brereton, R. G. Chemometrics: Data Driven Extraction for Science; John Wiley & Sons, Incorporated: Newark, 2018.
(26) Fowble, K. L.; Shepard, J. R.; Musah, R. A. Identification and Classification of Cathinone Unknowns by Statistical Analysis Processing of Direct Analysis in Real Time-High Resolution Mass Spectrometry-Derived “Neutral Loss” Spectra. Talanta. 2018, 179, 546–553.
(27) Moorthy, A. S.; Wallace, W. E.; Kearsley, A. J.; Tchekhovskoi, D. V.; Stein, S. E. Combining Fragment-Ion and Neutral-Loss Matching during Mass Spectral Library Searching: A New General Purpose Algorithm Applicable to Illicit Drug Identification. Analytical Chemistry. 2017, 89 (24), 13261–13268.

2. MATERIALS AND METHODS

2.1 FENTANYL ANALOG REFERENCE MATERIALS

A standard operating procedure (SOP) for handling and disposing of fentanyl was developed, approved by Environmental Health and Safety (EHS), and is shown in the appendix (A2). Twenty-eight fentanyl analogs were obtained from Cayman Chemical (Ann Arbor, MI). The fentanyl analogs were representative of four structural subclasses1: n-alkyl chain substituted (AN), amide group substituted (AG), aniline ring substituted (AR), and amide and aniline ring substituted (AA) (Table 2.1).
All structures for the analogs are shown in the appendix (Figure A2.1). All reference materials were prepared as 1 mg/mL solutions in methanol (ACS Grade, Sigma Aldrich, St. Louis, MO) and analyzed by gas chromatography-mass spectrometry (GC-MS).

Table 2.1 Fentanyl analogs used in this work, separated by structural subclass
Amide and Aniline Ring (AA): para-fluorobutyryl fentanyl; meta-fluorobutyryl fentanyl; ortho-fluorobutyryl fentanyl; para-fluoro methoxyacetyl fentanyl; meta-fluoro methoxyacetyl fentanyl; ortho-fluoro methoxyacetyl fentanyl; para-fluoroisobutyryl fentanyl; meta-fluoroisobutyryl fentanyl; ortho-fluoroisobutyryl fentanyl
Aniline Ring (AR): para-methylfentanyl; meta-methylfentanyl; ortho-methylfentanyl; para-methoxyfentanyl; para-fluorofentanyl; para-chlorofentanyl
Amide Group (AG): cyclohexyl fentanyl; tetrahydrofuran fentanyl; isobutyryl fentanyl; cyclopropyl fentanyl; acrylfentanyl; butyryl fentanyl; cyclopentyl fentanyl
n-Alkyl Chain (AN): furanylethyl fentanyl; α-methyl acetyl fentanyl; 4’-methylfentanyl; thiofentanyl; α-methylfentanyl; α-methyl thiofentanyl

2.2 GAS CHROMATOGRAPHY-MASS SPECTROMETRY (GC-MS) ANALYSIS

All fentanyl analog reference materials were analyzed by GC-MS using an Agilent Technologies 7890A GC and 5975C Inert XL MSD with Triple-Axis Detector (Agilent Technologies, Santa Clara, CA). A CTC-PAL autosampler (CTC Analytics, Zwingen, Switzerland) was used to inject 1 µL of each sample into the GC. Separation was performed on a 5%-diphenyl-95%-dimethylpolysiloxane column (VF-5ms, 30 m x 0.25 mm inner diameter x 0.25 µm film thickness, Agilent Technologies). The carrier gas was helium at a nominal flow rate of 1 mL/min. The injection temperature was 220 ℃ with a 100:1 split ratio, and the solvent delay was 2.5 min. The GC oven temperature program was as follows: 200 ℃ for 1 min, then 30 ℃/min to 300 ℃, with a final hold of 8 min.
The transfer line was kept at 300 ℃, and the mass spectrometer was operated in electron ionization mode at 70 eV. The scan range was m/z 40-450 with a scan rate of 4.51 scans/s. The quadrupole temperature was 150 ℃ and the source temperature was 230 ℃. The MS was tuned using the autotune function in ChemStation software prior to each analysis. Each analog was analyzed once per month for four consecutive months.

2.3 DATA ANALYSIS

The mass spectrum for each analog was collected at the apex of the chromatographic peak and as an average across the chromatographic peak, taken over the full width of the peak at half maximum. Mass spectral data were exported from ChemStation (version E.01.00.237, Agilent Technologies) to Microsoft Excel (version 16.0, Microsoft Corporation, Redmond, WA). For each analog, the mass spectral intensities were normalized to the base peak and zero-filled from m/z 40-450. The data were imported into Origin (version 9.0, OriginLab Corporation, Northampton, MA) for further visualization of the mass spectra.

2.3.1 Neutral Loss Spectra Development

Because low-resolution MS was used in this work, only the nominal mass of each neutral loss could be determined, not its chemical identity. The neutral losses were therefore hypothetical assignments, and the procedure for deriving them was designed to be as objective as possible. None of the fentanyl analogs used in this work produced a molecular ion under electron ionization, so the neutral loss spectra were developed relative to the base peak, by subtracting each m/z value from the m/z of the base peak. The intensity assigned to each neutral loss was the normalized intensity of the m/z value from which it was derived. For example, if the base peak was m/z 245, the neutral loss of 99 and its intensity would be derived from m/z 146 in the mass spectrum.

2.5 STATISTICAL MODELLING

All statistical modelling in this work was performed in R (version 3.5.1, The R Project for Statistical Computing).
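Although the models were built in R, the preprocessing in Sections 2.3 and 2.3.1 (base-peak normalization, zero-filling from m/z 40-450, and derivation of neutral losses relative to the base peak) can be sketched in plain Python. The function names and raw abundances below are hypothetical, and skipping fragments at or above the base-peak m/z is an assumption of this sketch rather than a documented step.

```python
def normalize_spectrum(peaks, mz_min=40, mz_max=450):
    """Normalize to the base peak (= 100) and zero-fill every integer m/z in range.

    peaks: dict of {integer m/z: raw abundance} for one spectrum.
    """
    base = max(peaks.values())
    return {mz: 100.0 * peaks.get(mz, 0.0) / base for mz in range(mz_min, mz_max + 1)}


def neutral_loss_spectrum(spectrum):
    """Neutral losses relative to the base peak (Section 2.3.1).

    Each loss is (base-peak m/z - fragment m/z) and carries the normalized
    intensity of the fragment it came from. Fragments at or above the base-peak
    m/z are skipped (they would give zero or negative losses); this handling
    is an assumption of the sketch.
    """
    base_mz = max(spectrum, key=spectrum.get)
    return {base_mz - mz: inten for mz, inten in spectrum.items() if mz < base_mz}


# Hypothetical raw abundances (illustrative only), base peak at m/z 245.
raw = {245: 5000, 189: 800, 146: 1500, 105: 600}
spectrum = normalize_spectrum(raw)
losses = neutral_loss_spectrum(spectrum)
print(spectrum[146], losses[99])  # m/z 146 becomes a loss of 99 (245 - 146)
```

With the base peak at m/z 245, the fragment at m/z 146 yields a neutral loss of 99 that carries the fragment's normalized intensity, matching the example in Section 2.3.1.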
All spectra were divided into a training set and a test set. Two classification methods were investigated: linear discriminant analysis (LDA) and soft independent modelling of class analogies (SIMCA). Additional R packages were used for these procedures.2,3 For each classification method, four models were developed to investigate the effect of spectral variation within a peak and over time on classification success. The first two models, one based on apex spectra and one based on average spectra, investigated the effect of spectral variation within a peak; they were developed using an initial training set of 44 spectra (Table 2.2) and tested using a test set of 68 spectra (Table 2.3). To investigate the effect of spectral variation over time, the training and test sets were redefined: for these refined models, the training set contained 88 spectra (Table 2.4) and the test set contained 24 spectra (Table 2.5). The refined models were developed using only mass spectra collected at the apex of the chromatographic peak. Finally, LDA and SIMCA models were developed, optimized, and tested using the neutral loss spectra (Section 2.3.1), with the same training and test sets as the refined models but with neutral loss, rather than mass spectral, data.
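The spectrum counts above follow from the study design: 28 analogs, each collected once per month for four months, with six analogs (Table 2.5) always held out of training. A minimal Python sketch of that bookkeeping is shown below; the placeholder names for the 22 training analogs and the assumption that the initial training set used months 1 and 2 are hypothetical.

```python
from itertools import product

# Analogs held out of every training set (Table 2.5).
HELD_OUT = {
    "para-fluoro methoxyacetyl fentanyl",
    "meta-fluoroisobutyryl fentanyl",
    "para-methylfentanyl",
    "cyclopentyl fentanyl",
    "butyryl fentanyl",
    "α-methyl thiofentanyl",
}
# Hypothetical placeholder names for the 22 training analogs (Tables 2.2 and 2.4).
TRAINING = [f"training analog {i}" for i in range(1, 23)]
MONTHS = (1, 2, 3, 4)  # one collection per analog per month


def split(training_months):
    """Return (training, test) lists of (analog, month) spectrum records."""
    train, test = [], []
    for analog, month in product(TRAINING + sorted(HELD_OUT), MONTHS):
        if analog not in HELD_OUT and month in training_months:
            train.append((analog, month))
        else:
            test.append((analog, month))
    return train, test


initial_train, initial_test = split(training_months={1, 2})  # assumed months
refined_train, refined_test = split(training_months=set(MONTHS))
print(len(initial_train), len(initial_test))  # 44 68
print(len(refined_train), len(refined_test))  # 88 24
```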
Table 2.2 Training set for the initial models (n = 2 spectra for every analog)
Amide and Aniline Ring: para-fluorobutyryl fentanyl; meta-fluorobutyryl fentanyl; ortho-fluorobutyryl fentanyl; meta-fluoro methoxyacetyl fentanyl; ortho-fluoro methoxyacetyl fentanyl; para-fluoroisobutyryl fentanyl; ortho-fluoroisobutyryl fentanyl
Aniline Ring: meta-methylfentanyl; ortho-methylfentanyl; para-methoxyfentanyl; para-fluorofentanyl; para-chlorofentanyl
Amide Group: cyclohexyl fentanyl; tetrahydrofuran fentanyl; isobutyryl fentanyl; cyclopropyl fentanyl; acrylfentanyl
n-Alkyl Chain: furanylethyl fentanyl; α-methyl acetyl fentanyl; 4’-methylfentanyl; thiofentanyl; α-methylfentanyl

Table 2.3 Test set for the initial models (number of replicate spectra in parentheses)
Amide and Aniline Ring: para-fluorobutyryl fentanyl (n = 2); meta-fluorobutyryl fentanyl (n = 2); ortho-fluorobutyryl fentanyl (n = 2); para-fluoro methoxyacetyl fentanyl (n = 4); meta-fluoro methoxyacetyl fentanyl (n = 2); ortho-fluoro methoxyacetyl fentanyl (n = 2); para-fluoroisobutyryl fentanyl (n = 2); meta-fluoroisobutyryl fentanyl (n = 4); ortho-fluoroisobutyryl fentanyl (n = 2)
Aniline Ring: para-methylfentanyl (n = 4); meta-methylfentanyl (n = 2); ortho-methylfentanyl (n = 2); para-methoxyfentanyl (n = 2); para-fluorofentanyl (n = 2); para-chlorofentanyl (n = 2)
Amide Group: cyclohexyl fentanyl (n = 2); tetrahydrofuran fentanyl (n = 2); isobutyryl fentanyl (n = 2); cyclopropyl fentanyl (n = 2); acrylfentanyl (n = 2); butyryl fentanyl (n = 4); cyclopentyl fentanyl (n = 4)
n-Alkyl Chain: furanylethyl fentanyl (n = 2); α-methyl acetyl fentanyl (n = 2); 4’-methylfentanyl (n = 2); thiofentanyl (n = 2); α-methylfentanyl (n = 2); α-methyl thiofentanyl (n = 4)

Table 2.4 Training set for the refined models and neutral loss models (n = 4 spectra for every analog)
Amide and Aniline Ring: para-fluorobutyryl fentanyl; meta-fluorobutyryl fentanyl; ortho-fluorobutyryl fentanyl; meta-fluoro methoxyacetyl fentanyl; ortho-fluoro methoxyacetyl fentanyl; para-fluoroisobutyryl fentanyl; ortho-fluoroisobutyryl fentanyl
Aniline Ring: meta-methylfentanyl; ortho-methylfentanyl; para-methoxyfentanyl; para-fluorofentanyl; para-chlorofentanyl
Amide Group: cyclohexyl fentanyl; tetrahydrofuran fentanyl; isobutyryl fentanyl; cyclopropyl fentanyl; acrylfentanyl
n-Alkyl Chain: furanylethyl fentanyl; α-methyl acetyl fentanyl; 4’-methylfentanyl; thiofentanyl; α-methylfentanyl

Table 2.5 Test set for the refined models and neutral loss models (n = 4 spectra for every analog)
Amide and Aniline Ring: para-fluoro methoxyacetyl fentanyl; meta-fluoroisobutyryl fentanyl
Aniline Ring: para-methylfentanyl
Amide Group: cyclopentyl fentanyl; butyryl fentanyl
n-Alkyl Chain: α-methyl thiofentanyl

2.5.1 Principal Components Analysis (PCA)

PCA was applied to the full mass spectra (m/z 40-450) of all analogs in the training set. The scores plots were examined to determine the number of principal components (PCs) to retain based on the separation of structural subclasses. The loadings plots for the retained PCs were examined, and the absolute values of the loadings were normalized to the largest absolute loading across all retained PCs to generate relative loadings. The relative loadings were then filtered at several thresholds (1.5%, 2%, 2.5%, and 3% of the largest loading) to determine the optimal number of variables to retain for LDA. The thresholds served to reduce the number of variables, because LDA requires the number of variables to be smaller than the number of samples. The optimal threshold, and the resulting number of variables, was determined by applying LDA to the variables selected at each threshold and assessing the leave-one-out cross-validation success. The variables that gave the best leave-one-out cross-validation success were used for model development. All R code for PCA is shown in the appendix (Table A2.1).
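The variable-selection loop described here and in Section 2.5.2 (relative loadings, thresholding, and choosing the threshold with the best leave-one-out result) can be sketched in Python with NumPy. This is an illustration, not the R code in Table A2.1: the rule that a variable is kept if it meets the threshold on any retained PC, the tie-breaking rule, the `cv_score` callback, and the example loadings are all assumptions.

```python
import numpy as np

def select_variables(loadings, mz_values, threshold_pct):
    """Return the m/z values whose relative loading meets the threshold on any retained PC.

    loadings: (n_variables, n_retained_pcs) array from PCA of the training set.
    Relative loading = 100 * |loading| / largest |loading| across all retained PCs.
    """
    abs_loadings = np.abs(np.asarray(loadings, dtype=float))
    relative = 100.0 * abs_loadings / abs_loadings.max()
    keep = (relative >= threshold_pct).any(axis=1)
    return np.asarray(mz_values)[keep]


def choose_threshold(loadings, mz_values, n_samples, cv_score,
                     thresholds=(1.5, 2.0, 2.5, 3.0)):
    """Pick the threshold whose variables give the best cross-validation score.

    cv_score(variables) is a caller-supplied (hypothetical) function returning the
    leave-one-out LDA success rate for the selected m/z values. Thresholds that
    leave n_samples or more variables are skipped, because LDA needs fewer
    variables than samples. Ties keep the earlier threshold.
    """
    best = None
    for t in thresholds:
        variables = select_variables(loadings, mz_values, t)
        if len(variables) == 0 or len(variables) >= n_samples:
            continue
        score = cv_score(variables)
        if best is None or score > best[2]:
            best = (t, variables, score)
    return best


# Hypothetical loadings for five m/z values on two retained PCs (illustrative only).
loadings = np.array([
    [0.800, 0.100],
    [0.018, 0.019],
    [-0.300, 0.600],
    [0.010, -0.014],
    [0.000, 0.000],
])
mz_values = [146, 189, 105, 57, 91]
for t in (1.5, 2.0, 2.5, 3.0):
    print(t, select_variables(loadings, mz_values, t))
```

In practice, `cv_score` would fit LDA on the training spectra restricted to the selected m/z values and return the leave-one-out success rate.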
2.5.2 Linear Discriminant Analysis (LDA)

The relative loadings thresholds were used with LDA to determine the optimal variables (m/z values) by assessing the leave-one-out cross validation of the training set. The threshold and resulting variables with the best cross-validation results were retained for model development and validation. Four LDA models were developed in this work: an initial model with apex data, an initial model with average data, a refined model, and a neutral loss model. Test sets were applied to these models to assess LDA classification accuracy. All R code for LDA is shown in the appendix (Table A2.2).

2.5.3 Soft Independent Modelling of Class Analogies (SIMCA)

Four SIMCA models were also developed: an initial model with apex data, an initial model with average data, a refined model, and a neutral loss model. Unlike LDA, SIMCA does not require variable reduction, so SIMCA was applied to the full mass spectra. Test sets were also applied to these models to assess SIMCA classification accuracy. All R code for SIMCA is shown in the appendix (Table A2.3).

APPENDIX

A2 Standard Operating Procedure for Sample Preparation of Fentanyl and Analogs

STANDARD OPERATING PROCEDURE
Sample Preparation of Fentanyl and Analogs
Research Group: Ruth Smith – Forensic Chemistry
Author: Amber Gerheart and Hannah Clause
Last revision date: 06/28/2019
Room and Building: 204 and 205 Chemistry
Contact information: 517-353-5283

Section 1: This standard operating procedure is for:
□ The generic use of a chemical
□ A specific laboratory procedure involving a chemical

Section 2: Chemical information
Fentanyl – Solid white powder, odorless. Fatal if swallowed. Fatal if inhaled.
Call 911 upon any potential exposure. Do not breathe dust/fume/gas/mist/vapors/spray. Wash hands thoroughly after handling. Wear respiratory protection. Symptoms of exposure may include: contracted or pinpoint pupils (miosis) (may later become dilated), reduced level of consciousness (CNS depression), reduced respiratory function (respiratory depression), reduced blood oxygen content (hypoxia), accumulation of acid in the blood (acidosis), low blood pressure (hypotension), slow heart rate (bradycardia), shock, slowing of muscular movement of the stomach (gastric hypomotility) with intestinal obstruction due to lack of normal muscle function (ileus), accumulation of fluid in the lungs (pulmonary edema), lethargy, coma, and death. All standard protocols for handling and use of DEA Controlled Substances are required when using this product.

Section 3: Potential Hazards
Chemical Dangers – No hazardous polymerization will occur
Explosion Hazards – None determined
Fire Fighting Information – Burning may produce carbon monoxide, carbon dioxide, and nitrogen oxides
Physical Exposure – Fatal if inhaled or swallowed
MSDS: https://www.caymanchem.com/msdss/14719m.pdf

Section 4: Personal Protective Equipment
All work in laboratories must be performed under the guidelines for appropriate laboratory attire, as defined by the MSU Chemical Hygiene Plan:
- Long pants or long skirt covering the legs from the waist to the top of shoes
- Safety goggles
- Laboratory coat
- Closed-toe shoes
- Disposable laboratory coat
- Nitrile gloves (double glove when handling)
- N-95 respirator
PPE will be regularly stocked in the lab (room 204, Chemistry Building).
http://home.iape.org/resourcesPages/IAPE_Downloads/Drugs/Evidence_Unit_Safety_Protocols_in_Light_of_Fentanyl.pdf

Section 5: Engineering Controls
The eye wash and emergency shower are located to the right of the designated fume hood in room 204. The eye wash is at the sink by the door and the shower is between the door and the sink.
The current lab safety coordinator will designate a student to be responsible for checking the condition of the eye washes on a weekly basis. The fume hood where this work will be done is located at the back, right side of the room.

Section 6: Special Handling and Storage Requirements
Fentanyl and 30 analogs (see attached Appendix) will be purchased. With the exception of fentanyl itself, only 1 mg of each analog will be purchased; for fentanyl, 10 mg will be purchased. On receipt, each fentanyl analog will be assigned a unique identifier and logged in our electronic Controlled Substances log. All fentanyl analogs will be stored in the controlled substances safe in room 205. The safe is a combination-type safe to which only Dr. Smith has access. The safe is housed within an enclosed area that is accessible by key card only, and, again, only Dr. Smith has key card access to the area. When not in use, all fentanyl analogs will be stored in the safe.

Prior to analysis, each analog will be prepared in solution. A 1 mL aliquot of suitable solvent (methanol or chloroform) will be added to the vial containing the analog. The solution will then be transferred using a glass pipet to a gas chromatography (GC) vial for analysis. The capped GC vial will then be placed in a scintillation vial that will serve as secondary containment. The capped GC vial and the scintillation vial will be labeled with the analog name, the concentration of the solution, and appropriate hazard labels (Health: 4, Flammability: 1). The scintillation vial will be color coded according to the structural subclass of each analog to minimize handling of samples.

Section 7: Accidental Release and Decontamination Procedures
A mixture of 1 tablespoon OxiClean Versatile Stain Remover (main components are sodium percarbonate and hydrogen peroxide) and 500 mL water will be prepared each day of analysis.
Prior to any sample handling, work surfaces will be cleaned three times using this solution. After preparation, the work area will again be cleaned three times with this solution. At any given time, 1 mg or less of the analog will be handled. If a spill occurs, the area will be cleaned with the OxiClean solution (minimum of three wipes with the solution to increase decontamination efficiency), and any solid waste (e.g., paper towels used to clean the spill) will be disposed of in a ziplock-type bag. The ziplock bag will be sealed and transferred into double-bagged 10-gallon-size ziplock-type bags. The hazardous waste tag will be placed between the two 10-gallon-size bags. The smaller sealed bag containing the solid waste will be placed inside the larger bags, which will be sealed and stored in the secure, enclosed area next to the controlled substances safe. As soon as reasonably possible after the spill, the lab safety coordinator, Dr. Smith, and EHS will all be notified. Spills not contained in a fume hood, or spills leading to contamination of personnel or equipment, will be reported to 911 immediately.

Section 8: Exposure Procedures
At all times that fentanyl analogs are being handled, two people will be in the lab: one to handle the analogs and the other as a safety measure. Any potential exposure to skin or eyes, or by inhalation, will be immediately reported to 911. It is important that any person assisting the victim does not become contaminated. Therefore, call 911 immediately and then don double gloves, lab coat, and safety goggles before assisting the victim.
- If skin exposure occurs, the area should be washed immediately with soap and water while waiting for paramedics to arrive.
- If swallowed, the individual will be instructed to rinse out their mouth with water while waiting for paramedics to arrive.
If, after any exposure, the individual exhibits signs of overdose (e.g., drowsiness, disorientation, sedation, pinpoint pupils, skin rash, clammy skin, or respiratory depression or arrest), nasal naloxone (Narcan) will be administered according to the manufacturer's instructions. Narcan will be stored in 204 Chemistry, next to the fume hood where the samples will be prepared. The date of receipt and the listed expiration date will be noted on a log on the side of the fume hood. Prior to beginning any work with fentanyl, personnel must ensure that the appropriate quantity of Narcan is available and that the Narcan has not expired.
NOTE: In cases of skin exposure, DO NOT use hand sanitizers. These products penetrate the skin, which may increase the absorption of fentanyl through the skin.

Section 9: Waste Disposal Procedures
All uncontaminated packaging, boxes, or other items that may indicate the presence of controlled substances should not be recycled or placed into trash cans for routine disposal. These items will be placed into a double-bagged garbage bag, labeled with a hazardous waste tag, for incineration. When working with the samples, solid waste will be placed into a gallon-size or smaller ziplock-type bag within the fume hood. At the end of sample preparation, the ziplock bag will be sealed and transferred into double-bagged 10-gallon-size ziplock-type bags. The hazardous waste tag will be placed between the two 10-gallon-size bags. The smaller sealed bag containing the solid waste will be placed inside the larger bags, which will be sealed and stored in the secure, enclosed area next to the controlled substances safe. Any liquid waste will be treated as “controlled-substance containing waste” and will be stored in a 250 mL amber bottle, labeled with the appropriate hazardous waste tags. The waste bottle will be stored in the secure, enclosed area next to the controlled substances safe.
Contact Amber Bitters, EHS Hazardous Waste Coordinator, at 517-432-5262 when ready for final disposal.

Section 10: Material Safety Data Sheets / Safety Data Sheets
Lab 205 – in the safety binder in the drawer with all other safety coordinator information
Also found online at https://www.caymanchem.com/msdss/14719m.pdf

Section 11: Training and Awareness
Employees working with chemicals must complete the following training:
□ Chemical Hygiene and Hazardous Waste Initial / Refresher
□ Site Specific Training with PI or lab manager
□ Review and signature of this completed SOP
□ Other: Controlled Substances Training; Biohazard Training; Naloxone Training (online training in the form of webpage instructions and video is available at https://www.narcan.com/patients/how-to-use-narcan/)
Completion of the training will be recorded in the Training Folder located in 205 Chemistry, and an electronic version of the completed training will be maintained by Dr. Smith.

Section 11: Protocols
The objective of this research is to characterize fentanyl and related analogs based on the corresponding mass spectral data. Prior to mass spectral analysis, each fentanyl analog will be prepared at a maximum concentration of 1 mg in 1 mL of appropriate solvent (methanol or chloroform). The protocol for preparing the analogs is as follows:
1. Each analyst working with fentanyl will wear appropriate PPE (lab coat, disposable lab coat, double gloves, safety goggles, and a disposable N-95 respirator).
2. The working area within the designated fume hood will be sprayed with OxiClean Versatile Stain Remover solution and cleaned. The area will be wiped a minimum of three times with this solution to increase decontamination efficiency.
3. A sheet of bench paper will be placed in the fume hood.
4. For each analog, the cap of the sample bottle will be removed and 1 mL of appropriate solvent (methanol or chloroform) will be transferred to the bottle using an automated pipet.
5. The solution will then be transferred to a glass GC vial using an automated pipet.
6. The GC vial will be capped and labeled with the analog name and concentration, along with hazard warning labels.
7. The capped GC vial will be placed in a scintillation vial that will be capped and labeled with the analog name and concentration, along with hazard warning labels. A color-coded label will also be adhered to the cap of the scintillation vial to readily identify the structural subclass of the analog within the vial and thereby minimize sample handling.
8. For fentanyl, as 10 mg will be purchased, appropriate aliquots will be weighed into GC vials prior to the addition of solvent. In this case, an analytical balance will be transferred into the fume hood and used to weigh the appropriate aliquots of fentanyl. Steps 4-7 will then be followed to prepare the fentanyl sample. The balance will be wiped down with OxiClean Versatile Stain Remover solution; the surface of the balance will be wiped a minimum of three times to increase the decontamination efficiency. The balance will then be returned to the lab bench.
9. Following preparation, the scintillation vials containing the prepared solutions will be returned to the controlled substances safe.
10. The bench paper will be folded and placed in the gallon-size solid waste bag, along with any other solid waste produced during the sample preparation procedure (e.g., disposable pipet tips). The solid waste bag will be sealed, placed in a double-lined 10-gallon bag, and sealed. The sealed solid waste bag will be appropriately labeled and stored in the secure, enclosed area next to the controlled substances safe.
11. Any liquid waste will be transferred to the “Controlled Substances Hazardous Waste” bottle, which will be appropriately labeled and stored in the secure, enclosed area next to the controlled substances safe.
12. The fume hood will again be wiped down with the OxiClean Versatile Stain Remover solution. As before, the surface will be wiped down a minimum of three times to increase the decontamination efficiency. The solid waste generated during this cleaning will be disposed of as described in step 10.
13. Disposable PPE will be disposed of as solid waste (as described in step 10), and safety goggles will be wiped down with the OxiClean solution (minimum of three wipes).

The samples will be analyzed by gas chromatography-mass spectrometry, using an instrument available in the Mass Spectrometry and Metabolomics Core (MSMC) on campus. The protocol for transferring the analogs to, and analyzing the analogs in, the MSMC is as follows:
1. The prepared analogs will be transferred from 205 Chemistry to 11 Biochemistry in secondary containment, and only the specific samples to be analyzed each time will be transferred.
2. Two personnel will always transport the samples: one wearing double gloves and carrying the analogs in a tray, and the second carrying a spill kit containing Narcan, a spray bottle containing a freshly prepared solution of OxiClean Versatile Stain Remover, and paper towels.
3. Once at the MSMC, the instrument septum, liner, and syringe will be replaced by MSMC staff and a series of solvent blanks will be analyzed.
4. Once the instrument is deemed suitable for analysis (no contamination in solvent blanks), the analogs to be analyzed will be transferred to the autosampler tray and the sequence set up. Each sequence will include a minimum of three solvent blanks at the end, which will be used to assess the cleanliness of the GC column at the end of the analysis (no residual analogs present).
5. At the end of the sequence, the liner, septum, and syringe will be replaced by MSMC staff. The potentially contaminated liner and septum will be placed in a ziplock-type bag and treated as solid waste. The syringe will be rinsed thoroughly with solvent, placed in its original box, and returned with the analogs to 205 Chemistry.
6. The liquid waste from rinsing the syringe in Step 5, as well as the liquid in the autosampler waste vial, will be transferred to a scintillation vial clearly labeled as “Controlled Substances Hazardous Waste.” The scintillation vial will be transported back to 205 Chemistry along with the analogs. The liquid waste will be transferred into the Controlled Substances Hazardous Waste bottle, which will be appropriately labeled and stored in the secure, enclosed area next to the controlled substances safe.
7. One member of the laboratory personnel will remain with the analogs at all times during analysis in the MSMC.

Section 12: SOP Review and Prior Approval
I, the PI/Supervisor, grant the following laboratory personnel approval to perform the above SOP:
Name: Amber Gerheart
Name: Hannah Clause
Name: Amanda Setser
PI/Laboratory Supervisor signature: ______________________ Date: __________

I have reviewed and understood this Standard Operating Procedure, and agree to abide by the protocols described herein:
Signature: ______________________ Date: __________
Signature: ______________________ Date: __________
Signature: ______________________ Date: __________

A completed copy of this Standard Operating Procedure has been reviewed and approved by MSU Office of Environmental Safety:
MSU EHS Staff: ______________________ Date: __________

Additional Reading
1. Froelich NM, Sprague JE, Worst TJ. Letter to the Editor – Elbow Grease and OxiClean™ for Cleaning fentanyl- and Acetylfentanyl-contaminated Surfaces.
Journal of Forensic Sciences 2018, 63(1), 336.
2. Fentanyl. A Briefing Guide for First Responders. US Department of Justice, Drug Enforcement Administration. Available at https://www.nvfc.org/wp-content/uploads/2018/03/Fentanyl-Briefing-Guide-for-First-Responders.pdf (Accessed February 15, 2019).
3. Fentanyl. Safety Recommendations for First Responders. US Department of Justice, Drug Enforcement Administration. Available at https://www.dea.gov/sites/default/files/Publications/Final%20STANDARD%20size%20of%20Fentanyl%20Safety%20Recommendations%20for%20First%20Respond....pdf (Accessed February 15, 2019).

Appendix: Fentanyl Analogs Included in SOP
Sample preparation and handling of the following compounds are included in this SOP:
1. Fentanyl hydrochloride
2. Furanylethyl Fentanyl (hydrochloride)
3. alpha-methyl Acetyl Fentanyl (hydrochloride)
4. beta-hydroxythioacetylfentanyl
5. beta-hydroxy Fentanyl (hydrochloride)
6. 4'-methyl Fentanyl
7. Thiofentanyl (hydrochloride)
8. alpha-methyl Thiofentanyl
9. alpha-methyl Fentanyl
10. Butyryl Fentanyl (hydrochloride)
11. Isobutyryl Fentanyl (hydrochloride)
12. Acrylfentanyl (hydrochloride)
13. Cyclopropyl Fentanyl (hydrochloride)
14. Cyclopentyl Fentanyl (hydrochloride)
15. Tetrahydrofuran Fentanyl (hydrochloride)
16. Cyclohexyl Fentanyl (hydrochloride)
17. ortho-Methylfentanyl (hydrochloride)
18. meta-Methylfentanyl (hydrochloride)
19. para-Methylfentanyl (hydrochloride)
20. para-Methoxyfentanyl (hydrochloride)
21. para-Chlorofentanyl (hydrochloride)
22. para-Fluorofentanyl (hydrochloride)
23. ortho-Fluorobutyryl Fentanyl (hydrochloride)
24. para-Fluorobutyryl Fentanyl (hydrochloride)
25. meta-Fluorobutyryl Fentanyl (hydrochloride)
26. meta-Fluoroisobutyryl Fentanyl (hydrochloride)
27. ortho-Fluoroisobutyryl Fentanyl (hydrochloride)
28. FIBF (hydrochloride)
29. Ocfentanil
30. meta-Fluoro Methoxyacetyl Fentanyl (hydrochloride)
31. para-Fluoro Methoxyacetyl Fentanyl (hydrochloride)

[Figure A2.1 Structures of all fentanyl analogs used in this work: ortho-, meta-, and para-fluorobutyryl fentanyl; ortho-, meta-, and para-fluoroisobutyryl fentanyl; ortho-, meta-, and para-fluoro methoxyacetyl fentanyl; butyryl fentanyl; isobutyryl fentanyl; acrylfentanyl; cyclopropyl fentanyl; cyclopentyl fentanyl; cyclohexyl fentanyl; tetrahydrofuran fentanyl; ortho-, meta-, and para-methylfentanyl; para-methoxyfentanyl; para-chlorofentanyl; para-fluorofentanyl; furanylethyl fentanyl; α-methyl acetyl fentanyl; 4’-methyl fentanyl; thiofentanyl; α-methyl thiofentanyl; α-methyl fentanyl]

Table A2.1 PCA R code (4)
R code | Command
getwd() | Identifies current directory
setwd("C:/directory") | Sets the directory containing data
data=read.table("file name.txt",header=TRUE) | Inputs data; header=TRUE identifies the first column and row as headers
pca<-prcomp(data,scale=FALSE) | Application of PCA to data
print(pca) | Output for loadings values for all PCs
pca$rotation[,1:n] | Output for loadings values for only PCs one through n
summary(pca) | Output of the proportion of variance explained by each PC (used for scree plots)
pca$x | Output for score values for PCs

Table A2.2 LDA R code (4)
R code | Command
getwd() | Identifies current directory
setwd("C:/directory") | Sets the directory containing data
data=read.table("file name.txt",header=TRUE) | Inputs data; header=TRUE identifies the first column and row as headers
names(data)=c("mass41","mass43",…,"Class") | Names the variables in the top row of the data sheet
attach(data) | Attaches data
library(MASS) | Loads the R package used to apply LDA to the data
data.lda=lda(Class~mass41+mass43+…,data,CV=TRUE) | Performs leave-one-out cross validation on the data set
data.lda | Output of the cross validation results
train<-data[1:85,] | Identifies the training set by specifying which rows contain the training set
test<-data[86:139,] | Identifies the test set by specifying which rows contain the test set
data.lda=lda(Class~mass41+mass43+…,data=train) | Application of LDA to the training set
data.lda | Output for coefficients of linear discriminants
data.lda.values<-predict(data.lda,data[1:85,]) | Obtains score values for the training set
data.lda.values$x | Output for training set score values
lda.pred<-predict(data.lda,test) | Application of test set to LDA model
lda.pred$posterior | Output for probability of classification to each class for all test set samples
lda.pred$x | Output for test set score values

Table A2.3 SIMCA R code
R code | Command
getwd() | Identifies current directory
setwd("C:/directory") | Sets the directory containing data
data=read.table("file name.txt") | Inputs data
data2=data[,1:411] | Identifies columns for variables
class=data[,412] | Identifies column with class membership label
X.c=data2[1:44,] | Identifies rows with training set data
X.class=X.c[1:14,] | Identifies rows containing class data; must be specified for each class
library(mdatools) | Loads the R package used to apply SIMCA to the data
m.class=simca(X.class,'class',PC,alpha=n) | Sets SIMCA parameters for a class; must be specified for each class individually (X.class = class data, 'class' = class label, PC = number of PCs retained, n = alpha value)
m.class=selectCompNum(m.class,PC) | Sets number of PCs retained for a class; must be specified for each class individually
m=simcam(list(m.class1,m.class2,m.class3,m.class4)) | Compiles all classes together to perform multiclass SIMCA
summary(m) | Output for a summary of all classes and parameters used for each class in multiclass SIMCA
X.t=data2[45:112,] | Identifies test set rows
c.t=data[45:112,412] | Identifies test set rows and the column containing the class membership label
print(m.class) | Output options for a class
print(m.class$calres$scores) | Output for scores values
print(m$modpower) | Output for modelling power values
Cooman's plots code
plotCooman(m,c(1,2),show.labels=TRUE) | Plots the Cooman's plot of class 1 and class 2 (numbers can change based on which classes are being compared)
rn=predict(m.class,X.c) | Obtains Q values for the training set for the specified class; n represents the number for a class (first class n=1, etc.)
rn=predict(m.class,X.t) | Obtains Q values for the test set for the specified class; n represents the number for a class (first class n=1, etc.)
rn$Q | Output for Q values corresponding to the class identified in rn
m.class$Qlim | Output for Q critical limit boundary
Residuals plots code
plotResiduals(m.class) | Plots residuals plot for a class
print(m.class$calres$T2) | Output for T2 values for training set for indicated class
print(m.class$calres$Q) | Output for Q values for training set for indicated class
print(m.class$cvres$T2) | Output for T2 values for cross validation of training set for indicated class
print(m.class$cvres$Q) | Output for Q values for cross validation of training set for indicated class
m$T2lim | T2 critical limit boundary
m$Qlim | Q critical limit boundary

REFERENCES
(1) Cayman Chemical. Fentanyl Identification. Cayman Currents 28, Ann Arbor (2017).
(2) Venables WN, Ripley BD. Modern Applied Statistics with S, Fourth edition. Springer, New York, 2002. ISBN 0-387-95457-0. http://www.stats.ox.ac.uk/pub/MASS4/
(3) Kucheryavskiy S. mdatools – R package for chemometrics. Chemometrics and Intelligent Laboratory Systems. 2020, 198 (DOI: 10.1016/j.chemolab.2020.103937).
(4) Setser AL, Waddell Smith R. Comparison of Variable Selection Methods Prior to Linear Discriminant Analysis Classification of Synthetic Phenethylamines and Tryptamines. Forensic Chemistry. 2018, 11, 77–86.

3. LINEAR DISCRIMINANT ANALYSIS (LDA) FOR CLASSIFICATION OF FENTANYL ANALOGS ACCORDING TO STRUCTURAL SUBCLASS
The gold standard for the analysis and identification of controlled substances is gas chromatography-mass spectrometry (GC-MS).
Identification is made by comparing the retention time and mass spectrum of a sample to those of a reference standard. With the emergence of novel psychoactive substances (NPS), fentanyl analogs in particular, identification by mass spectral comparison may not be possible due to a lack of reference materials. Various multivariate statistical methods have been investigated to obtain further structural information or to discriminate between structures.1-7 In this work, mass spectra of fentanyl analogs were subjected to linear discriminant analysis (LDA) to classify the analogs according to structural subclass. Different models were developed to investigate the effect of spectral variation within a chromatographic peak, the effect of spectral variation over time, and the use of neutral losses (rather than fragment ions) as variables. Each model was validated, and external test sets were used to determine the accuracy of classifying fentanyl analogs according to structural subclass.

3.1 MASS SPECTRAL ANALYSIS OF FENTANYL ANALOGS
The 28 fentanyl analogs investigated in this work were representative of four structural subclasses. The core structure of fentanyl, with initial cleavage sites, is shown in Figure 3.1. The four subclasses, which were determined based on the site of substitution on the core fentanyl structure, were as follows8: n-alkyl chain substituted (AN) subclass, aniline ring substituted (AR) subclass, amide group substituted (AG) subclass, and amide and aniline ring substituted (AA) subclass. The electron ionization (EI) fragmentation of these analogs has been hypothesized based on the known fragmentation of the core fentanyl structure.8 Regardless of the type of substituent (e.g., halogen, methoxy, methyl, etc.), the site of substitution was expected to impact the order in which bonds cleave for the analogs.
The first cleavage site was dependent upon where the substitution occurred on the structure.8

Figure 3.1 Initial cleavage sites of fentanyl analogs: A) cleavage of the amide group, B) cleavage on the piperidine ring, C) cleavage of the n-alkyl chain

One compound from each of the four subclasses and its corresponding spectrum are shown in Figure 3.2, with the site of substitution highlighted. Representative spectra and structures of all compounds are shown in the appendix (Figure A3.1). The molecular ion was not observed for any of the fentanyl analogs. Spectra of analogs within a subclass showed similarities. For example, with the exception of α-methylfentanyl and α-methyl thiofentanyl, all of the AN subclass analogs had a base peak at m/z 245. However, similarities were also observed between subclasses. For example, a base peak at m/z 259 was observed in the spectra of α-methyl thiofentanyl (AN subclass) and ortho-methylfentanyl (AR subclass). The variability in mass spectra within a subclass is due to the type of substituent, which changes the base peak and many of the smaller fragment ions that contain the substituent. Although the high-intensity ions are not always the same for every compound in a subclass, analogs with the same site of substitution are predicted to fragment in a similar manner. This predicted similarity in fragmentation supports the hypothesis that chemometric methods would be able to identify characteristic ions and classify fentanyl analogs according to structural subclass using mass spectral data. All fragmentation comments in this work are hypothetical, as no further chemical analysis was performed to obtain elemental formulae of fragment ions.
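The shared m/z 259 base peak of α-methyl thiofentanyl (AN) and ortho-methylfentanyl (AR) illustrates why neutral losses were also considered as variables. The Python sketch below assumes a neutral loss is the nominal molecular mass minus the fragment m/z, with the nominal mass calculated from the molecular formula because the molecular ion was not observed. The molecular formulae are standard values for these compounds rather than data from this work, and the helper names are hypothetical.

```python
import re

# Nominal (integer) masses of the most abundant isotopes.
NOMINAL_MASS = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32, "F": 19, "Cl": 35}


def nominal_mass(formula):
    """Nominal mass of a molecular formula such as 'C21H28N2OS'."""
    total = 0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += NOMINAL_MASS[element] * (int(count) if count else 1)
    return total


def neutral_loss(formula, fragment_mz):
    """ASSUMPTION: neutral loss = nominal molecular mass - fragment m/z."""
    return nominal_mass(formula) - fragment_mz


if __name__ == "__main__":
    analogs = {
        "alpha-methyl thiofentanyl (AN)": "C21H28N2OS",
        "ortho-methylfentanyl (AR)": "C23H30N2O",
    }
    for name, formula in analogs.items():
        print(name, nominal_mass(formula), neutral_loss(formula, 259))
```

Under this assumption, the same fragment at m/z 259 corresponds to a loss of 97 (356 - 259) for α-methyl thiofentanyl and 91 (350 - 259) for ortho-methylfentanyl, so the two spectra that overlap in fragment space are separated in neutral loss space.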
Figure 3.2 Mass spectra and chemical structures of selected fentanyl analogs: A) thiofentanyl representing the AN subclass, B) ortho-methylfentanyl representing the AR subclass, C) cyclopropyl fentanyl representing the AG subclass, and D) para-fluorobutyrylfentanyl representing the AA subclass. [Each panel plots relative intensity (0-1) versus m/z.]

3.2 INITIAL LINEAR DISCRIMINANT ANALYSIS (LDA) MODELS TO ASSESS VARIATION WITHIN A CHROMATOGRAPHIC PEAK
To develop classification models, the initial set of 28 analogs was split into a training set and a test set. For the initial models, the training set consisted of replicate spectra (n = 2) of 22 fentanyl analogs (44 spectra total). The test set consisted of replicate spectra (n = 4) of six analogs, along with additional spectra (n = 2) of the 22 training set compounds analyzed in two subsequent months (6 × 4 + 22 × 2 = 68 spectra). The six new analogs represented all four subclasses: two analogs were from the AA subclass, two from the AG subclass, one from the AR subclass, and one from the AN subclass. However, due to sample degradation, four spectra (representing four analogs) were excluded from the test set, such that the final test set contained a total of 64 spectra.

Forensic laboratories typically obtain the mass spectrum from the apex of the chromatographic peak; however, fragment ion intensities and ratios vary during elution of the chromatographic peak due to changes in concentration.9 To develop robust LDA models, it was therefore important to account for factors that may affect model performance.
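The difference between an apex spectrum and a peak-averaged spectrum can be illustrated with a short NumPy sketch. It assumes the scans spanning one chromatographic peak are available as a scans × m/z intensity array, takes the apex as the scan with the largest total ion current, averages all scans in the window without weighting, and normalizes each spectrum to its base peak. The instrument software used in this work may define the apex and the averaging window differently, and the three scans below are hypothetical.

```python
import numpy as np


def apex_and_average(scans):
    """scans: (n_scans, n_mz) intensities for the scans across one peak.
    Returns (apex, average) spectra, each normalized to base peak = 1."""
    tic = scans.sum(axis=1)          # total ion current of each scan
    apex = scans[np.argmax(tic)]     # scan at the top of the peak
    average = scans.mean(axis=0)     # unweighted mean across the peak
    return apex / apex.max(), average / average.max()


if __name__ == "__main__":
    # Three hypothetical scans (rising edge, apex, tailing edge) at three m/z values
    scans = np.array([[1.0, 2.0, 0.5],
                      [4.0, 5.0, 3.0],
                      [2.0, 1.0, 2.0]])
    apex, average = apex_and_average(scans)
    print("apex:", apex)
    print("average:", average)
```

For these scans the apex spectrum has relative intensities of 0.8, 1.0, and 0.6, whereas the averaged spectrum has 0.875, 1.0, and 0.6875: the same ions, but different ratios, which is the kind of variation the apex and average data sets were designed to capture.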
The first factor considered was the difference in classification success between the mass spectrum obtained at the apex of the chromatographic peak and the average mass spectrum across the chromatographic peak. Thus, two data sets were generated: the first contained apex spectra for the training and test sets, and the second contained average spectra collected across the peaks for the training and test sets. Each data set was used to develop and validate LDA models, and the classification success of each model was assessed.

3.2.1 Principal Components Analysis (PCA) for Variable Selection
As LDA requires the number of variables to be less than the number of samples, the full mass spectrum (m/z 40-450, 411 variables) of each sample could not be used for modelling. Principal components analysis (PCA) was used as a dimensionality reduction method to identify the variables (m/z values) responsible for the most variance in the data set, and thereby reduce the number of variables used in LDA. While PCA was conducted on both data sets (apex and average spectra), the following discussion focuses on the apex spectra. After examining the PCA results, only the first four principal components (PCs), which accounted for 68% of the total variance in the data set, were retained, as they provided adequate separation among all four fentanyl analog subclasses. Scores plots representing the first four PCs are shown in Figure 3.3.

Figure 3.3 PCA scores plots of A) principal component 1 (PC1) vs principal component 2 (PC2), B) PC1 vs principal component 3 (PC3), and C) PC1 vs principal component 4 (PC4). [Axes: PC1 (27%), PC2 (17.7%), PC3 (14.5%), PC4 (8.4%) of total variance]

The AA subclass was distinguished from the other three subclasses on PC1 (Figure 3.3A). The AG subclass and AN subclass were differentiated from the AR subclass on PC2.
When PC3 was examined, no further separation among the subclasses was achieved (Figure 3.3B). However, on PC4, the AG subclass was differentiated from the AN subclass (Figure 3.3C). The positioning of each subclass can be explained with reference to the loadings plots for each PC (Figure 3.4). The majority of the AA analogs were positioned negatively on PC1, and were thus distinguished from the other three subclasses, due to high intensities of m/z 43, 164, 207, and 277, which all contributed negatively to PC1 (Figure 3.4A). However, the duplicate spectra of ortho-fluoro methoxyacetyl fentanyl and meta-fluoro methoxyacetyl fentanyl were positioned close to zero on PC1 (Figure 3.3A). These samples were positioned around zero (as opposed to negatively) on PC1 due to low intensities of m/z 43, 164, 207, and 277 (Figure 3.4A). The fluoro methoxyacetyl fentanyl isomers differed in structural substitution from the other analogs in the AA subclass, which caused these isomers to instead have high intensities of m/z 42, 208, and 279, which contributed minimally to PC1 (Figure 3.4A). It should be noted that the other five AA analogs were all fluorobutyryl fentanyl or fluoroisobutyryl fentanyl isomers, which could skew the representation of this subclass because of their similarity in structure and, therefore, in mass spectral fragmentation.
Because of this structural similarity, the analogs in the AA subclass are not truly representative of the whole structural subclass. However, few other analogs representative of this subclass are currently available, and these analogs are among the more prominent fentanyl analogs submitted to operational forensic laboratories.10

Figure 3.4 Loadings plots for A) principal component 1 (PC1), B) principal component 2 (PC2), and C) principal component 4 (PC4) [plots not reproduced; axes: loadings vs m/z]

The AR subclass was positioned positively on PC2, while the other three subclasses were positioned negatively. Two analogs, ortho-methylfentanyl and para-methylfentanyl, were positioned most positively due to high intensities of m/z 160, 203, 216, and 259, which all contributed positively to PC2 (Figure 3.4B). Para-methoxy fentanyl and para-chlorofentanyl were also positioned positively on PC2 (Figure 3.4B). This was due to high intensities of four ions that were all weighted positively on PC2: m/z 275 (the base peak in para-methoxy fentanyl), m/z 279 (the base peak in para-chlorofentanyl), m/z 91, and m/z 105. Two of these variables (m/z 91 and m/z 105) are common fragments in compounds that contain an aromatic ring. Para-fluorofentanyl also produced m/z 91 and m/z 105, but this analog was positioned less positively (closer to zero) on PC2 than the other AR analogs. The base peak in para-fluorofentanyl, m/z 263, was weighted negatively on PC2, although not strongly, which resulted in a less positive position for this analog on PC2. Additionally, the para-fluorofentanyl replicates were the only samples in this subclass positioned negatively on PC1, due to a high intensity of m/z 207.
Although m/z 207 is a common background contaminant ion in mass spectral analysis, it was an important ion to consider for fentanyl analogs with fluorine-substituted aniline rings (Figure 3.5).

Figure 3.5 Predicted structure of fragment ion at m/z 207 [structure not reproduced]

The AN subclass was positioned negatively on PC2 due to high intensities of m/z 146 and m/z 245. The AG subclass was positioned negatively on PC2 due to a high intensity of m/z 146 (Figure 3.3A). The exceptions to this trend were isobutyryl fentanyl (AG subclass) and α-methylfentanyl (AN subclass). Both were positioned positively due to a high intensity of m/z 259, which contributed positively to PC2 (Figure 3.4B). The AG and AN subclasses were not separated on the first three PCs, but they were distinguished on the fourth PC (Figure 3.3C). The AG subclass was positioned positively on PC4 due to high intensities of m/z 146, m/z 189, and the base peaks of most compounds in this subclass (m/z 243, 257, 287, and 299), which were all weighted positively on PC4 (Figure 3.4C). The exception was isobutyryl fentanyl, whose base peak at m/z 259 was weighted negatively, rather than positively, on PC4. This caused its replicates to be positioned less positively than the other AG analogs. The AN subclass was positioned negatively due to high intensities of m/z 245 and m/z 259, which were the base peaks for all analogs in the AN subclass. The relative loadings across the first four PCs were used to determine the optimal number of variables to retain in the LDA model.11 Thresholds of 1.5%, 2.0%, and 2.5% were investigated (data not shown). The 2.0% threshold, which retained 23 variables, gave the best leave-one-out cross validation success for LDA classification (Table 3.1).
Table 3.1 Variables retained for LDA based on a relative loadings threshold of 2%
m/z: 43, 71, 77, 105, 119, 132, 146, 147, 160, 164, 189, 190, 202, 203, 207, 216, 243, 245, 246, 259, 260, 277, 278

Most spectra in the training set contained all of the retained variables up to and including m/z 216 (Table 3.1). All analogs contain the same fentanyl core structure, so their fragmentation was predicted to be similar.8 Consequently, many of the same fragment ions are expected across analogs, differing mainly in intensity. The retained variables greater than m/z 216 were all base peaks for analogs in each of the four subclasses (Table 3.1). However, not every base peak was retained. For example, ortho-fluoro methoxyacetyl fentanyl had a base peak of m/z 279, which this LDA model did not retain. An initial comparison of the spectra and the retained variables showed that every retained variable was of high intensity in at least some of the fentanyl analogs. However, not all high-intensity ions across all spectra were retained. For example, m/z 42, 56, and 69 all had relatively high intensities (>25% of the base peak) in the spectra of multiple analogs. They were not retained because they did not contribute strongly to the variation in the training set.

Principal components analysis was also conducted on the average spectra collected across the chromatographic peaks for analogs in the training set. The positioning of analogs was similar to that observed from PCA of the apex mass spectral data. The associated scores and loadings plots for the average spectra are shown in the appendix (Figures A3.2 and A3.3). Further, based on the relative loadings, the same 23 variables were retained for the average spectra (Table 3.1).
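The variable-selection step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the software used in this work. Spectra are assumed to be normalized to the base peak and mean-centred, and PCA is computed by singular value decomposition. The relative-loadings rule in the hypothetical `select_variables` function keeps an m/z value if its share of the summed absolute loadings on any retained PC exceeds the threshold. This rule is an assumed stand-in for the criterion of reference 11, which is not detailed here. The synthetic data contain two classes that differ only in their base peaks.

```python
import numpy as np


def normalize_to_base_peak(spectra):
    """Scale each spectrum (one row per sample) so its most intense ion is 1."""
    spectra = np.asarray(spectra, dtype=float)
    return spectra / spectra.max(axis=1, keepdims=True)


def pca(X, n_components):
    """PCA by singular value decomposition of the mean-centred data matrix.

    Returns scores (samples x PCs), loadings (variables x PCs) and the
    fraction of the total variance explained by each retained PC.
    """
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T
    explained = (s**2 / np.sum(s**2))[:n_components]
    return scores, loadings, explained


def select_variables(loadings, mz_values, threshold_pct):
    """Hypothetical relative-loadings rule: keep an m/z value if its share of
    the summed absolute loadings on any retained PC exceeds threshold_pct."""
    abs_load = np.abs(loadings)
    relative_pct = 100.0 * abs_load / abs_load.sum(axis=0, keepdims=True)
    keep = (relative_pct > threshold_pct).any(axis=1)
    return [int(mz) for mz, k in zip(mz_values, keep) if k]


# Synthetic example: m/z 40-450 (411 variables), two classes of four spectra
mz = np.arange(40, 451)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.01, size=(8, mz.size))  # low-level background
X[:4, mz == 259] = 1.0                           # class 1 base peak
X[4:, mz == 245] = 1.0                           # class 2 base peak
X = normalize_to_base_peak(X)

scores, loadings, explained = pca(X, n_components=1)
print(select_variables(loadings, mz, threshold_pct=2.0))  # [245, 259]
```

With real spectra, `n_components` would be 4, and several thresholds (for example 1.5%, 2.0%, and 2.5%) would be compared by their leave-one-out cross validation success, as described above.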
3.2.2 Linear Discriminant Analysis (LDA) Models

Using the 23 variables determined by PCA, two LDA models were developed. The first used the spectra collected at the apex of the chromatographic peak (the apex model). The second used the average spectra across the chromatographic peaks (the average model). Figure 3.6 shows the LDA scores plots for both models.

Figure 3.6 Scores plots for the apex data A) linear discriminant 1 (LD1) vs linear discriminant 2 (LD2), B) LD1 vs linear discriminant 3 (LD3), and scores plots for the average data C) LD1 vs LD2, D) LD1 vs LD3 [plots not reproduced; axes for apex data: LD1 (90.8%), LD2 (8.7%), LD3 (0.5%); axes for average data: LD1 (91.6%), LD2 (8%), LD3 (0.4%)]

For the apex model (Figures 3.6A and B), the AN subclass, which was positioned negatively on LD1, was separated from the AG subclass, which was positioned positively on LD1. The AR and AA subclasses were not separated until LD3, on which the AR subclass was positioned positively and the AA subclass negatively. The subclasses were positioned very similarly on all three LDs in the apex model and the average model (Figures 3.6C and D). Leave-one-out cross validation success was 100% for both models. Test set classification success was 98% for both models, with only one sample (para-fluoro methoxyacetyl fentanyl, vide infra) misclassified. The average spectra account for variability in ion intensities within a peak. However, the apex spectra gave comparable classification results and were therefore sufficiently representative. Forensic laboratories typically collect spectra at the apex of the chromatographic peak, so the apex spectra were used for all subsequent LDA modelling. When the LD1 versus LD2 scores plot was examined, the AN subclass and the AG subclass were separated on LD1 (Figure 3.6A).
The AN subclass was positioned negatively on LD1 due to higher intensities of m/z 77, 202, and 203 and of the base peaks m/z 245 and m/z 259 (the only base peaks in this subclass), which all contributed negatively to LD1 (Figure 3.7A). The AG subclass was positioned positively on LD1 due to high intensities of m/z 132 and m/z 190.

Figure 3.7 Coefficients of A) linear discriminant 1 (LD1), B) linear discriminant 2 (LD2), and C) linear discriminant 3 (LD3) [plots not reproduced; coefficients shown for each of the 23 retained m/z values]

Both the AG and AN subclasses were positioned positively on LD2, while the AR and AA subclasses were positioned negatively (Figure 3.6A). The AG and AN subclasses were positioned positively due to high intensities of m/z 77, 147, and 189. The AA and AR subclasses were positioned negatively due to high intensities of m/z 43, 105, and 160 (Figure 3.7B). The AA and AR subclasses were not distinguished on LD1 or LD2, but they were separated on LD3 (Figure 3.6B). The largest negative coefficient on LD3 was for m/z 190, which did not separate the AA and AR subclasses. This variable was observed in the AG and AN subclasses and explains the spread of those subclasses along LD3. The largest positive coefficient was for m/z 260, which was observed at high intensity in the methylfentanyl isomers (AR subclass). This contributed to the more positive positioning of these isomers on LD3. The AR subclass was positioned positively on LD3 due to high intensities of m/z 77 and m/z 160, which both contributed positively to LD3 (Figure 3.7C).
The AA subclass was positioned negatively on LD3 due to high intensities of m/z 105 and m/z 277. The latter is the base peak for the fluorobutyryl fentanyl and fluoroisobutyryl fentanyl isomers. These variables contributed minimally to LD3, but analogs in the AA subclass either lacked, or had low intensities of, the other variables that contributed positively to LD3 (Figure 3.7C). The coefficients of the linear discriminants for the average model are shown in the appendix (Figure A3.4). When the scores plot of LD1 versus LD3 was examined, the separation between the AR and AA subclasses was minimal (Figure 3.6B). For fentanyl analogs with a substitution at the amide group and/or the aniline ring, the first fragment was predicted to result from an α-β cleavage of the n-alkyl chain (Figure 3.1C). The second fragment was predicted to result from a cleavage of the amide group (Figure 3.1A). However, a larger substituent on the amide group was predicted to cause the amide group to cleave first (Figure 3.1A).11 Figure 3.8 shows para-fluorofentanyl from the AR subclass and para-fluorobutyryl fentanyl from the AA subclass, with these initial cleavage sites highlighted. These two cleavages would produce a common fragment containing the aniline ring and the piperidine ring for analogs in both the AR and AA subclasses. Nine analogs in the AA subclass and one analog in the AR subclass had a fluorine substituent on the aniline ring. This produced a fragment common to both subclasses, m/z 207 (Figure 3.5). Additionally, once the fluorine substituent was lost from the aniline ring, similar fragments would be expected for all nine AA analogs and at least three of the AR analogs.
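The nominal masses quoted for the predicted fragments in Figure 3.8 follow directly from the integer masses of the most abundant isotope of each element. The short, illustrative Python sketch below reproduces these values, including the fluorinated C12H16FN2+ ion at m/z 207 that is common to the AR and AA analogs discussed above. It assumes simple formulae without brackets, and the `nominal_mass` helper is hypothetical.

```python
import re

# Integer masses of the most abundant isotope of each element (Da)
NOMINAL_MASS = {"H": 1, "C": 12, "N": 14, "O": 16, "F": 19, "S": 32, "Cl": 35}


def nominal_mass(formula):
    """Nominal mass of an ion such as 'C12H16FN2+'.

    Assumes a simple formula (no brackets); the trailing charge sign is
    ignored because the electron mass is negligible at unit resolution.
    """
    total = 0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula.rstrip("+")):
        total += NOMINAL_MASS[element] * (int(count) if count else 1)
    return total


for ion in ["C3H5O+", "C4H7O+", "C7H7+", "C12H16FN2+"]:
    print(ion, nominal_mass(ion))
# C3H5O+ 57
# C4H7O+ 71
# C7H7+ 91
# C12H16FN2+ 207
```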
The minimal separation of these subclasses on the scores plot was consistent with their very similar predicted fragmentation patterns, which make the two subclasses harder to differentiate. Because of this similarity, new samples belonging to either subclass may be misclassified. It is worth noting that these classes were readily distinguished in the PCA scores plot (Figure 3.3A). However, PCA included all m/z values in the scan range, whereas LDA used a reduced number of variables. As a result, LDA separated these two classes less well.

Figure 3.8 Predicted fragments of A) para-fluorofentanyl from the AR subclass and B) para-fluorobutyryl fentanyl from the AA subclass [structures not reproduced; predicted fragments include the acylium ions C3H5O+ (nominal mass 57 Da) and C4H7O+ (71 Da), C7H7+ (91 Da), and C12H16FN2+ (207 Da)]

Leave-one-out cross validation of the model resulted in 100% cross validation success. The model was then used to classify the test set. One replicate of para-fluoro methoxyacetyl fentanyl was misclassified, resulting in a successful classification rate of 98%. Para-fluoro methoxyacetyl fentanyl is an AA analog, but this replicate was misclassified as an AR analog; the other three replicates were correctly classified. The misclassification was likely due to an unusually high background in the sample, which gave a higher intensity of m/z 207 than expected. As m/z 207 contributes positively to LD3, this replicate was positioned more positively on LD3 than the members of its correct class.
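The validation workflow described above can be sketched with NumPy. The steps are: build an LDA model on the retained variables, assess it by leave-one-out cross validation, and then classify an independent test set. This is a minimal illustration, not the software used in this work. `SimpleLDA` and `leave_one_out_accuracy` are hypothetical names. The class implements the standard linear discriminant rule with a pooled within-class covariance matrix and priors taken from the training-set proportions. The data are synthetic clusters labelled with the four subclass names for readability only. The sketch covers classification only and does not compute the discriminant-axis scores plotted in the figures.

```python
import numpy as np


class SimpleLDA:
    """Linear discriminant classifier with a pooled within-class covariance
    matrix and class priors estimated from the training-set proportions."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        # Pooled within-class covariance; the pseudo-inverse guards against
        # a singular matrix (LDA needs more samples than variables).
        centred = X - self.means_[np.searchsorted(self.classes_, y)]
        cov = centred.T @ centred / (len(X) - len(self.classes_))
        self.cov_inv_ = np.linalg.pinv(cov)
        return self

    def decision_function(self, X):
        # delta_k(x) = x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log(prior_k)
        X = np.asarray(X, dtype=float)
        linear = X @ self.cov_inv_ @ self.means_.T
        const = 0.5 * np.sum((self.means_ @ self.cov_inv_) * self.means_, axis=1)
        return linear - const + np.log(self.priors_)

    def predict_proba(self, X):
        d = self.decision_function(X)
        d = d - d.max(axis=1, keepdims=True)  # for numerical stability
        p = np.exp(d)
        return p / p.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


def leave_one_out_accuracy(X, y):
    """Refit the model n times, each time predicting the one held-out sample."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    correct = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        model = SimpleLDA().fit(X[keep], y[keep])
        correct += int(model.predict(X[i:i + 1])[0] == y[i])
    return correct / len(X)


# Synthetic data: four well-separated clusters in three variables
rng = np.random.default_rng(1)
labels = np.array(["AA", "AG", "AN", "AR"])
centroids = np.array([[0, 0, 0], [5, 0, 0], [0, 5, 0], [0, 0, 5]], dtype=float)
y_train = np.repeat(labels, 6)
X_train = np.repeat(centroids, 6, axis=0) + rng.normal(0, 0.3, (24, 3))
y_test = np.repeat(labels, 2)
X_test = np.repeat(centroids, 2, axis=0) + rng.normal(0, 0.3, (8, 3))

print("LOOCV success:", leave_one_out_accuracy(X_train, y_train))
model = SimpleLDA().fit(X_train, y_train)
print("Test set success:", np.mean(model.predict(X_test) == y_test))
```

The test set success rate is the fraction of test spectra assigned to their true subclass. This is the quantity reported above as the successful classification rate.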
Overall, the successful classification rate was 98% for both the apex and average LDA models, with one replicate of para-fluoro methoxyacetyl fentanyl misclassified.

3.3 REFINED LINEAR DISCRIMINANT ANALYSIS (LDA) MODEL TO INCORPORATE INSTRUMENT VARIATION

The second factor investigated was the effect of instrument variation over time on the rate of successful classification. The training set was redefined to include 22 analogs analyzed across four months. Three replicate spectra from the last collection were removed due to sample degradation. This resulted in a smaller test set; however, none of the analogs in the test set were represented in the training set. The test set consisted of six analogs analyzed across four months, with one replicate spectrum removed due to sample degradation. The refined model was developed using mass spectra collected at the apex of the chromatographic peaks. This choice was made because no significant difference in classification success was observed between apex and average spectra in the initial model (Section 3.2).

3.3.1 Refined LDA Model for Classification of Fentanyl Analogs

The full mass spectra of all training set analogs and replicates were subjected to PCA to identify the characteristic m/z values to retain for LDA model development. Only the first four PCs were retained, as they provided separation among all four subclasses. They accounted for 66% of the variation, compared with 68% for PCA of the initial training set (Section 3.2.1). The scores plots for the first four PCs are shown in Figure 3.9.
Figure 3.9 Principal components analysis scores plots of A) PC1 vs PC2, B) PC1 vs PC3, C) PC1 vs PC4 [plots not reproduced; axes: PC1 (27.7%), PC2 (17.1%), PC3 (13.6%), PC4 (8.1%)]

The PCA scores plot of PC1 versus PC2 separated the AA subclass, which was positioned negatively on PC1, from the other three subclasses, which were positioned positively on PC1. The AN and AG subclasses were also separated from the AR subclass on PC2: the former were positioned negatively and the latter positively on PC2 (Figure 3.9A). The scores plot of PC1 versus PC3 (Figure 3.9B) did not provide further separation among the subclasses. In the scores plot of PC1 versus PC4 (Figure 3.9C), the AG and AN subclasses were separated, with the AN subclass positioned negatively on PC4 and the AG subclass positioned positively. The positioning of all samples can be explained with reference to the loadings plots in the appendix (Figure A3.5). The relative loadings across all four PCs were calculated to determine which variables to retain for LDA. At a threshold of 2%, 25 variables were retained to develop the refined LDA model (Table 3.2).

Table 3.2 Variables retained for the refined LDA model, as determined by PCA
m/z: 41, 43, 57, 71, 77, 105, 119, 132, 146, 147, 160, 164, 189, 190, 202, 203, 207, 216, 243, 245, 246, 259, 260, 277, 278

The 25 variables retained for the refined LDA model included the same 23 variables that were retained for the initial model (Table 3.1). The two additional variables were m/z 41 and m/z 57, which were observed in analogs across all four subclasses. The LDA model developed with these 25 variables resulted in 100% successful leave-one-out cross validation. The LDA scores plots for the refined model showed tighter groupings of each subclass, which indicated less within-class variation (Figure 3.10).
The initial LDA models (Figure 3.6) showed more spread in sample positioning. This suggested that incorporating instrument variation into the training set improved the LDA model. The AN subclass was positioned negatively on LD1 and the AG subclass positively (Figure 3.10A). The AR and AA subclasses were not separated on LD1 or LD2, but they were differentiated on LD3 (Figure 3.10B). The positioning of the subclasses on the LDs can be explained with reference to the plots of the coefficients of the linear discriminants (Figure A3.6). Incorporating instrument variation enhanced the separation between subclasses and allowed 100% correct classification of all test set samples.

Figure 3.10 Scores plot for the refined LDA model A) LD1 vs LD2, B) LD1 vs LD3 [plots not reproduced; axes: LD1 (77.9%), LD2 (18.6%), LD3 (3.5%)]

3.3.2 Additional Test Sets to Validate the Linear Discriminant Analysis (LDA) Model

To investigate the applicability of the refined LDA model, two external test sets of mass spectra were applied. These spectra had been collected on different instruments and using different methods. The first test set, Test Set 1, contained spectra of 42 non-fentanyl NPS compounds, including phenethylamines, tryptamines, and cathinones (Table 3.3).11,12 Full chemical names of the non-fentanyl NPS compounds can be found in the appendix (Table A3.1). The mass spectra of these compounds differed substantially from those of the fentanyl analogs. The LDA model used only 25 variables, optimized for fentanyl analogs, and fewer than half of these variables were observed in the Test Set 1 spectra.
Table 3.3 List of non-fentanyl NPS compounds in the external test set
Phenethylamines
FMA*: 2-FMA, 3-FMA, 4-FMA
EMC*: 2-EMC, 3-EMC, 4-EMC
APB*: 4-APB, 5-APB, 6-APB, 7-APB, 4E-APB, 4M-APB
2C*: 2CB, 2CC, 2CD, 2CE, 2CG, 2CH, 2CI, 2CN, 2CP, 2CT
NBOMe*: 25-B, 25-C, 25-D, 25-E, 25-G, 25-H, 25-N, 25-P, 25-T
3,4-DMA
Tryptamines
4-hydroxy-N,N-Dimethyltryptamine, 5-methoxy-N,N-Dimethyltryptamine, 5-methoxy-N,N-Diisopropyltryptamine, 4-hydroxy Diethyltryptamine, 4-methyl-α-Ethyltryptamine, N,N-Dipropyltryptamine, N,N-Dimethyltryptamine, 5,7-Dichlorotryptamine, α-Ethyltryptamine, α-methyl Tryptamine
*APB – aminopropyl benzofuran; *NBOMe – N-methoxybenzyl; *2C – 2,5-dimethoxy; *FMA – fluoromethamphetamine; *EMC – ethylmethcathinone

The samples in Test Set 1 were not fentanyl analogs. However, LDA is a hard classification method and therefore assigns every sample to one of the available classes. All Test Set 1 samples were assigned to a subclass with a posterior probability of 1 (the maximum possible value). However, the scores plots clearly showed that these samples did not belong to any of the subclasses (Figure 3.11). This highlights that posterior probabilities should not be considered alone but should be examined in conjunction with the scores plots. Nearly all samples from the external test set were positioned far from the groupings and centroids of the fentanyl subclasses. There were two exceptions: the fluoromethamphetamine (FMA) and ethylmethcathinone (EMC) isomers, which were positioned close to the AR subclass (Figure 3.11D).

Figure 3.11 Scores plot for the refined LDA model A) LD1 vs LD2, B) enlarged LD1 vs LD2, C) LD1 vs LD3, D) enlarged LD1 vs LD3 [plots not reproduced; axes: LD1 (77.9%), LD2 (18.6%), LD3 (3.5%)]

These isomers have mass spectra with very few ions (Figure 3.12). The two most intense ions in each spectrum were not among the variables retained in the LDA model, so the model variables present in these spectra had low intensities. The LDA model used 25 variables to classify new samples, but the EMC and FMA isomers contained at most 13 of the 25 variables.
The low intensities of these few ions caused the samples to be positioned close to zero, indicating that little of the variation in these spectra contributed to their positioning.

Figure 3.12 Representative spectrum of A) 2-EMC and B) 2-FMA [spectra not reproduced; axes: relative intensity vs m/z]

Test Set 2 contained mass spectra of six case samples obtained from the Michigan State Police Forensic Science Division. These samples had previously been analyzed by GC-MS, and the fentanyl analog had been identified by mass spectral comparison to a reference standard. The six samples and the corresponding spectra are shown in Figure 3.13.

Figure 3.13 Structures and spectra of case samples for A) carfentanil, B) methoxy acetyl fentanyl, C) furanyl fentanyl, D) valeryl fentanyl, E) acetyl fentanyl, F) 3’-methylfentanyl [structures and spectra not reproduced; axes: relative intensity vs m/z]

Of the six case samples, four of the identified analogs belonged to the AG subclass: valeryl fentanyl, acetyl fentanyl, methoxy acetyl fentanyl, and furanyl fentanyl (Figure 3.13B-E). One belonged to the AN subclass (3’-methylfentanyl, Figure 3.13F). One (carfentanil, Figure 3.13A) did not belong to any of the four structural subclasses defined in this work.
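LDA always returns the most probable of the available classes. A sample far from every class, such as the Test Set 1 compounds or carfentanil, can therefore still receive a posterior probability of 1. One way to make the scores-plot check described above explicit is to report the distance to the assigned class centroid alongside the posterior probability. The NumPy sketch below is a hypothetical illustration, not a procedure used in this work. It computes posteriors from squared Mahalanobis distances under a shared covariance matrix, which is equivalent to the LDA rule. It then flags any assignment whose distance exceeds an arbitrary cut-off, `max_distance`. The two class centroids and the identity covariance are synthetic.

```python
import numpy as np


def classify_with_distance_check(x, means, cov_inv, priors, labels, max_distance):
    """Return (label, posterior, flagged) for one sample.

    Posteriors follow the LDA rule with a shared covariance matrix:
    log p(k | x) = -0.5 * d_k^2 + log(prior_k) + constant, where d_k is the
    Mahalanobis distance to centroid k. 'flagged' is True when the sample is
    farther than max_distance (a hypothetical cut-off) from its assigned class.
    """
    diffs = means - x
    d = np.sqrt(np.einsum("kj,jl,kl->k", diffs, cov_inv, diffs))
    log_post = -0.5 * d**2 + np.log(priors)
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    k = int(np.argmax(post))
    return labels[k], float(post[k]), bool(d[k] > max_distance)


means = np.array([[0.0, 0.0], [4.0, 0.0]])  # synthetic class centroids
cov_inv = np.eye(2)                         # identity shared covariance
priors = np.array([0.5, 0.5])
labels = ["A", "B"]

# A sample near class B, and one far from both classes
print(classify_with_distance_check(np.array([3.5, 0.2]), means, cov_inv, priors, labels, 3.0))
print(classify_with_distance_check(np.array([100.0, 0.0]), means, cov_inv, priors, labels, 3.0))
# The second sample is assigned to B with posterior 1.0 but is flagged
```

In practice, a cut-off could be derived from the spread of the training samples around their own centroids rather than fixed by hand. The value 3.0 here is only for illustration.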
The case sample spectra contained fewer fragment ions, at lower intensities, than the spectra of the fentanyl analogs used to develop and optimize the models. Each spectrum contained only 20 m/z values, suggesting that the instrument was set to retain only the 20 most intense ions. The LDA model used 25 variables to classify each sample. However, because each spectrum contained so few ions, only 15-50% of these variables were present, depending upon the analog. These data suggest that lower-intensity ions may be responsible for the separation and classification of analogs. Additionally, the case samples were likely at much lower concentrations than the samples prepared at MSU. This could cause variations in ion intensity that were not accounted for in the model. When the case samples were classified by LDA, valeryl fentanyl was the only sample correctly classified. Five of the six case samples were misclassified, likely because their spectra contained very few of the model variables. Valeryl fentanyl was likely classified correctly only because of its high intensities of m/z 132 and m/z 190, with no other ions contributing strongly to the other LDs. As stated previously, the low number of variables was likely due to an instrument parameter and to the very low concentration of the controlled substances in the case samples. This highlights the need to incorporate concentration into multivariate statistical models in order to correctly classify a wide variety of samples.

3.4 APPLICATION OF NEUTRAL LOSS SPECTRA TO REFINE THE LINEAR DISCRIMINANT ANALYSIS (LDA) MODEL

The use of neutral loss spectra in multivariate methods and for obtaining structural information about unknowns has been demonstrated.13,14 This work explored the use of neutral losses, rather than fragment ions, to classify fentanyl analogs according to structural subclass.
High-resolution mass spectrometry is needed to determine the elemental formulae of fragment ions and thus obtain representative neutral loss spectra. High-resolution mass spectrometry was not used here, so this work represents a very preliminary investigation into the potential to classify analogs according to subclass based on neutral losses. The neutral loss spectra used here were generated by subtracting every m/z value from the base peak m/z for each analog. Not all neutral losses derive from the base peak. Nevertheless, this approach was used as a preliminary attempt to explore the potential for classifying fentanyl analogs according to structural subclass based on common neutral losses. The intensity of each neutral loss was assumed to equal the intensity of the resulting fragment ion in the mass spectrum. Because the neutral loss spectra were not obtained using high-resolution mass spectrometry, all predicted neutral loss spectra and fragments are hypothetical.

3.4.1 Neutral Loss Spectra of Fentanyl Analogs

The first step was to identify common neutral losses within each subclass. As an example, consider ortho-methylfentanyl and para-methoxy fentanyl, two members of the AR subclass whose representative mass spectra are shown in Figure 3.14. Because of their different substitutions, the mass spectra of these two analogs differed. The base peak of ortho-methylfentanyl was at m/z 259, with other dominant ions at m/z 160, 203, and 216. The base peak of para-methoxy fentanyl was at m/z 275, with other dominant ions at m/z 176, 219, and 232. Despite the different masses of the fragment ions, both compounds had the same neutral losses (99, 56, and 43 Da), which are highlighted in red in Figure 3.14. An important caveat is that this work assumes all neutral losses were derived from the base peak, which is not necessarily true.
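The base-peak subtraction used to build the neutral loss spectra can be sketched in a few lines of Python. In this illustrative sketch, spectra are represented as hypothetical {m/z: relative intensity} dictionaries. Ions at or above the base-peak m/z are assumed to be skipped, because the text does not state how they were handled. The intensities and the ions above the base peak (the nominal molecular ions, m/z 350 and 366) are made up for illustration. Only the m/z values of the dominant fragment ions are taken from the example above.

```python
def neutral_loss_spectrum(spectrum):
    """Convert {m/z: intensity} to {neutral loss (Da): intensity}.

    Every loss is referenced to the base peak, and each loss keeps the
    intensity of the fragment ion it was computed from. Ions at or above
    the base-peak m/z (assumption) are skipped, since they would give zero
    or negative losses.
    """
    base_mz = max(spectrum, key=spectrum.get)
    return {base_mz - mz: intensity
            for mz, intensity in spectrum.items() if mz < base_mz}


# Dominant ions from the text; intensities and molecular ions are illustrative
ortho_methylfentanyl = {259: 1.00, 216: 0.30, 203: 0.35, 160: 0.25, 350: 0.05}
para_methoxy_fentanyl = {275: 1.00, 232: 0.30, 219: 0.35, 176: 0.25, 366: 0.05}

for name, spec in [("ortho-methylfentanyl", ortho_methylfentanyl),
                   ("para-methoxy fentanyl", para_methoxy_fentanyl)]:
    print(name, sorted(neutral_loss_spectrum(spec)))
# Both print [43, 56, 99]: the same losses despite different fragment masses
```

Because every loss is referenced to the base peak, this approach cannot resolve sequential losses, which is the limitation noted above.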
For instance, an apparent neutral loss of 99 Da could instead be the sum of two sequential losses of 43 and 56 Da. Without knowledge of the elemental formulae of the fragment ions, however, the base-peak assumption was used for this preliminary investigation. A neutral loss spectrum for one compound from each of the four subclasses is shown in Figure 3.15. Neutral loss spectra for all other analogs are shown in the appendix (Figure A3.7).

Figure 3.14 Mass spectrum with common neutral losses highlighted for A) ortho-methylfentanyl and B) para-methoxy fentanyl [spectra not reproduced]

Figure 3.15 Neutral loss spectra and chemical structures of selected fentanyls A) thiofentanyl representing the AN subclass, B) ortho-methylfentanyl representing the AR subclass, C) cyclopropyl fentanyl representing the AG subclass, and D) para-fluorobutyrylfentanyl representing the AA subclass [spectra and structures not reproduced; axes: relative intensity vs neutral loss (Da)]

3.4.2 Application of Linear Discriminant Analysis (LDA) to Neutral Loss Spectra for Classification of Fentanyl Analogs

The 28 fentanyl analogs and their replicates were divided into the same training and test sets used for the refined model, which accounted for variation in spectra as a function of time (Section 3.3). The full neutral loss spectra of all training set analogs were subjected to PCA to determine which variables would be retained for LDA.
When the PCA results were examined, the four subclasses were differentiated across the first four PCs. These PCs accounted for 72% of the total variation in the data set, compared with 66% for the mass spectral data of the refined model (Section 3.3.1). The scores plots are shown in Figure 3.16.

Figure 3.16 PCA scores plot for neutral loss LDA model A) PC1 vs PC2, B) PC1 vs PC3, C) PC1 vs PC4 [plots not reproduced; axes: PC1 (37.5%), PC2 (16.8%), PC3 (10.3%), PC4 (7.6%)]

The AA subclass was differentiated on PC1 by its negative positioning. This was due to higher intensities of neutral losses 70, 71, 113, and 234, which were all weighted negatively on PC1 (Figure 3.17A). Replicates (n = 4) of isobutyryl fentanyl (AG subclass) were also positioned negatively on PC1, close to the fluoro methoxyacetyl fentanyl analogs of the AA subclass. Isobutyryl fentanyl was positioned negatively on PC1 due to high intensities of neutral losses 70 and 113. The AG subclass was differentiated on PC2 by its positive positioning, due to higher intensities of neutral losses 153, 188, and 216 (Figure 3.17B). On PC3, the isobutyryl fentanyl replicates were differentiated from the fluoro methoxyacetyl fentanyl analogs (Figure 3.17C). The fluoro methoxyacetyl fentanyl analogs were positioned positively on PC3 due to high intensities of neutral losses 43, 174, 188, 234, and 237. The isobutyryl fentanyl replicates were positioned negatively due to high intensities of neutral losses 70, 113, and 216. The AR and AN subclasses were differentiated on PC4 (Figure 3.17D). The AR subclass was positioned positively on PC4 due to high intensities of neutral losses 56 and 141. The AN subclass was positioned negatively due to high intensities of neutral losses 43, 113, 188, and 201. The neutral loss data were predicted to provide more separation among the subclasses; however, the PCA scores plots did not show this enhancement.
The refined PCA model using mass spectral data showed less spread within each of the four subclasses (Figure 3.9).

Figure 3.17 Neutral loss PCA loadings plots for A) PC1, B) PC2, C) PC3, and D) PC4 [plots not reproduced; axes: loadings vs neutral loss]

The relative loadings were calculated across the first four PCs. A threshold of 3.5% was determined to be optimal for LDA classification because it resulted in 100% successful leave-one-out cross validation. This threshold retained 17 variables, which were used in LDA (Table 3.4).

Table 3.4 Variables retained for the neutral loss LDA model, as determined by the 3.5% threshold of the PCA data
Neutral loss (Da): 56, 69, 70, 97, 99, 113, 149, 152, 154, 168, 188, 203, 206, 215, 233, 234, 235

The variables retained were representative of neutral losses in all four subclasses. In the neutral loss spectra, neutral losses 69, 70, 206, 233, 234, and 235 were observed in the AA subclass. The AG subclass contained neutral losses 188 and 215 at high intensity. The AR subclass contained neutral losses 56, 97, 99, and 113, and the AN subclass contained neutral losses 56, 97, 99, 113, and 203. The other retained variables were observed in multiple analogs and subclasses. Also, because of the way the neutral loss spectra were generated, some retained variables arose from isotope peaks of a high-intensity ion in the mass spectra. For example, neutral loss 234 corresponded to a high-intensity ion, and neutral losses 233 and 235 likely arose from its isotope peaks. The LDA model developed with the 17 variables resulted in 100% successful leave-one-out cross validation. The LDA scores plots demonstrated the separation among fentanyl subclasses (Figure 3.18).
Figure 3.18 Scores plot for neutral loss LDA model A) LD1 vs LD2 and B) LD1 vs LD3 [plots not reproduced; axes: LD1 (63%), LD2 (21%), LD3 (16%)]

The AG and AA subclasses were differentiated from the AN and AR subclasses on LD1. The AG and AA subclasses were positioned negatively due to neutral losses 70, 188, 206, and 215. The AN and AR subclasses were positioned positively due to neutral losses 56, 149, 154, and 168 (Figure 3.19A). The AR and AN subclasses were differentiated on LD2. The AR subclass was positioned positively due to neutral losses 56 and 168, while the AN subclass was positioned negatively due to neutral losses 149, 154, and 203 (Figure 3.19B). The AG and AA subclasses were differentiated on LD3. The AG subclass was positioned positively due to neutral losses 97, 206, and 215, while the AA subclass was positioned negatively due to neutral losses 69, 113, and 234 (Figure 3.19C). Once again, the LDA model developed with neutral losses showed more spread in the subclass groupings than the refined LDA model (Figure 3.10), which showed tight grouping of all subclasses and test set samples. Neutral loss data were predicted to reduce within-class variation and thereby maximize between-group separation. These results do not support that hypothesis; however, this is likely due to the manner in which the neutral losses were generated. With further investigation and identification of neutral losses, this model could be made more specific, potentially providing the predicted tighter grouping of subclasses. Although the subclass groupings were more spread out in the neutral loss model, the test set was classified with 100% success.
These classification results are comparable to those of the refined LDA model in Section 3.3, which supports the idea that these analogs could be classified according to structural subclass using neutral loss spectra. With high-resolution mass spectrometry, the elemental formulae of fragment ions could be determined and, along with the mass difference, used to identify the chemical composition of neutral loss fragments. The LDA model could then be further refined to contain specific neutral loss fragments characteristic of each of the four subclasses.

Figure 3.19 Coefficients for neutral loss LDA model A) LD1, B) LD2, and C) LD3 (coefficients shown for the 17 retained neutral losses)

3.5 SUMMARY OF LINEAR DISCRIMINANT ANALYSIS (LDA) MODELS

The first factor explored in this work was the effect of spectral variation within a peak on LDA classification of fentanyl analogs. There was minimal difference in the rate of successful classification between LDA models developed using apex spectra and those developed using spectra averaged across the full width at half maximum. Both models resulted in 98% correct classification, with one of the para-fluoro methoxyacetyl fentanyl replicates misclassified. These results showed that using the average spectra provided no benefit to classification and therefore supported using the mass spectra collected at the apex of the peak in model development and application. When the training set was refined to incorporate instrument variation, the LDA model improved, and the test set resulted in 100% correct classification. This model indicates that LDA was a suitable method to differentiate fentanyl analogs into four structural subclasses; however, the models were only tested with a very limited test set that contained only six new compounds.
A much larger test set would need to be applied to validate the model for the wide range of potential analogs that a forensic laboratory may encounter. Additionally, three analogs in the test set were positional isomers of compounds in the training set. This may overstate the confidence with which true unknowns can be classified, because analogs without positional isomers represented in the training set may not classify as reliably.

To further investigate the applicability of the refined LDA model, two external test sets were applied. When Test Set 1 (non-fentanyl samples) was applied, all samples were classified incorrectly because LDA is a hard classification method that forces every sample into one of the subclasses. In the scores plots, the FMA and EMC samples were positioned close to the AR subclass. This could potentially mislead analysts into classifying unknowns as fentanyl analogs when they are not. When Test Set 2 (fentanyl samples) was applied, only one of the six case samples was correctly classified. The case samples, which were likely at much lower concentrations than the fentanyl standards used in this work, highlighted the need to account for concentration in model development. Changes in concentration result in mass spectral variation and can lead to incorrect classification. In order to make robust models applicable to forensic laboratories, concentration should be a factor in developing the training set.

The application of LDA to neutral loss spectra, rather than mass spectra, also showed potential for fentanyl analog classification. The neutral loss LDA model resulted in 100% correct classification of the test set, comparable to results obtained for the refined LDA model in Section 3.3. Although the model did not result in less within-class variation as expected, this is likely due to the method by which the neutral loss spectra were generated. The neutral loss spectra were generated to test the potential of these data to classify structurally similar compounds with different substituents.
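Neutral loss spectra of the kind discussed above are derived from mass differences. As a minimal sketch, assuming nominal masses and that every ion below the molecular ion is a fragment of it (a simplification; the exact procedure used to generate the spectra in this work may differ), each fragment at m/z f maps to a neutral loss of M − f, where M is the molecular ion m/z. The function name and the example peak list are hypothetical and are not taken from the thesis data.

```python
def neutral_loss_spectrum(peaks, molecular_ion_mz):
    """Convert {fragment m/z: relative intensity} into {neutral loss (Da): relative intensity}.

    Hypothetical helper: assumes nominal (integer) masses and treats every peak
    below the molecular ion as a fragment of that ion.
    """
    losses = {molecular_ion_mz - mz: intensity
              for mz, intensity in peaks.items() if mz < molecular_ion_mz}
    return dict(sorted(losses.items()))


# Hypothetical spectrum (not thesis data): molecular ion at m/z 350,
# fragment ion at m/z 105 with an isotope peak at m/z 106
spectrum = {350: 0.10, 259: 0.20, 146: 0.30, 105: 1.00, 106: 0.08}
print(neutral_loss_spectrum(spectrum, 350))
# {91: 0.2, 204: 0.3, 244: 0.08, 245: 1.0}
```

Under this assumption, the isotope peak at m/z 106 becomes a loss of 244, directly beside the loss of 245 from the m/z 105 fragment, which illustrates how isotope peaks can produce clusters of adjacent neutral-loss variables such as 233, 234, and 235.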
To truly use neutral loss data for classification, the chemical identity of the mass fragments and neutral loss fragments must be investigated using alternative mass spectrometry techniques.

APPENDIX

Figure A3.1 Mass spectra of all fentanyl analogs (relative intensity vs m/z): para-fluorobutyryl fentanyl, ortho-fluorobutyryl fentanyl, meta-fluorobutyryl fentanyl, para-fluoro methoxyacetyl fentanyl, ortho-fluoro methoxyacetyl fentanyl, meta-fluoro methoxyacetyl fentanyl, para-fluoroisobutyryl fentanyl, ortho-fluoroisobutyryl fentanyl, meta-fluoroisobutyryl fentanyl, cyclohexyl fentanyl, cyclopropyl fentanyl, cyclopentyl fentanyl, butyryl fentanyl, isobutyryl fentanyl, acrylfentanyl, tetrahydrofuran fentanyl, para-methylfentanyl, ortho-methylfentanyl, meta-methylfentanyl, para-methoxyfentanyl, para-chlorofentanyl, para-fluorofentanyl, furanylethyl fentanyl, α-methyl acetyl fentanyl, α-methyl thiofentanyl, α-methylfentanyl, 4’-methylfentanyl, thiofentanyl

Figure A3.2 Initial average model PCA scores plots for A) PC1 vs PC2, B) PC1 vs PC3, and C) PC1 vs PC4 (PC1, 26.9%; PC2, 17.7%; PC3, 14.6%; PC4, 8.4%)

Figure A3.3 Initial average model PCA loadings plots for A) PC1, B) PC2, C) PC3, and D) PC4

Figure A3.4 Initial average LDA model coefficients of A) LD1 and B) LD3

Figure A3.5 Refined PCA loadings plots for A) PC1, B) PC2, C) PC3, and D) PC4

Figure A3.6 Refined LDA model coefficients of A) LD1 and B) LD3

Table A3.1 Chemical names of non-fentanyl NPS compounds
Abbreviation | Chemical Name
4-APB | 4-(2-aminopropyl)benzofuran
5-APB | 5-(2-aminopropyl)benzofuran
6-APB | 6-(2-aminopropyl)benzofuran
7-APB | 7-(2-aminopropyl)benzofuran
4E-APB | 4-(2-ethylaminopropyl)benzofuran
4M-APB | 4-(2-methylaminopropyl)benzofuran
2CB | 2,5-dimethoxy-4-bromophenethylamine
2CC | 2,5-dimethoxy-4-chlorophenethylamine
2CD | 2,5-dimethoxy-4-methylphenethylamine
2CE | 2,5-dimethoxy-4-ethylphenethylamine
2CG | 3,4-dimethyl-2,5-dimethoxyphenethylamine
2CH | 2,5-dimethoxyphenethylamine
2CI | 2,5-dimethoxy-4-iodophenethylamine
2CN | 2,5-dimethoxy-4-nitrophenethylamine
2CP | 2,5-dimethoxy-4-propylphenethylamine
2CT | 2,5-dimethoxy-4-methylthiophenethylamine
25-B | 4-bromo-2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-benzeneethanamine
25-C | 2-(4-chloro-2,5-dimethoxyphenyl)-N-(2-methoxybenzyl)ethanamine
25-D | 2-(2,5-dimethoxy-4-methylphenyl)-N-(2-methoxybenzyl)ethanamine
25-E | 2-(4-ethyl-2,5-dimethoxyphenyl)-N-(2-methoxybenzyl)ethanamine
25-G | 2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-3,4-dimethyl-benzeneethanamine
25-H | 2-(2,5-dimethoxyphenyl)-N-(2-methoxybenzyl)ethanamine
25-N | 2-(2,5-dimethoxy-4-nitrophenyl)-N-(2-methoxybenzyl)ethanamine
25-P | 2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-4-propyl-benzeneethanamine
25-T | 2,5-dimethoxy-N-[(2-methoxyphenyl)methyl]-4-(methylthio)-benzeneethanamine
3,4-DMA | 3,4-dimethoxy-N-[(2-methoxyphenyl)methyl]-α-methyl-benzeneethanamine
4-hydroxy-N,N-Dimethyltryptamine | 3-[2-(dimethylamino)ethyl]-1H-indol-4-ol
5-methoxy-N,N-Diisopropyltryptamine | 5-methoxy-N,N-bis(1-methylethyl)-1H-indole-3-ethanamine
5-methoxy-N,N-Dimethyltryptamine | 5-methoxy-N,N-dimethyl-1H-indole-3-ethanamine
N,N-Dipropyltryptamine | N,N-dipropyl-1H-indole-3-ethanamine
N,N-Dimethyltryptamine | N,N-dimethyl-1H-indole-3-ethanamine
4-hydroxy Diethyltryptamine | 3-[2-(diethylamino)ethyl]-1H-indol-4-ol
4-methyl-α-Ethyltryptamine | α-ethyl-4-methyl-1H-indole-3-ethanamine
5,7-Dichlorotryptamine | 5,7-dichloro-1H-indole-3-ethanamine
α-Ethyltryptamine | α-ethyl-1H-indole-3-ethanamine
α-methyl Tryptamine | α-methyl-1H-indole-3-ethanamine
2-FMA | 2-fluoromethamphetamine
3-FMA | 3-fluoromethamphetamine
4-FMA | 4-fluoromethamphetamine
2-EMC | 2-ethylmethcathinone
3-EMC | 3-ethylmethcathinone
4-EMC | 4-ethylmethcathinone

Figure A3.7 Neutral loss spectra of all fentanyl analogs (relative intensity vs neutral loss (Da)), shown for the same 28 analogs, in the same order, as Figure A3.1

REFERENCES
(1) Bonetti, J. Mass Spectral Differentiation of Positional Isomers using Multivariate Statistics. Forensic Chemistry. 2018, 9, 50–61.
(2) Quinn, M.; Brettell, T.; Joshi, M.; Bonetti, J.; Quarino, L. Identifying PCP and Four PCP Analogs Using the Gold Chloride Microcrystalline Test Followed by Raman Microspectroscopy and Chemometrics. Forensic Science International. 2020, 307, 110135.
(3) Setser, A. L.; Waddell Smith, R. Comparison of Variable Selection Methods Prior to Linear Discriminant Analysis Classification of Synthetic Phenethylamines and Tryptamines. Forensic Chemistry. 2018, 11, 77–86.
(4) Kranenburg, R. F.; Peroni, D.; Affourtit, S.; Westerhuis, J. A.; Smilde, A. K.; Asten, A. C. V. Revealing Hidden Information in GC–MS Spectra from Isomeric Drugs: Chemometrics Based Identification from 15 eV and 70 eV EI Mass Spectra. Forensic Chemistry. 2020, 18, 100225.
(5) Roberson, Z. R.; Goodpaster, J. V. Differentiation of Structurally Similar Phenethylamines via Gas Chromatography–Vacuum Ultraviolet Spectroscopy (GC–VUV). Forensic Chemistry. 2019, 15, 100172.
(6) Davidson, J. T.; Jackson, G. P. The differentiation of 2,5-dimethoxy-N-(N-methoxybenzyl)phenethylamine (NBOMe) isomers using GC retention indices and multivariate analysis of ion abundances in electron ionization mass spectra. Forensic Chemistry. 2019, 14, 100160.
(7) Stuhmer, E. L.; McGuffin, V. L.; Waddell Smith, R. Discrimination of seized drug positional isomers based on statistical comparison of electron-ionization mass spectra. Forensic Chemistry. 2020, 20, 100261.
(8) Cayman Chemical. Fentanyl Identification. Cayman Currents. 28, Ann Arbor (2017).
(9) Watson, J. T.; Sparkman, O. D. Introduction to mass spectrometry: instrumentation, applications and strategies for data interpretation; Wiley: Chichester, 2011.
(10) Franki, R. Fentanyl Analogues an Increasing Factor in Opioid Deaths. https://www.mdedge.com/psychiatry/article/150493/addiction-medicine/fentanyl-analogues-increasing-factor-opioid-deaths (accessed Jun 9, 2020).
(11) Setser, A. L. Classification of Synthetic Phenethylamines and Tryptamines using Multivariate Statistical Procedures [Master’s Thesis]; Michigan State University, East Lansing, 2019.
(12) Stuhmer, E. L. Statistical Comparison of Mass Spectral Data for Positional Isomer Differentiation [Master’s Thesis]; Michigan State University, East Lansing, 2019.
(13) Fowble, K. L.; Shepard, J. R.; Musah, R. A. Identification and Classification of Cathinone Unknowns by Statistical Analysis Processing of Direct Analysis in Real Time-High Resolution Mass Spectrometry-Derived “Neutral Loss” Spectra. Talanta. 2018, 179, 546–553.
(14) Moorthy, A. S.; Wallace, W. E.; Kearsley, A. J.; Tchekhovskoi, D. V.; Stein, S. E. Combining Fragment-Ion and Neutral-Loss Matching during Mass Spectral Library Searching: A New General Purpose Algorithm Applicable to Illicit Drug Identification. Analytical Chemistry. 2017, 89 (24), 13261–13268.

4. SOFT-INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) FOR CLASSIFICATION OF FENTANYL ANALOGS ACCORDING TO STRUCTURAL SUBCLASS

Soft independent modelling of class analogies (SIMCA) is a multivariate statistical classification method that has been applied to various forensic disciplines.1-4 It is a soft classification method that develops a PCA model for each class individually. This work explored the application of SIMCA to classify fentanyl analogs according to structural subclass. As in Chapter 3, SIMCA models were developed to investigate the effect of three factors on the classification success rate: spectral variation within a peak, spectral variation over time, and the use of neutral loss data. The same analogs and subclasses were used as in Chapter 3: the n-alkyl chain substituted (AN) subclass, the aniline ring substituted (AR) subclass, the amide group substituted (AG) subclass, and the amide and aniline ring substituted (AA) subclass. All mass spectra and the corresponding discussion are in Chapter 3, Section 3.1. All models were optimized using leave-one-out cross validation, and the applicability of the models was tested using external test sets.

4.1 INITIAL SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) MODELS TO ASSESS VARIATION WITHIN A CHROMATOGRAPHIC PEAK

The same training and test sets were used as in Section 3.2. Unlike in linear discriminant analysis (LDA), the number of variables used to develop SIMCA models was not limited by the number of samples. As such, the full mass spectrum of each analog was used to develop the initial SIMCA model. The apex spectra and the average spectra were used to develop two separate models to assess the ability of each to classify the analogs successfully. As discussed in Chapter 1, SIMCA is based on an independent PCA model for each subclass. The number of principal components (PCs) and the optimum significance (α) value were determined by the user for each subclass.
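The cumulative variance reported for each subclass model in the tables of this chapter is the share of that subclass's data variance captured by the retained PCs. A minimal NumPy sketch, assuming mean-centred spectra and a PCA computed by singular value decomposition (the software used in this work may scale or preprocess the data differently), is shown below; the function name is hypothetical. In this work the number of PCs was chosen by the user and checked by cross validation, so the sketch only reports the quantity and does not select the number of PCs.

```python
import numpy as np


def cumulative_variance(X):
    """Cumulative % of variance explained by successive PCs of one subclass.

    X: spectra of a single subclass (rows = spectra, columns = m/z variables).
    """
    Xc = X - X.mean(axis=0)                       # mean-centre each variable
    s = np.linalg.svd(Xc, compute_uv=False)       # singular values
    explained = s ** 2                            # proportional to PC variances
    return 100.0 * np.cumsum(explained) / explained.sum()


# Hypothetical subclass data: two latent factors plus small noise over 5 variables
rng = np.random.default_rng(2)
X = rng.normal(size=(15, 2)) @ np.array([[3.0, 0, 0, 0, 0], [0, 2.0, 0, 0, 0]])
X += 0.01 * rng.normal(size=(15, 5))
print(np.round(cumulative_variance(X), 1))
```

Reading off the value at the chosen number of PCs (for example, `cumulative_variance(X)[1]` for a two-PC model) gives the figure that would be tabulated for that subclass.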
The optimal PCA models were then compared to one another using the squared residual distances (Q) to each subclass. The α values were optimized once more to increase separation between structural subclasses while ensuring that no critical boundary was large enough to include training set analogs of other subclasses. The conditions for each subclass in the final optimized apex and average models are listed in Table 4.1.

Table 4.1 Conditions for each subclass in SIMCA for both apex and average models
Class | # of PCs | α | Cumulative Variance (%) | Cross Validation Success (%)
Apex model
AA | 2 | 0.01 | 99 | 100
AR | 3 | 0.01 | 99 | 80
AG | 3 | 0.01 | 78 | 100
AN | 2 | 0.05 | 97 | 100
Average model
AA | 2 | 0.01 | 99 | 100
AR | 3 | 0.01 | 99 | 80
AG | 3 | 0.01 | 78 | 100
AN | 2 | 0.05 | 97 | 100

Conditions for the apex model and the average model were the same for all subclasses. The α value was larger for the AN subclass than for any of the other subclasses, which indicated more similarity among analogs in the AN subclass. The larger α value corresponded to a smaller critical limit boundary, consistent with less variability among analogs in the AN subclass. The AR subclass was the only subclass without 100% successful cross validation, which indicated that the analogs within this subclass had more variability. When they were removed one by one and applied to the model, as is done in leave-one-out cross validation, the model was not sufficiently representative to classify all analogs correctly. The overall cross validation success for both models was 95%, with two spectra of para-chlorofentanyl misclassified as members of no subclass rather than of the AR subclass (Figure 4.1). The para-chlorofentanyl replicates were misclassified because their squared residual distance (Q) was higher than the critical limit for the subclass, even though their Hotelling’s T² distance was within the defined critical limit for this subclass.
This highlights the importance of using both parameters for optimal classification with a minimal number of false positives. The para-chlorofentanyl replicates were not misclassified in LDA; this was likely due to the differences in modelling between LDA and SIMCA. Subclasses were modelled against each other in LDA, whereas in SIMCA they were modelled individually. This difference in modelling likely explains the difference in cross validation success.

Figure 4.1 Residuals plot (Q vs Hotelling’s T²) for the AR subclass from the apex model, showing the training set, cross validation results, critical limit, and the para-chlorofentanyl replicates

The Cooman’s plots show the optimized separation among the four subclasses for the apex and average models (Figure 4.2). As discussed in Chapter 1, a Cooman’s plot shows the distances to two of the subclasses plotted against one another. Each subclass has an optimal critical limit determined during model development, such that all analogs belonging to that subclass fall within the critical limit and all analogs not belonging to that subclass fall outside it. When the Cooman’s plots were examined, all training set analogs fell below the critical limit for their respective subclasses, with no samples falling below the critical limit for another subclass or outside the critical limit for their correct subclass. When the apex model was examined (Figure 4.2A-C), separation among the four subclasses was achieved. When the Cooman’s plots for the average model (Figure 4.2D-F) were examined, the separation among subclasses was similar and used the same parameters. Once again, the comparison of the apex and average models showed many similarities between the training sets and development of the models, so further discussion refers only to the apex model.
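The residuals and Cooman’s plots above are built from two distances computed for each spectrum against each subclass PCA model: the squared residual distance Q and Hotelling’s T². The sketch below shows one way to compute both in NumPy and to accept a spectrum only when both fall within their critical limits. It is a minimal sketch, not the implementation used in this work: the class name is hypothetical, the T² limit uses a large-sample chi-square approximation, and the Q limit uses Box’s scaled chi-square approximation, with both chi-square quantiles approximated by the Wilson–Hilferty formula so the example needs only NumPy and the Python standard library. Other SIMCA implementations use different limit formulas.

```python
import numpy as np
from statistics import NormalDist


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile (dof may be non-integer)."""
    z = NormalDist().inv_cdf(p)
    a = 2.0 / (9.0 * dof)
    return dof * (1.0 - a + z * np.sqrt(a)) ** 3


class SimcaClassModel:
    """PCA model of one subclass with Q / Hotelling's T2 acceptance (hypothetical sketch)."""

    def __init__(self, n_pcs):
        self.n_pcs = n_pcs

    def fit(self, X):
        self.mean = X.mean(axis=0)
        Xc = X - self.mean
        _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        eig = s ** 2 / (X.shape[0] - 1)            # variance captured by each PC
        self.P = Vt[: self.n_pcs].T                 # loadings (variables x PCs)
        self.eig_kept = eig[: self.n_pcs]
        resid = eig[self.n_pcs:]                    # variance left in the residuals
        self.theta1 = resid.sum()
        self.theta2 = (resid ** 2).sum()
        return self

    def distances(self, X):
        """Squared residual distance Q and Hotelling's T2 for each row of X."""
        Xc = np.atleast_2d(X) - self.mean
        T = Xc @ self.P                             # scores on retained PCs
        E = Xc - T @ self.P.T                       # residuals
        q = np.sum(E ** 2, axis=1)
        t2 = np.sum(T ** 2 / self.eig_kept, axis=1)
        return q, t2

    def limits(self, alpha):
        """Approximate (Q, T2) critical limits at significance level alpha."""
        g = self.theta2 / self.theta1               # Box: Q is approximately g * chi2(h)
        h = self.theta1 ** 2 / self.theta2
        q_lim = g * chi2_quantile(1 - alpha, h)
        t2_lim = chi2_quantile(1 - alpha, self.n_pcs)  # large-sample T2 limit
        return q_lim, t2_lim

    def accepts(self, X, alpha=0.01):
        """True where a spectrum lies inside both critical limits."""
        q, t2 = self.distances(X)
        q_lim, t2_lim = self.limits(alpha)
        return (q <= q_lim) & (t2 <= t2_lim)


# Hypothetical training spectra for one subclass: 12 spectra, 30 variables
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 2)) @ rng.normal(size=(2, 30)) + 0.05 * rng.normal(size=(12, 30))
model = SimcaClassModel(n_pcs=2).fit(X)
new = X.mean(axis=0) + 10.0          # hypothetical spectrum far from the subclass
print(model.accepts(new))            # [False]
```

Because a smaller α corresponds to a higher quantile, `limits(0.01)` returns larger Q and T² limits than `limits(0.05)` in this sketch, mirroring the relationship between α and the size of the critical limit boundary described in this chapter. A spectrum such as the para-chlorofentanyl replicates, with T² inside its limit but Q outside, is rejected by the `&` in `accepts`.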
Figure 4.2 Cooman’s plots for the apex model A) amide and aniline ring (AA) subclass vs amide group (AG) subclass, B) AA vs aniline ring (AR) subclass, C) AA vs n-alkyl chain (AN) subclass, and Cooman’s plots for the average model D) AA vs AG, E) AA vs AR, F) AA vs AN (axes: distance (Q) to each of the two subclasses)

When the optimized Cooman’s plots were examined, there was clear separation among all subclasses. During model development, it was important to evaluate the modelling power plots, which showed which variables contributed to each subclass model. As an example, Figure 4.3 shows the modelling power plot for the AG subclass based on apex data, and Table 4.2 shows the variables contributing most to this subclass. The modelling power of a variable is specific to each subclass and shows how that variable in new samples will contribute to classification in that subclass. Modelling power ranges from 0 to +1, with +1 indicating the greatest contribution to modelling. Variables with a modelling power over 0.3 are considered necessary for modelling.5 Because all subclasses are modelled individually, each modelling power plot applies only to a specific subclass. Therefore, the variables that contribute to the modelling do not necessarily contribute to discrimination among subclasses.
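Modelling power compares, variable by variable, the residual standard deviation left after the subclass PCA model with that variable's original standard deviation. A minimal NumPy sketch, assuming the common form MP = 1 − s_res/s_tot with degrees-of-freedom corrections of (n − A − 1) for the residuals (A = number of PCs) and (n − 1) for the centred data, is shown below; the exact scaling in the software used for this work may differ, and the function name is hypothetical. With this correction, a variable that the model explains poorly can return a value slightly below zero, and a variable with no variance at all returns NaN.

```python
import numpy as np


def modelling_power(X, n_pcs):
    """Per-variable modelling power of a PCA model fitted to one subclass.

    X: spectra of a single subclass (rows = spectra, columns = variables).
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_pcs].T                                   # loadings of retained PCs
    E = Xc - Xc @ P @ P.T                              # residuals after n_pcs PCs
    s_res = np.sqrt((E ** 2).sum(axis=0) / (n - n_pcs - 1))
    s_tot = np.sqrt((Xc ** 2).sum(axis=0) / (n - 1))
    with np.errstate(divide="ignore", invalid="ignore"):
        return 1.0 - s_res / s_tot                     # NaN where a variable is constant


# Hypothetical subclass: variables 0 and 1 share one latent factor, variable 2 is noise
rng = np.random.default_rng(3)
t = rng.normal(size=20)
X = np.column_stack([t, 2 * t, 0.1 * rng.normal(size=20)])
mp = modelling_power(X, n_pcs=1)
print(np.round(mp, 2))
print(np.flatnonzero(mp >= 0.8))   # variables passing an 80% cut-off
```

In this example the two correlated variables are almost fully modelled by one PC, while the noise variable, which the single PC does not describe, has a modelling power near zero.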
Figure 4.3 Modelling power plot for the AG subclass from the apex model

Table 4.2 Variables contributing most to the AG subclass
m/z: 83, 158, 189, 190, 200, 201, 202, 204, 243, 244, 245, 264, 297, 299, 300, 301, 332, 389, 390

Table 4.2 shows which variables contributed most (≥80%) to the modelling of the AG subclass. The 80% threshold was used to visualize the variables contributing most to modelling. When the spectra of AG analogs were examined, it was apparent that these variables highlight the variability within this model. Since each model is a PCA model, it follows that the modelling power would explain variability within the model. For example, many of the contributing variables were present at high intensity in only one of the analogs in the AG subclass, such as m/z 83 in cyclohexyl fentanyl. Additionally, some of the variables were isotope peaks of the dominant ion; for example, cyclohexyl fentanyl had a high intensity peak at m/z 299 with isotope peaks at m/z 300 and m/z 301. The variables m/z 332, 389, and 390 were not visible in any spectra, so it is unclear why these variables contributed to the AG subclass model. Some variables also showed modelling power below zero, outside the expected range for modelling power. This was potentially due to the processing of the data before SIMCA was applied. The modelling power plots and tables with the variables contributing most to the other three subclasses are shown in the appendix (Figures A4.1-A4.3 and Tables A4.1-A4.3).

When the test set was applied to the SIMCA models, the apex model resulted in a 55% correct classification rate, while the average model resulted in a 58% correct classification rate. The apex and average models misclassified the same 27 samples, with the apex model misclassifying two additional samples (total of 29).
The two additional misclassifications were replicates of meta-fluoro methoxyacetyl fentanyl and α-methyl thiofentanyl, neither of which was misclassified in the initial LDA model. Spectra used in the training set were collected over two months, while many spectra used in the test set were collected over four months. Variation in spectral intensities across the four months was observed, which led to the low classification success rate. For example, the relative intensity of m/z 69 in cyclopropyl fentanyl varied over a range of 5% among the training set replicates, but over a range of 18% across the four months. The SIMCA models were more affected by variation in spectral intensity because all m/z values in the spectra were used in model development, whereas only selected variables were used in LDA model development. Overall, the low classification success indicated that the SIMCA model was over-trained and, therefore, not capable of correctly classifying new samples. This highlighted the importance of including instrument variation in the training set when developing SIMCA models. The classification success for the apex and average models once again showed minimal difference between using the apex or average data. As such, the following sections discuss only models developed using data collected at the apex.

4.2 REFINED SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) MODEL TO INCORPORATE INSTRUMENT VARIATION

The training and test sets were redefined to incorporate instrument variation into model development (the refined training and test sets are described in Chapter 3, Section 3.3). As before, the full mass spectra were used to develop, optimize, and test the refined SIMCA model. The optimal conditions for each subclass are shown in Table 4.3.
Table 4.3 Conditions for each subclass in refined SIMCA model
Class | # of PCs | α | Cumulative Variance (%) | Cross Validation Success (%)
AA | 2 | 0.01 | 97 | 100
AR | 3 | 0.01 | 97 | 95
AG | 3 | 0.01 | 78 | 100
AN | 2 | 0.01 | 95 | 100

The only difference between the conditions for the refined model and those for the initial model was the α value for the AN subclass. In the refined model, the α value was 0.01 (rather than 0.05 in the initial model). The lower α value meant a larger critical limit, allowing samples with more variation to be accepted by the model.

With the refined model, some differences in the variables used to model the subclasses were observed compared to the initial model. Figure 4.4 compares the modelling power plots for the AG subclass in the refined model (instrument variation incorporated) and the initial model (no instrument variation). Table 4.4 compares the variables contributing most (≥80%) to both models. The modelling power plots for the three other subclasses are shown in the appendix (Figure A4.4).

Figure 4.4 Modelling power plots for the AG subclass SIMCA model A) with instrument variation incorporated, B) without instrument variation incorporated

Table 4.4 Comparison of variables contributing most to the initial and refined AG subclass SIMCA models
Initial model (m/z): 83, 158, 189, 190, 200, 201, 202, 204, 243, 244, 245, 264, 297, 299, 300, 301, 332, 389, 390
Refined model (m/z): 111, 162, 204, 214, 215, 228, 244, 245, 257, 258, 297, 299, 300, 301, 389

The initial model had 19 variables that contributed 80% or greater to the modelling of the AG subclass. When the refined model was used, 15 variables contributed 80% or greater to the modelling of the AG subclass SIMCA model.
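The overlap between the two rows of Table 4.4 can be checked directly by treating each row as a set of m/z values. The short sketch below transcribes the values from the table; it is only a bookkeeping check of the comparison that follows, not part of the SIMCA modelling.

```python
# m/z values contributing >=80% to the AG subclass model (transcribed from Table 4.4)
initial = {83, 158, 189, 190, 200, 201, 202, 204, 243, 244, 245,
           264, 297, 299, 300, 301, 332, 389, 390}
refined = {111, 162, 204, 214, 215, 228, 244, 245, 257, 258,
           297, 299, 300, 301, 389}

common = sorted(initial & refined)        # variables retained in both models
only_refined = sorted(refined - initial)  # variables new to the refined model

print(len(initial), len(refined))  # 19 15
print(common)                      # [204, 244, 245, 297, 299, 300, 301, 389]
print(only_refined)                # [111, 162, 214, 215, 228, 257, 258]
```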
When the variables contributing most were compared, only eight variables were common to the two models (m/z 204, 244, 245, 297, 299, 300, 301, 389). The refined model had fewer variables contributing over 80% to the modelling power. The differences in the variables contributing to the refined model were likely due to the variation (a range in relative intensities of 0.4-19%) introduced by month-to-month instrument usage. Many of the new variables contributing to the refined model are present at lower intensity in the mass spectra of the AG analogs.

The refined SIMCA model resulted in 99% successful leave-one-out cross validation, with only one para-chlorofentanyl replicate misclassified. The misclassification was due to a Q value that was higher than the critical limit for the AR subclass (Figure 4.5). The initial model misclassified two replicates of para-chlorofentanyl for the same reason. The larger critical limit boundary in the refined model, along with the additional variation accounted for by the model, allowed one additional replicate to be classified correctly. The para-chlorofentanyl replicate that was misclassified had a spectrum with low relative intensities compared to the other three spectra. This was potentially due to a change in relative ion intensities as a result of the analog being stored in solution, unrefrigerated, over time.

Figure 4.5 Residuals plot (Q vs Hotelling’s T²) for the AR subclass in the refined SIMCA model, with the misclassified para-chlorofentanyl replicate labelled

When the test set was applied, the refined SIMCA model resulted in a 91% successful classification rate. Replicate spectra of cyclopentyl fentanyl (AG subclass) were misclassified as not belonging to any class. The samples had a low Hotelling’s T² value but a large Q value for this class, which caused them to fall outside the critical limit boundary.
The two other cyclopentyl fentanyl replicates followed the same trend but were just below the critical limit boundary, permitting correct classification. Month-to-month variability caused two of the four replicates to be misclassified, even though all replicates were concentrated around the critical limit (Figure 4.6). Select ions were examined for the range in variation among the four cyclopentyl fentanyl spectra; three high intensity ions (m/z 69, 105, and 146) had ranges in relative ion intensity of 11-35.5% among the spectra.

Figure 4.6 Residuals plot (Q vs Hotelling’s T²) for the AG subclass in the refined SIMCA model, showing the training set, test set, and critical limit

4.2.1 Additional Test Sets to Validate the Classification Models

In order to test the validity of the models, the two external test sets applied in Chapter 3 (Section 3.3.2) were also applied to the refined SIMCA model. For Test Set 1 (non-fentanyl NPS compounds), all samples were classified as ‘none’.6,7 The samples did not fall within the boundaries of any of the fentanyl subclasses. Test Set 1 showed the benefit of SIMCA and other soft classification methods, which have the option to classify samples as ‘none’ rather than forcing a classification. For a complete unknown, SIMCA is a more conservative classification method than LDA. Since LDA is a hard classification method, it forced classification into one of the subclasses, which would be incorrect for any non-fentanyl sample. When Test Set 2 (case samples) was applied to the refined SIMCA model, all six samples were classified as ‘none’; thus, only carfentanil was correctly classified. The case samples were not classified correctly by SIMCA, likely because the case samples had very few variables. As stated previously, the low number of variables was likely due to instrument parameters and the very low concentration of the controlled substances in the case samples.
This highlighted the need for concentration to be incorporated in multivariate statistical models in order to correctly classify a wide variety of samples.

4.3 APPLICATION OF NEUTRAL LOSS SPECTRA FOR CLASSIFICATION OF FENTANYL ANALOGS

Soft independent modelling of class analogies was applied to the neutral loss data to assess classification, and the results were compared to those obtained when mass spectral data were used. The 28 fentanyl analogs and their replicates were divided into the same training and test sets that were used for the refined model. The neutral loss spectra of the fentanyl analogs are discussed in Section 3.4.1. The full neutral loss spectra were input into SIMCA. The conditions for each subclass were optimized based on cross-validation success and are shown in Table 4.5. The conditions for the neutral loss model differed in several ways from those of the refined model (Table 4.3). First, the optimal PCA model for the AR subclass retained two PCs instead of three. Second, with the exception of the AG subclass, the neutral loss model accounted for lower cumulative variance than the refined model. Third, the neutral loss model resulted in a higher cross-validation classification success rate for the AR subclass (100%) than the refined model (95%). Thus, the AR subclass achieved better cross validation despite retaining fewer PCs and explaining less cumulative variance. This indicated that when the subclass models included more cumulative variance, they also included more within-class variation.

Table 4.5 Conditions for each subclass in the neutral loss SIMCA model
Class   # of PCs   α      Cumulative Variance (%)   Cross Validation Success (%)
AA      2          0.01   95                        100
AR      2          0.01   77                        100
AG      3          0.01   83                        100
AN      2          0.01   89                        100

The Coomans' plots showed the optimized separation among the four subclasses (Figure 4.7). There was clear separation among the four subclasses, with all analogs belonging to a subclass falling below the critical limit for that subclass and no analogs overlapping subclasses.
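Table 4.5 lists, for each subclass, the number of PCs retained and the cumulative variance they explain. The Python sketch below (NumPy only) shows how a cumulative-variance figure can be computed from the singular values of a mean-centred subclass data matrix. It is only an illustration: in this work the number of PCs was selected by cross-validation success, whereas the hypothetical `smallest_n_pcs` helper simply finds the fewest PCs that reach a variance target.

```python
import numpy as np


def cumulative_variance(X):
    """Cumulative % of total variance explained by successive PCs of mean-centred X."""
    s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
    explained = s ** 2 / np.sum(s ** 2)   # fraction of variance per PC
    return 100 * np.cumsum(explained)


def smallest_n_pcs(X, target_pct):
    """Hypothetical helper: fewest PCs whose cumulative variance reaches target_pct."""
    cum = cumulative_variance(X)
    hits = np.flatnonzero(cum >= target_pct - 1e-9)  # tolerance for float rounding
    if hits.size == 0:
        raise ValueError("target variance cannot be reached")
    return int(hits[0]) + 1
```

For a toy matrix whose two variables carry 80% and 20% of the variance, `cumulative_variance` returns 80 and 100, so a 75% target needs one PC and a 90% target needs two.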
This separation was expected because, during the development of a SIMCA model, the classes are optimized for separation. Each subclass was modelled independently, and the variables that contributed to the modelling of each subclass are shown in the modelling power plots in the appendix (Figure A4.5).

Figure 4.7 Coomans' plots for the A) AA subclass vs AG subclass, B) AA subclass vs AR subclass, and C) AA subclass vs AN subclass [plots: distance (Q) to the Amide and Aniline Ring (AA) subclass against distance (Q) to the Amide Group (AG), Aniline Ring (AR), and n-Alkyl Chain (AN) subclass, respectively]

The neutral loss SIMCA model had 100% correct leave-one-out cross validation across all four subclasses and resulted in 87% correct classification of the test set. This model resulted in better cross validation but lower correct classification of the test set (one fewer sample) than the refined SIMCA model. Three replicates of cyclopentyl fentanyl (AG subclass) were misclassified as ‘none’, that is, not belonging to any of the subclasses. The residuals plot for the AG subclass showed that the cyclopentyl fentanyl replicates had Q values outside the critical limit, which caused them to be classified as ‘none’ (Figure 4.8). Although three of the four cyclopentyl fentanyl replicates were misclassified, the fourth replicate was very close to the Q critical limit, indicating that this analog varied more than the other analogs in this subclass. This was similar to the results observed for the refined SIMCA model (Figure 4.6), in which two of the four cyclopentyl fentanyl replicates were outside the critical limit but all were concentrated around the critical limit.
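Each panel of a Coomans' plot such as Figure 4.7 places a sample according to its distances to two class models. The hypothetical Python sketch below scales each sample's Q distances by the two critical limits, so that both limits fall at 1.0, and reports the region in which the sample lands. The distances and limits in the usage example are made-up numbers, not values from this work.

```python
def coomans_points(q_to_a, q_to_b, limit_a, limit_b):
    """Scale each sample's Q distances to classes A and B by their critical limits.

    In the scaled plot both critical limits sit at 1.0, so the four regions are
    'A' (inside A only), 'B' (inside B only), 'both', and 'none'.
    """
    points = []
    for qa, qb in zip(q_to_a, q_to_b):
        x, y = qa / limit_a, qb / limit_b
        if x <= 1 and y <= 1:
            region = "both"
        elif x <= 1:
            region = "A"
        elif y <= 1:
            region = "B"
        else:
            region = "none"
        points.append((x, y, region))
    return points


for x, y, region in coomans_points([0.5, 3.0, 0.2, 4.0], [5.0, 0.4, 0.1, 6.0], 1.0, 2.0):
    print(f"{x:.2f} {y:.2f} {region}")
```

The four illustrative samples fall in the regions A, B, both, and none, respectively. Well-separated subclasses, as in Figure 4.7, leave the 'both' region empty.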
When the mass spectra of cyclopentyl fentanyl were visually examined, the most obvious difference was the intensity of m/z 69, which varied from 60% to 95% of the base peak. Because of the manner in which the neutral loss spectra were developed, this variation was carried over to the neutral loss data as well. This indicated that cyclopentyl fentanyl varied more, in both its mass spectrum and its neutral loss spectrum, than the analogs used to train and develop these models. This highlighted a potential need for a more representative training set for this subclass to account for analogs with structural variations.

Figure 4.8 Residuals plot for the AG subclass in the neutral loss SIMCA model [plot: squared residual distance (Q) vs Hotelling's T2; series: training set, test set, critical limit]

4.4 SUMMARY OF SOFT INDEPENDENT MODELLING OF CLASS ANALOGIES (SIMCA) MODELS

The initial SIMCA models resulted in a 55% correct classification rate for the apex data and 58% for the average data. For the initial models, the same 27 samples were misclassified, with an additional two samples misclassified using the apex model. Because apex spectra are typically used and the average spectra provided minimal benefit to classification, the data collected at the apex were used for all subsequent model development. The initial SIMCA models demonstrated lower classification success than the initial LDA models described in Chapter 3. Because the subclasses were optimized individually and the full mass spectrum was used in SIMCA, the model was likely over-trained, resulting in poorer performance than LDA in classifying new samples. The refined SIMCA model performed with a 91% correct classification rate. The higher correct classification rate of the refined model highlighted the importance of incorporating instrument variation in model development. When Test Set 1 (non-fentanyl compounds) was applied, all samples were correctly classified as ‘none’, that is, not belonging to any of the fentanyl subclasses.
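How variation in a fragment ion propagates into a neutral loss spectrum depends on how the neutral losses are computed, which is described in Section 3.4.1 rather than here. The minimal Python sketch below assumes that each nominal neutral loss is the precursor m/z minus the fragment m/z, with the fragment's relative intensity carried over unchanged. The function name, precursor value, and intensities are hypothetical and serve only as an illustration.

```python
def neutral_loss_spectrum(spectrum, precursor_mz):
    """Convert a fragment-ion spectrum {m/z: relative intensity} into neutral losses.

    Assumption (hypothetical): loss = precursor m/z - fragment m/z, with the
    fragment's relative intensity carried over unchanged. The precursor ion itself
    (a loss of 0) and any ions above the precursor are dropped.
    """
    losses = {}
    for mz, intensity in spectrum.items():
        loss = precursor_mz - mz
        if loss > 0:
            losses[loss] = intensity
    return dict(sorted(losses.items()))  # ordered by increasing neutral loss


# Illustrative replicates: only the m/z 69 intensity differs between them
replicate_1 = {69: 60.0, 105: 40.0, 146: 100.0}
replicate_2 = {69: 95.0, 105: 40.0, 146: 100.0}
precursor = 376  # assumed nominal precursor m/z, for illustration only
print(neutral_loss_spectrum(replicate_1, precursor))
print(neutral_loss_spectrum(replicate_2, precursor))
```

Under this assumption the loss derived from m/z 69 changes from 60 to 95 while every other loss is unchanged, so any replicate-to-replicate variation in a fragment ion appears with the same magnitude in the neutral loss spectrum.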
However, when Test Set 2 (case samples) was applied, only one of six samples was correctly classified. The variation in the mass spectra of samples with low concentrations was not accounted for in this model, because all analogs were prepared at a relatively high concentration (1 mg/mL) compared to the case samples. Further optimization of this model needs to incorporate concentration as a factor so that samples of unknown concentration can be classified more accurately. The neutral loss SIMCA model performed with 87% successful classification. These results were comparable to those of the refined SIMCA model, with three replicates of cyclopentyl fentanyl misclassified instead of the two cyclopentyl fentanyl replicates misclassified by the refined model. As stated in Chapter 3 (Section 3.5), all neutral loss data are hypothetical, and further analysis must be done to obtain accurate neutral loss information.

APPENDIX

Figure A4.1 Modelling power plot for the AA subclass from the apex initial SIMCA model [plot: modelling power vs m/z]

Table A4.1 Variables contributing most to the AA subclass in the apex initial SIMCA model
m/z: 43, 45, 46, 58, 59, 176, 177, 207, 219, 220, 236, 237, 238, 250, 277, 278, 279, 280

Figure A4.2 Modelling power plot for the AR subclass from the apex initial SIMCA model [plot: modelling power vs m/z]

Table A4.2 Variables contributing most to the AR subclass in the apex initial SIMCA model
m/z: 95, 109, 110, 112, 122, 123, 124, 126, 135, 136, 140, 142, 143, 144, 149, 150, 151, 152, 154, 155, 160, 161, 162, 164, 165, 166, 167, 168, 172, 173, 175, 176, 177, 178, 180, 181, 182, 183, 188, 190, 192, 194, 195, 196, 201, 203, 204, 206, 207, 208, 219, 220, 223, 224, 225, 226, 232, 233, 234, 236, 238, 239, 244, 246, 250, 259, 260, 261, 263, 264, 265, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 297, 298, 313, 314, 315, 352, 365, 368, 369, 371, 440, 443

Figure A4.3 Modelling power plot for the AN subclass from the initial apex SIMCA model [plot: modelling power vs m/z]

Table A4.3 Variables contributing most to the AN subclass in the apex initial SIMCA model
m/z: 42, 57, 70, 96, 139, 146, 147, 158, 189, 190, 202, 204, 216, 217, 240, 245, 246, 247, 259, 260, 261, 262

Figure A4.4 Modelling power plots from the refined SIMCA model for the A) AA subclass, B) AR subclass, and C) AN subclass [plots: modelling power vs m/z]

Figure A4.5 Modelling power plots from the neutral loss SIMCA models for the A) AA subclass, B) AR subclass, and C) AN subclass [plots: modelling power vs m/z]

REFERENCES

(1) Álvarez, Á.; Yáñez, J.; Contreras, D.; Saavedra, R.; Sáez, P.; Amarasiriwardena, D. Propellant’s Differentiation Using FTIR-Photoacoustic Detection for Forensic Studies of Improvised Explosive Devices. Forensic Science International. 2017, 280, 169–175.
(2) Kaniu, M.; Angeyo, K. Challenges in Rapid Soil Quality Assessment and Opportunities Presented by Multivariate Chemometric Energy Dispersive X-Ray Fluorescence and Scattering Spectroscopy. Geoderma. 2015, 241–242, 32–40.
(3) Pereira, J. F.; Silva, C. S.; Vieira, M. J. L.; Pimentel, M. F.; Braz, A.; Honorato, R. S. Evaluation and Identification of Blood Stains with Handheld NIR Spectrometer. Microchemical Journal. 2017, 133, 561–566.
(4) Waddell, E. E.; Williams, M. R.; Sigman, M. E. Progress Toward the Determination of Correct Classification Rates in Fire Debris Analysis II: Utilizing Soft Independent Modelling of Class Analogy (SIMCA). Journal of Forensic Sciences. 2014, 59 (4), 927–935.
(5) Vogt, N.; Knutsen, H. SIMCA Pattern Recognition Classification of Five Infauna Taxonomic Groups Using Non-Polar Compounds Analysed by High Resolution Gas Chromatography. Marine Ecology Progress Series. 1985, 26, 145–156.
(6) Setser, A. L. Classification of Synthetic Phenethylamines and Tryptamines using Multivariate Statistical Procedures [Master’s Thesis]; Michigan State University, East Lansing, 2019.
(7) Stuhmer, E. L. Statistical Comparison of Mass Spectral Data for Positional Isomer Differentiation [Master’s Thesis]; Michigan State University, East Lansing, 2019.

5. CONCLUSIONS AND FUTURE WORK

5.1 CONCLUSIONS

Overall, this work aimed to explore and compare two multivariate statistical classification methods: linear discriminant analysis (LDA) and soft independent modelling of class analogies (SIMCA). Three factors were explored to validate the models and increase their robustness: within-peak variation, instrument variation, and the use of neutral losses rather than fragment ions for classification. Both LDA and SIMCA successfully classified fentanyl analogs according to structural subclass. When variation within the chromatographic peak was investigated, the LDA apex and average models both performed with a 98% successful classification rate. For SIMCA, the apex model performed with a 55% successful classification rate and the average model performed with a 58% successful classification rate. The apex spectra produced classification results consistent with those of the average spectra, supporting the current practice in forensic laboratories of collecting the mass spectrum at the apex of the peak. Instrument variation was also investigated in this work, and the results highlighted the need for its incorporation in statistical classification methods.
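The apex and average spectra compared above differ only in which scans across the chromatographic peak are used. The Python sketch below (NumPy only) is a hypothetical illustration, not the data processing used in this work: it takes the scan with the highest total ion current as the apex spectrum and the unweighted mean of all scans in the peak as the average spectrum, and normalises both to the base peak. How the peak boundaries were set and whether background subtraction was applied are not specified here, so those steps are omitted.

```python
import numpy as np


def apex_and_average(scans):
    """Return (apex, average) spectra for one chromatographic peak.

    scans: 2-D array, one row per scan across the peak, one column per m/z.
    Both spectra are normalised to their base peak (100%).
    """
    scans = np.asarray(scans, dtype=float)
    tic = scans.sum(axis=1)            # total ion current of each scan
    apex = scans[np.argmax(tic)]       # scan with the highest TIC
    average = scans.mean(axis=0)       # unweighted mean of all scans in the peak

    def to_base_peak(spectrum):
        return 100.0 * spectrum / spectrum.max()

    return to_base_peak(apex), to_base_peak(average)
```

For three toy scans `[[10, 8], [40, 20], [20, 6]]`, the apex spectrum is `[100, 50]` and the average spectrum is about `[100, 48.6]`: the relative intensities differ only slightly, consistent with the similar apex and average classification results reported above.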
The refined LDA model resulted in a 100% successful classification rate and the refined SIMCA model resulted in a 91% successful classification rate. When instrument variation was not accounted for (initial models), the models, particularly SIMCA, performed worse than when instrument variation was incorporated in the training set (refined models). The third factor investigated, neutral loss spectra, showed promise as an alternative way to develop LDA and SIMCA models. Neutral loss spectra were used in place of mass spectra to develop the models and gave results comparable to those of the refined models that incorporated instrument variation. However, it is acknowledged that the neutral loss data were hypothetical and serve only to demonstrate the potential for classification purposes. In terms of forensic application, SIMCA may be the better classification method because of its lower likelihood of misclassifications or false positives. Linear discriminant analysis forces every new sample into one of the available subclasses, whereas SIMCA can classify samples as ‘none’ or ‘both’, making SIMCA potentially more conservative, which is preferred in forensic applications. This advantage was observed in the current work: the SIMCA model correctly classified the external data set of non-fentanyl samples (Test Set 1) as ‘none’, whereas the LDA model classified all of these samples into a fentanyl subclass. Additionally, SIMCA offers more ways to optimize the model for a specific data set, such as changing the critical limit and examining the modelling power plots to assess variables within a subclass. To circumvent the challenges of forced classification, linear discriminant analysis could be modified to incorporate an “other” class, or it could be used in a tiered system that starts with a general class and becomes progressively more specific. This work demonstrated the application of LDA and SIMCA to classify fentanyl analogs according to structural subclass.
Through this work, the effects of instrument variation and the use of neutral loss spectra were explored, which highlighted limitations of statistical classification methods. Additionally, a limited number of analogs was used to represent each subclass in the training set, and only one or two analogs represented each subclass in the test set. Within subclasses, there were multiple sets of isomers. It is acknowledged that this small data set with limited diversity must be expanded for validation.

5.2 FUTURE WORK

This work can be expanded to incorporate other types of instrument variation, including collection over longer periods of time, collection on other Agilent gas chromatography-mass spectrometry (GC-MS) instruments, and collection on GC-MS instruments from various manufacturers. As mentioned previously, a limited data set was used. Future work can expand upon this data set by creating a larger training set with more representative compounds of every subclass, as well as by expanding the test set to further assess model development. For SIMCA specifically, many optimization techniques are available; however, they are time-consuming. One such optimization uses the modelling power plots in combination with discrimination power plots. The modelling power plots show how each variable contributes to the modelling of each class individually. Discrimination power plots show how well each variable discriminates between two classes. For example, a variable below 0.3 on the modelling power plot may be deemed irrelevant to modelling;¹ however, if its discrimination power is 3 or greater, the variable must be retained to provide differentiation between classes.² Discrimination power plots could be used in future work to further optimize the SIMCA model and to eliminate variables that contribute little to either modelling or discrimination between classes.
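The modelling power and discrimination power criteria described above can be sketched in Python with NumPy. The code below uses one common formulation, which is assumed here rather than taken from this work. Modelling power is one minus the ratio of a variable's residual standard deviation after fitting the class PCA model to its original standard deviation. Discrimination power between classes q and r is the square root of the summed residual variances when each class is fitted to the other class's model, divided by the summed residual variances when each class is fitted to its own model. Degrees-of-freedom corrections used by dedicated software are ignored, the 0.3 and 3 thresholds come from the text, and all function names are hypothetical.

```python
import numpy as np


def pca_model(X, n_pcs):
    """Mean and loadings of a PCA model for one class (rows = spectra)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_pcs].T


def residual_var(X, model):
    """Per-variable mean squared residual of samples X fitted to a class model."""
    mean, loadings = model
    xc = X - mean
    res = xc - xc @ loadings @ loadings.T
    return np.mean(res ** 2, axis=0)


def modelling_power(X, n_pcs):
    """MP_j = 1 - s_j(residual) / s_j(original); variables with no variance get MP = 0."""
    s_res = np.sqrt(residual_var(X, pca_model(X, n_pcs)))
    s_0 = X.std(axis=0)
    ratio = np.divide(s_res, s_0, out=np.ones_like(s_res), where=s_0 > 0)
    return 1.0 - ratio


def discrimination_power(Xq, Xr, n_q, n_r):
    """DP_j between classes q and r; NaN where both within-class residual variances are 0."""
    mq, mr = pca_model(Xq, n_q), pca_model(Xr, n_r)
    between = residual_var(Xr, mq) + residual_var(Xq, mr)  # each class on the other's model
    within = residual_var(Xq, mq) + residual_var(Xr, mr)   # each class on its own model
    ratio = np.divide(between, within, out=np.full_like(between, np.nan), where=within > 0)
    return np.sqrt(ratio)


def keep_variables(mp, dp, mp_min=0.3, dp_min=3.0):
    """Keep a variable if it models its class well or discriminates between classes."""
    return (mp >= mp_min) | (dp >= dp_min)
```

In this formulation, a variable that barely varies within either class (low modelling power) but sits at very different levels in the two classes still has a large discrimination power, which is exactly the case in which the text recommends retaining it.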
A limitation of this work was highlighted when the case samples from the Michigan State Police Forensic Science Division were applied to the models. The case samples were suspected to be at lower concentrations than the reference standards used to develop the models. The minimal classification success for the case samples showed the need for concentration to be incorporated as a source of variation in the training set. Future work should include analyzing reference standards at various concentrations, including concentrations comparable to those commonly observed in casework samples. Beyond refining the LDA and SIMCA models to incorporate instrument variation, other multivariate statistical methods could be applied that may enhance classification success for samples with widely varying concentrations (such as case samples). Partial least squares discriminant analysis (PLS-DA) is a classification method built on partial least squares regression and could be applied to improve the classification of samples with varying concentrations, such as case samples. As previously mentioned, the potential to develop models using neutral losses rather than fragment ions was explored in this work. However, all neutral loss data used were hypothetical because of the limitations of the low-resolution mass spectral data used in this work. The fentanyl analogs contained varying structural substitutions, even within a subclass, so the mass spectral data varied depending on the type of substitution. Utilization of neutral loss data requires high-resolution mass spectral data with high mass accuracy, so that the chemical identity of the fragment ions is known and, therefore, the identities of the neutral losses can be determined. The analogs used in this work could be reanalyzed using gas chromatography-triple quadrupole-tandem mass spectrometry (GC-QQQ-MSMS) to establish precursor-product ion relationships and indicate the fragmentation pattern of the analogs; because a triple quadrupole typically operates at unit mass resolution, accurate-mass identification of the neutral losses would additionally require a high-resolution mass analyzer.
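Why the identity of a neutral loss requires accurate mass data can be illustrated with a short calculation. The Python sketch below computes the monoisotopic masses of three small neutral molecules that all have a nominal mass of 28 u. The candidates are illustrative only and are not claimed to be losses observed for fentanyl analogs.

```python
# Monoisotopic masses (u) of the most abundant isotope of each element
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}


def exact_mass(formula):
    """Monoisotopic mass of a formula given as element counts, e.g. {"C": 2, "H": 4}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


# Candidate neutral losses that share the nominal mass 28
candidates = {"CO": {"C": 1, "O": 1}, "N2": {"N": 2}, "C2H4": {"C": 2, "H": 4}}

for name, formula in candidates.items():
    mass = exact_mass(formula)
    print(f"{name:<5} nominal {round(mass)}  exact {mass:.4f}")
```

The printed values are 27.9949 u for CO, 28.0061 u for N2, and 28.0313 u for C2H4. All three are identical at unit resolution but are separated by roughly 11 to 36 mDa, which requires accurate mass measurements to distinguish.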
Once the chemical identities of the neutral losses are determined, neutral losses characteristic of each subclass could be used to develop new LDA and SIMCA models, which may improve the classification of fentanyl analogs according to structural subclass. It is predicted that the neutral losses would be more consistent within a subclass, which could result in better classification and separation between subclasses. Although this work can be improved upon, it demonstrates the value of this application. Multivariate statistical methods applied to fentanyl analogs have shown the ability to classify the analogs according to structural subclass. This work has the potential to be used in forensic laboratories to obtain further structural information about newly synthesized fentanyl analogs as they are encountered in casework.

REFERENCES

(1) Vogt, N.; Knutsen, H. SIMCA Pattern Recognition Classification of Five Infauna Taxonomic Groups Using Non-Polar Compounds Analysed by High Resolution Gas Chromatography. Marine Ecology Progress Series. 1985, 26, 145–156.
(2) Kucheryavskiy, S. Discrimination power plot for SIMCAM model. http://finzi.psych.upenn.edu/library/mdatools/html/plotDiscriminationPower.simcam.html. 2020 (accessed July 2020).