IMPROVING METHODS FOR THE ANALYSIS OF AMPHETAMINE-TYPE STIMULANTS

By Fanny Chu

A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of Forensic Science Master of Science

2015

ABSTRACT

Forensic analysis for the definitive identification of controlled substances using attenuated total reflectance-Fourier transform infrared (ATR-FTIR) spectroscopy is challenging for sample mixtures without extraction techniques. However, the application of principal components regression (PCR) to FTIR spectra can provide identification and quantification of controlled substances in sample mixtures in a single analysis without separation of components. In this study, binary sample mixtures were analyzed and used in the development of a PCR model. After model development, two other sets of sample mixtures were used to evaluate model [...] accurate quantification were observed, demonstrating the potential of PCR to overcome the limitations of current analysis with ATR-FTIR.

Synthetic designer drugs have recently become an international concern, directing research towards alternative methods of analysis for definitive identification of these drugs. The combination of high-resolution mass spectrometry (HRMS) and mass defect filters is able to overcome the limitations of current forensic methods and enable prioritization of novel synthetic drugs for identification by allowing rapid classification to structural class. In this study, three different types of mass defect filters were developed using phenethylamine and cathinone reference standards. The potential for mass defect filters to be incorporated into a classification scheme to discriminate between phenethylamines and cathinones is demonstrated, thus allowing other resources to be directed towards identification of novel synthetic designer drugs.
ACKNOWLEDGEMENTS

I would first like to wholeheartedly thank my advisor and committee member, Dr. Ruth [...] career. She has been a wonderful advisor, not only for my research, but also in reading and rereading all the drafts of my thesis. Without her, none of this would be possible. Next, I would like to acknowledge Dr. Victoria McGuffin, who has been a great help to me throughout my research, especially in asking questions that have challenged me to look at aspects of my research that I would not have otherwise considered. I would also like to thank Dr. Steven Dow for agreeing to serve as a committee member on such short notice. My Ph.D. advisor, Dr. A. Daniel Jones, has also been a great help to me in my research by asking questions that have allowed me to direct my research into more interesting avenues and helping me to arrive at logical interpretations of my results.

Finally, I would like to express my thanks to the past and current members of the Forensic Chemistry Group for their advice and support in these two years. Past members include John McIlroy, Christine Hay, and Jordyn Geiger, who taught me so much in my first year. Current members include Kristen Reese, Trevor Curtis, Rebecca Brehe, Alex Anstett, and Barbara Fallon, who have all spent countless hours listening to my presentation practices ad nauseam. I would especially like to thank KLR and TEC for motivating me when my research took some unexpected turns and throughout my thesis-writing process; without them, this thesis would not have been possible. Thank you all; this has been a crazy adventure in and of itself.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
Part I. Principal Components Regression for the Quantification of Controlled Substances in Sample Mixtures Based on ATR-FTIR Spectra
Chapter 1 Introduction
1.1 Controlled Substance Analysis
1.2 Application of Principal Components Analysis (PCA)
1.3 Application of Principal Components Regression (PCR)
1.4 Non-sample Sources of Variance in Spectroscopic Data
1.5 Research Objective
REFERENCES
Chapter 2 Theory
2.1 Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) Spectroscopy
2.2 Data Pretreatment
2.3 Principal Components Regression (PCR)
2.4 PC Selection for Regression
2.5 Leave-one-out Cross Validation
REFERENCES
Chapter 3 Materials and Method
3.1 Sample Preparation and Collection
3.2 Instrument Parameters
3.3 Data Pretreatment
3.4 Example Spectra of Components in Sample Mixtures
REFERENCES
Chapter 4 Results and Discussion
4.1 Effects of Data Pretreatment
4.1.1 Baseline Correction
4.1.2 Smoothing
4.1.3 Standard Normal Variate Normalization
4.1.4 Multiplicative Scatter Correction
4.1.5 Optimal Sequence of Data Pretreatment
4.2 PCA
4.3 PC Selection for MLR
4.4 Model Performance
4.4.1 Multiple Linear Regression
4.4.2 Test Set 1
4.4.3 Test Set 2
4.4.5 Summary
APPENDIX
REFERENCES
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
Part II.
Development of Mass Defect Filters for the Classification of Novel Synthetic Designer Drugs
Chapter 6 Introduction
REFERENCES
Chapter 7 Theory
7.1 Gas Chromatography-Mass Spectrometry (GC-MS)
7.1.1 Chromatography
7.1.2 Gas Chromatography
7.1.3 Separation Efficiency
7.1.4 Mass Spectrometry
7.1.4.1 Resolution in Mass Spectrometry
7.1.4.2 Low-Resolution Mass Spectrometry
7.1.4.3 High-Resolution Mass Spectrometry
7.2 Mass Defect
7.2.1 Absolute Mass Defect
7.2.2 Kendrick Mass Defect
7.2.3 Relative Mass Defect
REFERENCES
Chapter 8 Materials and Method
8.1 Sample Preparation
8.2 Instrument Parameters
8.3 Data Processing
8.4 Ion Selection for Mass Defect Filters
8.4.1 Absolute Mass Defect Filters
8.4.2 Kendrick Mass Defect Filters
8.4.3 Relative Mass Defect Filters and Profiles
Chapter 9 Results and Discussion
9.1 Comparison of GC-QMS and GC-TOFMS Spectra for Phenethylamines
9.2 Comparison of GC-QMS and GC-TOFMS Spectra for Cathinones
9.3 Absolute Mass Defect
9.4 Kendrick Mass Defect
9.5 Relative Mass Defect
9.6 Classification Scheme
APPENDIX
REFERENCES
Chapter 10 Conclusions and Future Work
10.1 Conclusions
10.2 Future Work

LIST OF TABLES

Table 3.1. Training set sample mixtures containing amphetamine and caffeine.
Table 3.2. Test Set 1 mixtures containing amphetamine and caffeine.
Table 3.3. Test Set 2 mixtures containing amphetamine and caffeine.
[...] res.
Table 4.1. PCC values after applying data pretreatments.
Table 4.2 [...] amphetamine regression.
Table 4.3 [...] methamphetamine regression.
Table 9.1. Molecular ion filter for phenethylamines using absolute mass defect.
Table 9.2. Fragment ion filter at m/z 77 for phenethylamines using absolute mass defect.
Table 9.3. Molecular ion filter for phenethylamines using Kendrick mass defect.
Table 9.4. List of the ions included in each fragment ion filter for the phenethylamine class using Kendrick mass defect.
Table 9.5. List of the ions included in each fragment ion filter for the cathinone class using Kendrick mass defect.

LIST OF FIGURES

Figure 2.1. Diagram of a Michelson interferometer in an FTIR instrument.
Figure 2.2. Diagram of an ATR accessory set-up depicting the IR light path generated via reflection from the mirrors.
Figure 3.1. Average spectrum of caffeine from the first collection of amphetamine-caffeine mixtures after baseline correction, smoothing, and standard normal variate normalization.
Figure 3.2. Average spectrum of amphetamine from the first collection of amphetamine-caffeine mixtures after baseline correction, smoothing, and standard normal variate normalization.
Figure 3.3. Average spectrum of methamphetamine from the first collection of methamphetamine-caffeine samples after baseline correction, smoothing, and standard normal variate normalization.
Figure 4.1. The effect of baseline correction in spectra as compared to raw spectra.
Figure 4.2. The effect of baseline correction observed in the PCA scores plot.
Figure 4.3. PC 1 and PC 2 loadings plot corresponding to the PCA scores plot of raw spectra.
Figure 4.4. Mean-centered spectra of replicates of 60% amphetamine samples after baseline correction.
Figure 4.5. The effect of applying an automated smoothing function that incorporates a Savitzky-Golay smooth as well as block averaging on different regions of the spectrum.
Figure 4.6. The effects of SNV normalization in conjunction with baseline correction and smoothing.
Figure 4.7. Loadings plots for (a) PC 1 and (b) PC 2 associated with PCA scores plot of spectra after applying baseline correction, smoothing, and SNV normalization.
Figure 4.8. The effect of multiplicative scatter correction on spectra and PCA scores plot.
Figure 4.9. (a) PC 1 and (b) PC 2 loadings plot associated with scores plot of spectra after applying baseline correction, smoothing, and multiplicative scatter correction.
Figure 4.10. PCA applied to training set.
Figure 4.11. The standard error of validation plotted as a function of the number of PCs included in the validation for amphetamine (black) and methamphetamine (red) mixtures.
Figure 4.12. Plot of total variance as a function of the number of PCs.
Figure 4.13. PC 3 loadings plot associated with pretreated sample mixtures in the training set.
Figure 4.14. Calibration curves generated in (a) amphetamine regression and (b) methamphetamine regression using the training set.
Figure 4.15. Regression vectors.
Figure 4.16. PCA scores plot of training set (filled in circles) with Test Set 1 plotted.
Figure 4.17. Calibration curves of training set with Test Set 1 plotted for the (a) amphetamine regression and (b) methamphetamine regression.
Figure 4.18. Scores plots of training set (filled in circles) with Test Set 2 plotted.
Figure 4.19. Calibration curve of training set with Test Set 2 for the (a) amphetamine regression and (b) methamphetamine regression.
Figure A.1. Loadings plots for (a) PC 1 and (b) PC 2 corresponding to the baseline-corrected spectra of the first collection of amphetamine mixtures.
Figure A.2. Loadings plots for (a) PC 8 and (b) PC 9 corresponding to the pretreated spectra of the training set mixtures.
Figure A.3. Average spectra of 80% methamphetamine mixtures in training set and in Test Set 1.
Figure A.4. Average spectra of 20% amphetamine and 40% amphetamine from the training set as well as the average spectrum of 30% amphetamine from Test Set 2.
Figure 6.1. Core structure of (a) phenethylamine and (b) cathinone with possible substitution sites designated with Rn.
Figure 6.2. Mass spectrum obtained using GC-QMS.
Figure 7.1. Schematic of a gas chromatograph attached to a detector.
Figure 7.2. Example chromatogram of a 4-component mixture, displaying ideal separation.
Figure 7.3. Example of a peak showing fronting.
Figure 7.4. Schematic of an ion source for electron ionization.
Figure 7.5. Example mass spectrum illustrating mass resolution.
Figure 7.6. Schematic of a quadrupole with blue and red lines indicating two possible trajectories of ions at the same moment in time.
Figure 7.7. Diagram of an orthogonal acceleration-time-of-flight (oa-TOF) mass analyzer in a high-resolution mass spectrometer.
Figure 7.8. Structures of (a) 4-APB and (b) 5-APDI, which have elemental formulae C11H13NO and C12H17N, respectively.
Figure 8.1. Structures of phenethylamines used in this research.
Figure 8.2. Structures of cathinones used in this research.
Figure 9.1. Average mass spectrum of 2C-H acquired via (a) GC-QMS and (b) GC-TOFMS.
Figure 9.2. Proposed fragmentation pathway for 2C-H.
Figure 9.3. Average mass spectrum of 4-APB.
Figure 9.4. Proposed fragmentation pathway for 4-APB.
Figure 9.5. Average mass spectrum of 5-MAPB.
Figure 9.6. Proposed fragmentation pathway for 5-MAPB.
Figure 9.7. Average mass spectrum of 2-methoxy MC obtained via (a) GC-QMS and (b) GC-TOFMS.
Figure 9.8. Proposed fragmentation pathway for 2-methoxy MC.
Figure 9.9. Average mass spectrum of α-PPP obtained via (a) GC-QMS and (b) GC-TOFMS.
Figure 9.10. Proposed fragmentation pathway for α-PPP.
Figure 9.11. Molecular ion filter for the phenethylamine class using absolute mass defect (82% CL, n = 3).
Figure 9.12. Fragment ion filter for phenethylamines at m/z 77 using absolute mass defect (99.9% CL for n = 5).
Figure 9.13. Fragment ion filter at m/z 56 using absolute mass defect for cathinones (99.99% CL for n = 5).
Figure 9.14. Fragment ion filter at m/z 91 developed for the cathinones (99.998% CL, n = 5).
Figure 9.15. Absolute mass defect values of synthetic designer drugs plotted as a function of their exact mass.
Figure 9.16. Molecular ion filter for 2C-phenethylamines using KMD (78% CL for n = 2).
Figure 9.17. Fragment ion Filter 2 for phenethylamines using KMD (99.9999998% CL, n = 16), with test sets from both classes plotted.
Figure 9.18. Fragment ion Filter 4 for phenethylamines using KMD (99.995% CL, n = 11), with test sets from both classes plotted.
Figure 9.19. Fragment ion Filter 6 for phenethylamines (98% CL, n = 5) using KMD.
Figure 9.20. Fragment ion Filter 2 for cathinones (99.99% CL, n = 11) using KMD.
Figure 9.21. Fragment ion Filter 7 for cathinones (95% CL, n = 4) using KMD.
Figure 9.22. Fragment ion Filter 9 for cathinones (98% CL, n = 5) using KMD.
Figure 9.23. Molecular ion filter for phenethylamines using RMD (82% CL, n = 3), with phenethylamine test set plotted.
Figure 9.24. Fragment ion profile for phenethylamines using RMD, with fragment ions from the phenethylamine test set plotted.
Figure 9.25. Fragment ion profile for cathinones using RMD, with fragment ions from cathinone test set plotted.
Figure 9.26. Diagram of a proposed classification scheme using the three types of mass defects for classification of novel synthetic designer drugs to the cathinone or phenethylamine class.
Figure B.1. Mass spectrum of 2C-P obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.
Figure B.2. Mass spectrum of 2C-D obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.
Figure B.3. Mass spectrum of 6-APB obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.
Figure B.4. Mass spectrum of 5-MAPDB obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.
Figure B.5. Mass spectrum of 3,4-MDPA obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.
Figure B.6. Mass spectrum of 3-methyl PPP obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.
Figure B.7. Mass spectrum of methcathinone obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.
Figure B.8. Mass spectrum of 2-methyl MC obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.
Figure B.9. Mass spectrum of 3-MEC obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.
Figure B.10. Mass spectrum of pyrovalerone obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.
Figure B.11. Mass spectrum of mephedrone obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.
Figure B.12. Structure of cocaine.
Figure B.13. Structures of the (a) m/z 56, (b) m/z 77, and (c) m/z 91 ions common to cathinone training set standards.
Figure B.14. Fragment ion Filter 1 developed for the phenethylamine class using KMD.
Figure B.15. Fragment ion Filter 3 developed for the phenethylamine class using KMD.
Figure B.16. Fragment ion Filter 5 developed for the phenethylamine class using KMD.
Figure B.17. Fragment ion Filter 7 developed for the phenethylamine class using KMD.
Figure B.18. Fragment ion Filter 1 developed for the cathinone class using KMD.
Figure B.19. Fragment ion Filter 3 developed for the cathinone class using KMD.
Figure B.20. Fragment ion Filter 4 developed for the cathinone class using KMD.
Figure B.21. Fragment ion Filter 5 developed for the cathinone class using KMD.
Figure B.22. Fragment ion Filter 6 developed for the cathinone class using KMD.
Figure B.23. Fragment ion Filter 8 developed for the cathinone class using KMD.
Part I. Principal Components Regression for the Quantification of Controlled Substances in Sample Mixtures Based on ATR-FTIR Spectra

Chapter 1 Introduction

1.1 Controlled Substance Analysis

Controlled substances, such as methamphetamine and marijuana, are commonly seen in forensic laboratories. Seizures of methamphetamine, marijuana, cocaine, and heroin by the Drug Enforcement Administration (DEA) alone range from thousands to tens of thousands of kilograms annually (1), and the seized material is then sent to laboratories for analysis and identification. Furthermore, as recently as 2013, drug-related deaths were among the top 10 causes of death in the United States (2) and arrests for drug abuse violations exceeded 1.5 million, with over 80% of arrests for possession of controlled substances (3). From these statistics, it is apparent that the use and abuse of controlled substances is a national concern. As such, the analysis and identification of controlled substances in forensic laboratories is crucial.

The Scientific Working Group for the Analysis of Seized Drugs (SWGDRUG) has established recommended guidelines for the identification of controlled substances (4). The analytical techniques detailed in the guidelines are split into three categories: Categories A, B, and C, which correspond to confirmatory, selective, and presumptive tests, respectively. Of the three categories, Category A methods are of primary interest, as these are the techniques that allow forensic scientists to definitively identify the controlled substances in submitted samples (4). In particular, mass spectrometry and infrared spectroscopy are two common methods used in controlled substance analyses. Mass spectrometry is often used in conjunction with gas chromatography, which is a SWGDRUG Category B technique.
The combination of the two methods fulfills SWGDRUG guidelines for controlled substance analysis, as the recommendations suggest a combination of a Category A technique and a technique from Category B or C, or any two uncorrelated Category B techniques and one from Category C in the absence of a Category A technique (4). The combination of the two techniques is known as GC-MS, and it is able to confirm the identity of controlled substances by separating the components of a mixture and identifying them based upon mass and chemical information.

A complementary technique to mass spectrometry is infrared spectroscopy, and more specifically, Fourier transform infrared spectroscopy. For ease of analysis, the majority of forensic laboratories utilize attenuated total reflectance-Fourier transform infrared spectroscopy (ATR-FTIR). ATR-FTIR is a rapid and non-destructive method that is commonly used for preliminary analysis and screening of controlled substances in solid samples. It requires minimal sample preparation and is a high-throughput technique with spectroscopic output that is rich in chemical information. However, due to the nature of the analysis, definitive identification of the controlled substance(s) in street samples can be challenging. Street samples most often contain a multitude of adulterants and diluents that complicate the spectra, so extraction and purification of the samples are necessary.

1.2 Application of Principal Components Analysis (PCA)

Despite these challenges, the complex spectra provide not only qualitative but also quantitative information that can be extracted using multivariate statistical procedures. One of the most common procedures is principal components analysis (PCA), which is an exploratory approach that displays patterns and trends among complex samples using only a few dimensions.
PCA has been widely used in forensic science to successfully discriminate ballpoint pen inks of similar color (5), paint samples of similar color (6), tablets containing illicit drugs (7), and cocaine mixtures (8). While PCA is strictly a qualitative approach to extracting information from complex samples, there exist other forensically relevant multivariate statistical procedures, such as principal components regression (PCR), that are able to extract quantitative information.

1.3 Application of Principal Components Regression (PCR)

PCR is a two-pronged approach that combines PCA and multiple linear regression. The relationship between two matrices, typically designated as X and Y, can be determined and displayed as a calibration curve. The X matrix consists of independent or predictor variables, such as spectroscopic data across a given frequency range for many different samples, and the Y matrix contains a set of dependent variables, such as concentration values for the given samples. Once the relationship between X and Y is determined via multiple linear regression, the calibration curve can then be used to predict the concentration of a questioned sample based on the spectrum of that sample (9).

PCR is widely used in the food and environmental industries for quality control purposes. Models were developed to quantify analytes of interest, such as methanol and water content in biodiesel (10), protein content in wheat samples (11), and various soil properties (12); the models quantified these analytes accurately, with minimal prediction error (< 5%). These studies have shown that the quantification of multiple components in a single analysis with minimal error is possible. However, these studies used near-infrared reflectance spectroscopic data in conjunction with PCR, and near-infrared reflectance spectroscopy is not as widely used in forensic laboratories.
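The two-step structure of PCR described in this section (PCA on the X matrix, followed by multiple linear regression of Y on the retained scores) can be illustrated with a minimal Python sketch. It is not the model developed in this work: the function names (`fit_pcr`, `predict_pcr`, `simulate_mixtures`), the Gaussian band shapes, the noise level, and the choice of a single PC are all assumptions made for illustration, and only NumPy is assumed to be available.

```python
import numpy as np

def fit_pcr(X, y, n_pcs):
    """Fit PCR: PCA on mean-centered X, then least squares of y on the PC scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # SVD of the centered data; the rows of Vt are the PC loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_pcs].T                      # (n_variables, n_pcs)
    scores = Xc @ loadings                       # (n_samples, n_pcs)
    coef, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    return {"x_mean": x_mean, "y_mean": y_mean, "loadings": loadings, "coef": coef}

def predict_pcr(model, X_new):
    """Project new spectra onto the retained PCs and apply the regression."""
    scores = (X_new - model["x_mean"]) @ model["loadings"]
    return scores @ model["coef"] + model["y_mean"]

def simulate_mixtures(fractions, rng, noise=0.001):
    """Hypothetical binary mixtures: two Gaussian bands weighted by mass fraction."""
    axis = np.linspace(0.0, 1.0, 200)            # arbitrary spectral axis
    analyte = np.exp(-((axis - 0.3) / 0.05) ** 2)
    diluent = np.exp(-((axis - 0.7) / 0.08) ** 2)
    fractions = np.asarray(fractions, dtype=float)
    X = np.outer(fractions, analyte) + np.outer(1.0 - fractions, diluent)
    return X + noise * rng.standard_normal(X.shape)

rng = np.random.default_rng(0)
train_frac = np.array([0.1, 0.2, 0.4, 0.6, 0.8, 0.9])   # known concentrations (Y)
X_train = simulate_mixtures(train_frac, rng)             # spectra (X)
model = fit_pcr(X_train, train_frac, n_pcs=1)

test_frac = np.array([0.3, 0.7])                         # "questioned" samples
predicted = predict_pcr(model, simulate_mixtures(test_frac, rng))
print(np.round(predicted, 3))
```

In this idealized case a single PC suffices because the simulated spectra vary along one direction only; with real spectra the number of PCs has to be chosen, for example by cross validation.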
A few studies have shown the potential of PCR in quantifying controlled substances in sample mixtures for forensic purposes based on the ATR-FTIR technique. For example, Goh et al. developed a PCR model to predict the concentrations of methamphetamine in simulated mixtures analyzed by ATR-FTIR (13). The PCR model was successful in accurately predicting methamphetamine concentrations in mixtures that also contained glucose and caffeine, with prediction error ranging from 3% to 6%. However, the scope of the study was limited to one controlled substance, meaning that the identity of the controlled substance had to be known a priori (13). In the event of a complex mixture containing multiple controlled substances, or if the identity of the controlled substance is not known, quantification with the model developed by Goh et al. may become challenging.

Penido et al. also developed a PCR model for the quantification of controlled substances in simulated samples (14). The focus of the study was on cocaine mixtures analyzed by two different methods: Raman spectroscopy and FTIR spectroscopy. Despite the success of the model, the study concluded that spectroscopic data obtained by Raman spectroscopy resulted in a better model with higher prediction ability (14). The limitation of this study is its applicability to forensic laboratories, as most forensic laboratories do not utilize Raman spectroscopy as a technique for definitive identification. Instead, FTIR spectroscopy is more widely used. Also, similar to the study by Goh et al., the scope of the study was limited to one controlled substance. Thus, it is advantageous to develop a PCR model that is able to identify and quantify multiple controlled substances in sample mixtures in a single analysis with minimal error, and to investigate the inclusion of a wide range of controlled substances in the model.
1.4 Non-sample Sources of Variance in Spectroscopic Data

Prior to performing any multivariate statistical procedure, it is imperative that only chemical information is considered and that the data are not dominated by non-sample sources of variance. This is because small differences among samples can be extracted using multivariate statistics even though these differences may not be chemically meaningful. To that end, data pretreatments are used as preprocessing tools in order to minimize artifacts that may arise from instrument variation and ambient conditions. The artifacts relevant to spectroscopic data include sloping baselines, instrument noise, differences in total signal due to variation in detector response, and signal reduction due to scattering.

Sloping baselines in spectroscopic data may indicate the presence of water or water vapor. The O-H stretch in water molecules absorbs in the 3600-3400 cm⁻¹ range and manifests itself as a broad peak. As ATR-FTIR is performed under ambient conditions, the sample may contain some water vapor that is absorbed from the atmosphere during sample collection. But because the exposure time of the sample to the atmosphere is fairly short, water vapor absorption is minimal, resulting in a shapeless peak that creates a sloping baseline in the higher wavenumber region of the spectrum. While this phenomenon indicates the presence of water, it is still a non-sample source of variance since it is not inherent to the sample itself (i.e., the sample is not hygroscopic) and thus does not provide chemical information about the sample. This non-sample variance can be reduced using baseline correction methods.

Instrument noise, which manifests itself as small irregular spikes that resemble minute peaks in a spectrum, is another example of a source of non-sample variance. Typically, this phenomenon is due to the inability to fully reduce background noise and display only signals.
Background noise is more apparent in the baseline, where minimal signal is present, than in peaks, since the signal-to-noise ratio is substantially higher in peaks. Smoothing the data is generally used to reduce background noise with the goal of improving the signal-to-noise ratio.

Total signal across the entire frequency range can vary as the detector response can differ from sample to sample. In these cases, the total signal in each spectrum is different. While this may be expected for samples that are chemically different, replicates of the same sample theoretically should have the same total area as they express the same chemical information. To correct for differences in detector response, normalization is performed, which re-scales the data so that the spectra are more comparable. The re-scaling across the frequency range does not affect the peak pattern; instead, the magnitude of the re-scaling is similar, if not equivalent, across the frequency range.

Particle size and differences in light scatter between samples are major sources of non-sample variance, particularly in solid samples. The amount of light scatter may vary from sample to sample, and this variation may be compounded by differences in particle size in a solid sample. Light scatter reduces the efficiency of light transmission to the detector, resulting in signal reduction. To correct for these two phenomena, scatter correction is often used. After scatter correction, improvement in signal intensity is typically observed.

1.5 Research Objective

The objective of this research was to develop a PCR model to identify and quantify the controlled substances present in simulated sample mixtures analyzed by ATR-FTIR. These sample mixtures contained amphetamine and caffeine or methamphetamine and caffeine at different concentrations.
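As a concrete illustration of the kinds of pretreatment discussed in Section 1.4, the following minimal Python sketch applies a smoothing step, a normalization step (standard normal variate, SNV), and a scatter correction step (multiplicative scatter correction, MSC) to simulated replicate spectra. It is a sketch under stated assumptions rather than the procedure used in this work: the function names, the boxcar smoother (a simple stand-in for other smoothing functions), and the simulated replicates with arbitrary gain and offset are all hypothetical.

```python
import numpy as np

def moving_average_smooth(spectra, window=5):
    """Boxcar smooth along the spectral axis (a simple stand-in smoother)."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="same"), 1, spectra)

def snv(spectra):
    """Standard normal variate: center and scale each spectrum by its own mean and SD."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, s in enumerate(spectra):
        # Model each spectrum as s = intercept + slope * ref, then invert the fit.
        slope, intercept = np.polyfit(ref, s, 1)
        corrected[i] = (s - intercept) / slope
    return corrected

# Hypothetical replicates: identical band shape, different gain and offset.
axis = np.linspace(0.0, 1.0, 100)
band = np.exp(-((axis - 0.5) / 0.1) ** 2)
replicates = np.vstack([1.0 * band + 0.00,
                        1.3 * band + 0.05,
                        0.8 * band - 0.02])

smoothed = moving_average_smooth(replicates)
snv_out = snv(replicates)
msc_out = msc(replicates)

# After either correction the replicates should coincide.
print(np.abs(snv_out - snv_out[0]).max(), np.abs(msc_out - msc_out[0]).max())
```

Both SNV and MSC remove gain and offset differences between replicates without changing the peak pattern, which is the property the normalization and scatter correction paragraphs rely on.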
The first goal was to investigate a series of data pretreatment procedures and determine the optimal set of procedures prior to performing PCR. This is necessary to ensure that non-sample sources of variance are substantially reduced and that non-sample variance will not dominate the model. The second goal was to determine the optimal number of PCs for regression by investigating different methods of PC selection. Once the regression was performed, the third goal was to evaluate the performance of the PCR model in quantifying the controlled substance present in a test set of sample mixtures. With this PCR model, both identification and quantification are performed in a single analysis.

As ATR-FTIR is a rapid technique that provides definitive identification of controlled substances with minimal sample preparation, it is advantageous to apply this method for both identification and quantification of controlled substances using multivariate statistics. By applying the PCR model to FTIR spectra, both identification and quantification of controlled substances in sample mixtures can be achieved in a single analysis without the need to perform a separation. The use of PCR also increases the confidence and objectivity of the analysis, as a known error rate derived from the model can be assigned to the resulting output.

REFERENCES

1. DEA. Statistics and Facts. DEA.gov; 2015 [updated 2015; cited]; Available from: http://www.dea.gov/resource-center/statistics.shtml#seizures.
2. Statistics CNCfH. Detailed Tables for the National Vital Statistics Report (NVSR)
3. FBI. Crime in the U.S. 2013: Estimated Number of Arrests. FBI; 2015 [updated 2015; cited]; Available from: https://www.fbi.gov/about-us/cjis/ucr/crime-in-the-u.s/2013/crime-in-the-u.s.-2013/tables/table-29/table_29_estimated_number_of_arrests_united_states_2013.xls.
4. SWGDRUG. SWGDRUG Recommendations Version 7-0.
5. Senior S, Hamed E, Masoud M, Shehata E. Characterization and Dating of Blue Ballpoint Pen Inks Using Principal Component Analysis of UV-Vis Absorption Spectra, IR Spectroscopy, and HPTLC. Journal of Forensic Sciences. 2012;57(4):1087-93.
6. Muehlethaler C, Massonnet G, Esseiva P. The application of chemometrics on Infrared and Raman spectra as a tool for the forensic analysis of paints. Forensic Science International. 2011;209(1-3):173-82.
7. Romão W, Lalli P, Franco M, Sanvido G, Schwab N, Lanaro R, et al. Chemical profile of meta-chlorophenylpiperazine (m-CPP) in ecstasy tablets by easy ambient sonic-spray ionization, X-ray fluorescence, ion mobility mass spectrometry and NMR. Anal Bioanal Chem. 2011;400(9):3053-64.
8. Marcelo MCA, Mariotti KC, Ferrão MF, Ortiz RS. Profiling cocaine by ATR-FTIR. Forensic Science International. 2015;246:65-71.
9. Massy WF. Principal Components Regression in Exploratory Statistical Research. Journal of the American Statistical Association. 1965;60(309):234-56.
10. Felizardo P, Baptista P, Menezes JC, Correia MJN. Multivariate near infrared spectroscopy models for predicting methanol and water content in biodiesel. Analytica Chimica Acta. 2007;595(1-2):107-13.
11. Mahesh S, Jayas DS, Paliwal J, White NDG. Comparison of Partial Least Squares Regression (PLSR) and Principal Components Regression (PCR) Methods for Protein and Hardness Predictions using the Near-Infrared (NIR) Hyperspectral Images of Bulk Samples of Canadian Wheat. Food Bioprocess Technol. 2015;8(1):31-40.
12. Chang C-W, Laird DA, Mausbach MJ, Hurburgh CR. Near-Infrared Reflectance Spectroscopy Principal Components Regression Analyses of Soil Properties. Soil Science Society of America Journal. 2001;65(2):480-90.
13. Goh CY, van Bronswijk W, Priddis C. Rapid nondestructive on-site screening of methylamphetamine seizures by attenuated total reflection Fourier transform infrared spectroscopy. Applied Spectroscopy. 2008;62(6):640-8.
14. Penido CAFdO, Silveira L, Pacheco MTT. Quantification of Binary Mixtures of Cocaine and Adulterants Using Dispersive Raman and FT-IR Spectroscopy and Principal Component Regression. Instrumentation Science & Technology. 2012;40(5):441-56.

Chapter 2 Theory

2.1 Attenuated Total Reflectance-Fourier Transform Infrared (ATR-FTIR) Spectroscopy

Attenuated total reflectance-Fourier transform infrared (ATR-FTIR) spectroscopy is a rapid and non-destructive technique that enables identification of molecules based on the vibrational motions elicited by these molecules under irradiation by infrared (IR) light. The ATR technique combines total internal reflection within a crystal and conventional FTIR spectroscopy in order to generate a spectrum containing peaks of various intensities that are characteristic of the vibrational motions of different functional groups. Through the identification of functional groups in a spectrum, the identity of a molecule can be determined.

Molecules undergo irradiation via a source lamp that emits a broad spectrum of light, especially in the infrared region (10-13,333 cm⁻¹). The light then enters the Michelson interferometer (Figure 2.1) (1).

Figure 2.1. Diagram of a Michelson interferometer in an FTIR instrument.

The broad light beam travels from the source lamp to a partially reflective mirror known as the beamsplitter, which splits the light beam into two. One beam of light passes through the beamsplitter to a completely reflective mirror while the other is reflected orthogonally to another reflective mirror. Of the two reflective mirrors, one is stationary while the other is a moving mirror that is set to continuously traverse a certain distance at a fixed rate. The two light beams are reflected from the two mirrors to recombine at the beamsplitter (2).
Because the distance from the moving mirror to the beamsplitter may not be the same as the distance from the stationary mirror to the beamsplitter, different patterns of constructive and destructive interference occur, generating superposed waves at characteristic, discrete frequencies. The spectrum that results from the superposed waves is known as the interferogram (2). One of the most noticeable features in an interferogram is the oscillating wave with the highest amplitude. This point is known as the centerburst, and is the result of the superposition of two waves that are completely in phase. At this point, only constructive interference occurs, because the distance from the stationary mirror to the beamsplitter is equal to that from the moving mirror to the beamsplitter. The position at which this occurs is known as the zero-path difference (ZPD) (3). ZPD is an important parameter that is necessary for alignment in the instrument prior to data collection.

The recombined IR beam is then introduced to the sample, which is placed on top of the crystal, through a total internal reflection mechanism. Total internal reflection is achieved due to differences in refractive index between the air and the ATR crystal itself, which is illustrated in Figure 2.2. Examples of ATR crystals include diamond and zinc selenide (ZnSe) (4). These materials are dense and have high refractive indices. Different materials in ATR crystals have different properties and are available depending on the type of analysis. Zinc selenide is a relatively soft material and is suited for analysis of liquids and oils, while diamond is a more robust material ideal for powders and other solids. For universal application, ATR crystals with a mixture of diamond and zinc selenide are advantageous (4).

Figure 2.2. Diagram of an ATR accessory set-up depicting the IR light path generated via reflection from the mirrors.
As the IR beam is reflected from the mirrors towards the crystal, the difference in refractive indices at the first air-crystal boundary results in partial refraction and partial reflection of the incident light (3). However, when the refracted light beam strikes the crystal-air boundary, which is also where the sample contacts the crystal surface, at a suitable angle, the light is completely reflected. Two criteria must be met in order for total internal reflection to occur. First, the angle of the incident light must be greater than the critical angle needed to achieve reflection. Second, the medium on the other side of the boundary must have a smaller refractive index than the medium through which the incident light passes (5).

The generation of total internal reflection also creates an evanescent wave at the crystal-air boundary. This wave sits at the boundary and, when a sample is in contact with the crystal, penetrates into the sample. The depth of penetration is generally below 10 µm; in order to ensure that the sample can be irradiated with IR light, good contact must be maintained between the sample and the ATR crystal. This is especially important for solid samples, due to packing and density differences that exist in these samples. A pressure arm can be used to apply a certain amount of force on the sample to ensure good contact (4).

When the evanescent wave penetrates into the sample, the molecules in the sample absorb some of the light at various wavelengths and are then excited due to the excess energy gained. The excitation manifests itself as molecular vibrations, such as bond stretching and bending. The IR beam not absorbed will then pass through the sample, and the photons in the beam can be detected and counted with a detector, which is usually a photomultiplier tube (3). At this point, the interferogram of the transmitted spectrum is generated.
The interferogram is then sent to the computer for processing, where a Fourier transform is applied in order to reduce the complexity associated with an interferogram. Fourier transform is the conversion from the time domain back to the frequency domain. As the interferogram records intensity in the time domain, it can quickly be converted into the frequency domain so that the independent variable is frequency rather than time. This simplifies the spectrum drastically, and the output is a spectrum where the percent of transmitted light, or transmittance, is plotted as a function of frequency, or wavenumber.

While the transmittance is directly measured by the detector, the measurement that is most useful is absorbance, which quantifies the amount of light absorbed. The two quantities are related through Eq. 2.1 (2):

$$A = -\log_{10} T \tag{2.1}$$

where A is the absorbance and T is the transmittance, so the spectrum obtained can easily be converted to display the absorbance intensity at each wavenumber. It is assumed that all light that is not transmitted is absorbed by the analyte. Absorbance information is more useful due to Eq. 2.2:

$$A = \varepsilon b c \tag{2.2}$$

where absorbance is dependent on $\varepsilon$, the molar absorptivity, b, the pathlength, and c, the concentration of the analyte. This equation illustrates the linear relationship between concentration and the amount of light absorbed for a given pathlength and known absorptivity of the analyte (3).

Some limitations of FTIR spectroscopy include small molar absorptivity values and variations in pathlengths, which subsequently affect absorbance measurements. Since molar absorptivity is inherent to an analyte for a given wavelength, this property cannot be optimized to yield larger absorbance values (2). Therefore, the only parameter that can be optimized is the pathlength.
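As a small worked illustration of the relationship between transmittance, absorbance, and concentration described above, the following Python sketch converts percent transmittance to absorbance and then rearranges the linear absorbance-concentration relationship to solve for concentration. The function names and all numerical values are hypothetical and chosen only for illustration; transmittance is assumed to be supplied in percent and converted to a fraction before the logarithm is taken.

```python
import math


def absorbance_from_percent_t(percent_t):
    """Convert percent transmittance to absorbance, A = -log10(T), with T as a fraction."""
    t = percent_t / 100.0
    if t <= 0.0:
        raise ValueError("transmittance must be greater than zero")
    return -math.log10(t)


def concentration_from_absorbance(absorbance, molar_absorptivity, pathlength):
    """Rearrange A = epsilon * b * c to give c = A / (epsilon * b)."""
    return absorbance / (molar_absorptivity * pathlength)


# Hypothetical transmittance readings (in percent) at four wavenumbers.
percent_t_values = [100.0, 50.0, 10.0, 1.0]
absorbances = [absorbance_from_percent_t(t) for t in percent_t_values]

# Hypothetical molar absorptivity and pathlength (arbitrary but consistent units).
example_concentration = concentration_from_absorbance(
    absorbance=0.5, molar_absorptivity=250.0, pathlength=0.002
)
```

Because the relationship is linear, doubling the pathlength in this sketch doubles the absorbance obtained for the same concentration, which is the motivation for the pathlength discussion that follows.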
Due to the direct proportional relationship between pathlength and absorbance, larger pathlengths are desired. This can be achieved using ATR-FTIR spectroscopy, in which multiple internal reflections increase the pathlength from that of traditional FTIR spectroscopy. However, this operates under the assumption that all incident light is completely reflected at each internal reflection and that no light is lost in this process. Depending on the physical properties of the sample, light scattering can occur at each internal reflection point. In this case, scattering may be compounded by having multiple internal reflections, and an ATR crystal that only utilizes a one-bounce reflection may be advantageous. The compromise with using a crystal with fewer internal reflections is the loss in sensitivity and the reduced pathlength, which results in lower absorbance measurements. Despite these limitations, the rapid analysis and minimal sample preparation associated with ATR-FTIR spectroscopy make it a useful technique for the analysis of solid samples.

2.2 Data Pretreatment

Data pretreatment procedures are performed on various types of data post-acquisition to minimize non-sample sources of variance while still retaining all of the chemical information. These procedures include baseline correction, smoothing, normalization, and scatter correction for spectroscopic data.

Baseline correction as applied to spectroscopic data is useful in reducing sloping baselines, which are a common phenomenon. A typical baseline correction involves the subtraction of a mathematical function from the original data. The mathematical function chosen is data-dependent, since different baseline patterns exist for different types of data. For sloping baselines, a polynomial function, such as a quadratic function, is fitted to the original spectrum and is then subtracted (6).
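The subtraction of a fitted quadratic can be sketched as follows. This is a minimal illustration using NumPy, not the routine implemented in any instrument software: the function name, the boolean `baseline_mask` that flags which points are treated as baseline, and the synthetic spectrum are all assumptions made for the example.

```python
import numpy as np


def subtract_polynomial_baseline(wavenumber, absorbance, baseline_mask, degree=2):
    """Fit a polynomial to the points flagged as baseline and subtract it everywhere."""
    x = np.asarray(wavenumber, dtype=float)
    y = np.asarray(absorbance, dtype=float)
    mask = np.asarray(baseline_mask, dtype=bool)
    # Rescale the x axis so the least-squares fit is well conditioned.
    x_scaled = (x - x.mean()) / (x.max() - x.min())
    coefficients = np.polyfit(x_scaled[mask], y[mask], degree)
    baseline = np.polyval(coefficients, x_scaled)
    return y - baseline, baseline


# Synthetic spectrum: one Gaussian peak sitting on a curved, sloping baseline.
wavenumber = np.linspace(4000.0, 650.0, 336)  # 10 cm^-1 spacing
true_baseline = 0.05 + 2e-5 * (wavenumber - 650.0) + 1e-8 * (wavenumber - 650.0) ** 2
peak = 0.8 * np.exp(-0.5 * ((wavenumber - 1700.0) / 15.0) ** 2)
raw = true_baseline + peak

# Assumed baseline region: everything more than 100 cm^-1 away from the peak.
baseline_mask = np.abs(wavenumber - 1700.0) > 100.0
corrected, fitted_baseline = subtract_polynomial_baseline(wavenumber, raw, baseline_mask)
```

Restricting the fit to the masked points keeps the peak from pulling the fitted curve upward, so the peak height is preserved after subtraction.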
The function can be designed to target certain portions of the spectrum so that regions containing complex chemical information are not as heavily affected by the correction. This requires a function that can be adapted to target different regions given the original spectrum. With this method, knowledge of the baseline region for each spectrum is necessary. The first derivative of the original spectrum can convey this information. In first-derivative spectra, slopes with magnitudes close to zero indicate the baseline region, whereas slopes with large magnitudes in the positive or negative directions indicate the presence of peaks that convey chemical information. Thus, a two-step process involving a first derivative followed by a quadratic function fitting is appropriate and advantageous for spectroscopic data.

Smoothing is normally performed in order to reduce noise in the data, either in the peaks or in the baseline. Different algorithms can be applied to achieve various levels of smoothing, depending on the extent of the noise observed in the data. A common smoothing algorithm is the Savitzky-Golay filter (6). This algorithm utilizes moving windows across a spectrum. A window contains a set number of data points in the spectrum, and the window is moved along the spectrum so that the smoothing occurs in each window (7). Using the least-squares method, a polynomial function is fit to the data points in the moving window so that the error between the function and the data points in that window is minimized; the fitted value at the center point is retained, and this process continues across the spectrum with the moving windows until the entire spectrum is fit. These fitted values then replace the original spectrum as the smoothed spectrum (7). The block averaging, or moving average, smoothing algorithm is used as well. This type of smooth is the result of fitting the center point of a moving window of data points with a straight line rather than with different polynomial functions (8).
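The two smoothing approaches described above can be sketched in a few lines of NumPy. This is a deliberately simple illustration that refits a polynomial in every window; it is not the implementation used by any particular software package. The function names, window sizes, and the synthetic noisy signal are assumptions, and edge points that do not have a full window are simply left unsmoothed.

```python
import numpy as np


def moving_average(y, window):
    """Block (moving) average: each interior point becomes the mean of its window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    y = np.asarray(y, dtype=float)
    half = window // 2
    smoothed = y.copy()  # edge points are left unchanged in this sketch
    for i in range(half, len(y) - half):
        smoothed[i] = y[i - half:i + half + 1].mean()
    return smoothed


def savitzky_golay(y, window, polyorder):
    """Least-squares polynomial fit in each window, evaluated at the window center."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    y = np.asarray(y, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    smoothed = y.copy()  # edge points are left unchanged in this sketch
    for i in range(half, len(y) - half):
        coefficients = np.polyfit(offsets, y[i - half:i + half + 1], polyorder)
        smoothed[i] = np.polyval(coefficients, 0.0)  # value at the center point
    return smoothed


# Synthetic example: a smooth signal with added random noise (fixed seed).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 200)
clean = np.sin(x)
noisy = clean + rng.normal(scale=0.05, size=x.size)

sg_smoothed = savitzky_golay(noisy, window=11, polyorder=2)
ma_smoothed = moving_average(noisy, window=11)
```

In this sketch a straight-line fit (`polyorder=1`) evaluated at the center of a symmetric window returns the window mean, which is why the moving average can be viewed as a special case of the polynomial smooth.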
A combination of the two smoothing algorithms is advantageous as the smooth can be adapted to different regions of the spectrum.

Normalization is a data pretreatment procedure that focuses on removing between-sample variance arising from signal differences between replicates. For example, two replicates of a sample can vary in total signal. While the peak pattern is maintained, instrument variation and differences in detector response may lead to spectra exhibiting this phenomenon. Although different normalization procedures can be applied to correct for this, standard normal variate (SNV) normalization is commonly applied to spectroscopic data (9). This normalization procedure scales the spectrum in a manner that is similar to a z-score calculation at each independent variable. Each response at an independent variable in the X matrix is scaled using the following function:

$$x_{n,i} = \frac{x_i - \bar{x}}{s} \tag{2.3}$$

where $x_{n,i}$ is the scaled response at the i-th variable as a result of SNV normalization, $x_i$ is the original response at the i-th variable, $\bar{x}$ is the average response across the independent variables, and s is the standard deviation of the responses across the independent variables. Thus, the scaling is data set-independent, since the scaling factor varies from sample to sample (9).

Scatter correction is used to remove variation in samples due to differences in light scatter, which changes the efficiency of light transmission. A common algorithm to minimize such differences is multiplicative scatter correction (MSC) (10). In this procedure, an average spectrum, also known as the reference spectrum, is determined across all spectra in a data set (i.e. the X matrix). Each spectrum is then regressed against the reference spectrum to determine the multiplicative factor between the two spectra so that the product of the reference spectrum and the factor is equivalent to the measured spectrum.
The measured spectrum is then normalized to the multiplicative factor, which is a vector. An average value for the reference spectrum is calculated and the sum of the average and the normalized spectrum results in the corrected spectrum (10). As a result, MSC is a data set-dependent method, as the degree of correction performed is heavily influenced by other spectra in the data set.

2.3 Principal Components Regression (PCR)

Principal components regression (PCR) is a multivariate statistical procedure that enables the quantification of analytes of interest in complex samples through a combination of principal components analysis and multiple linear regression (11). Principal components analysis (PCA) is an exploratory multivariate statistical procedure that can associate and/or discriminate complex samples based on variance (12). PCA is the first step that is performed in PCR, and it is performed primarily to reduce the dimensions and the variables in the input data, or X matrix. For example, a spectrum generated via Fourier transform infrared spectroscopy in the mid-IR range will typically contain around 3000 independent variables, since there is a signal associated with each wavenumber (or independent variable). However, not all 3000 variables contribute to or influence a sample equally, and therefore it is necessary to reduce the number of variables to a substantially smaller number that can still provide chemically meaningful information. PCA does this by grouping independent variables that covary linearly into a principal component (PC) (12). PCs are vectors that describe a set of independent variables, and the main constraint is that each PC is orthogonal to the preceding one in matrix space. Therefore, all PCs are uncorrelated with one another. The eigenvalue of each PC, or in broader terms, the variance accounted for by each PC, is determined by decomposition of the X matrix, which is detailed below (12).
Variance can be thought of as the magnitude component of a vector. PCs are vectors that have a magnitude and direction; each PC extends out in multidimensional space by the magnitude specified in the variance. Variance among the samples in the data set exists due to different responses of the samples to the independent variables. By decomposing the independent variables into PCs, the variance corresponding to a set of variables described by a PC is quantified in that PC. In PCA, a common algorithm for the decomposition is singular value decomposition (SVD). This algorithm determines singular values that become the variance associated with each PC (13). Singular values are determined with this relationship:

$$X = USV^{T} \tag{2.4}$$

where U is the matrix that contains eigenvectors in the rows, S is the diagonal matrix that contains the singular values, and V is the matrix that contains the eigenvectors in the columns. As observed in the above equation, eigenvectors are also found during the decomposition, and these vectors are the PCs themselves. Each vector describes a set of variables that covary linearly, and all vectors are positioned orthogonally in multidimensional space (12).

In the simplest sense, the application of PCA to an X matrix is described by Eq. 2.5 (12):

$$X = TL^{T} \tag{2.5}$$

where T is the scores matrix and $L^{T}$ is the transposed matrix of the loadings. The loadings matrix contains loadings, or the weights and direction of each independent variable, for each PC. The scores matrix contains the scores of each sample for each eigenvector, or PC; a score is determined as the summation of the multiplicative effects of the loadings and the vector of the mean-centered X matrix at each independent variable for a PC (12).
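The decomposition described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the algorithm used by any particular chemometrics package: the X matrix is mean-centered before decomposition, the scores are taken as the product of the left singular vectors and the singular values, the loadings are taken as the right singular vectors, and the fraction of variance for each PC is computed from the squared singular values. The function name and the synthetic two-component spectra are hypothetical.

```python
import numpy as np


def pca_svd(X):
    """PCA of a samples-by-variables matrix via SVD of the mean-centered data."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U * s      # one row per sample, one column per PC
    loadings = Vt.T     # one row per variable, one column per PC
    explained = s ** 2 / np.sum(s ** 2)  # fraction of variance per PC
    return mean, scores, loadings, explained


# Synthetic binary mixtures of two hypothetical single-peak "pure" spectra.
wavenumber = np.linspace(650.0, 4000.0, 200)
pure_a = np.exp(-0.5 * ((wavenumber - 1000.0) / 40.0) ** 2)
pure_b = np.exp(-0.5 * ((wavenumber - 1700.0) / 40.0) ** 2)
fractions = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
X = np.outer(fractions, pure_a) + np.outer(1.0 - fractions, pure_b)

mean, scores, loadings, explained = pca_svd(X)
```

Because these noise-free mixtures differ only in the proportion of the two components, essentially all of the variance falls on the first PC and the PC 1 scores are ordered by mixture composition; the sign of each PC is arbitrary.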
SVD can be used as the decomposition method in PCA with relative ease since the following relationships hold:

$$L = V \tag{2.6}$$

and

$$T = US \tag{2.7}$$

Once the X matrix is decomposed into its components, multiple linear regression can be performed on the data set in PCR. The goal of multiple linear regression (MLR) is to accurately predict Y, the dependent variable, given any X matrix. The regression takes the general form of the following equation (14):

$$Y = X\beta + E \tag{2.8}$$

where Y is the matrix containing the dependent variables, such as concentration values, $\beta$ is the regression vector which contains the regression coefficients, X is the matrix containing the independent variables, such as a set of spectra, and E is the error associated with the regression in the Y direction. The relationship between X and Y is determined in this procedure, and is typically found by using the least-squares method of minimizing the error associated with each value in the X matrix. This is a calibration curve similar to ordinary linear regression, but the regression is performed with multiple variables, such as an absorbance spectrum that contains over 3000 independent variables.

Replacing X with the decomposition from PCA yields

$$Y = (USV^{T})\beta + E \tag{2.9}$$

which can be rewritten as

$$\beta = (VS^{-1}U^{T})Y \tag{2.10}$$

assuming that E is minimal and not significant. Given a new X, or a new set of spectra, a new Y, or a matrix containing dependent variables such as concentration values, can be predicted as shown in Eq. 2.11:

$$Y_{new} = X_{new}\beta \tag{2.11}$$

It is important to note that the decomposition in PCA does not depend on the Y matrix, and that the X matrix input for the regression is solely dependent on the scores and loadings determined in PCA.

2.4 PC Selection for Regression

A main concern prior to establishing a calibration curve is the number of PCs to include in the regression.
Intuitively, prediction ability is positively correlated with the number of PCs included in the model: more PCs allow the calibration curve to better represent the X matrix, from which the PCs are determined. The incorporation of more PCs in the regression means that a greater proportion of the variance in the X matrix is accounted for in the calibration curve. However, overfitting the calibration curve can occur with the inclusion of too many PCs, as subsequent PCs begin to describe non-sample sources of variance (15). Therefore, it is imperative to determine the optimal number of PCs to include in the regression. PCs are included in order of greatest contribution to variance, and subsequent PCs are selected until a criterion is met (16). The most often used criterion is the minimum predicted residual error sum of squares (PRESS) value for a certain number of PCs included in the regression, which can be determined via internal validation. The PRESS value is then converted into an error value associated with the validated model (15). More information regarding PRESS values and internal validation is provided in Section 2.5. In order to develop an optimal regression model, the error must be at a minimum, and this point is determined in PC selection.

2.5 Leave-one-out Cross Validation

Leave-one-out cross validation (LOO CV) is an example of an internal validation method (15). It is commonly used in validating a regression model; in PCR, it is primarily used to determine the optimal number of PCs to include in a regression. Using this method, a model is created with one sample in the data set (or X matrix) left out, and the prediction error in the regression is calculated for that sample. This process is then repeated until all samples have been left out once, and the squared errors are then summed to give the PRESS value (15).
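The regression and the leave-one-out procedure can be sketched together in NumPy. This is an illustrative sketch under stated assumptions rather than the algorithm of any commercial package: X and Y are mean-centered before fitting, the regression vector is computed from the SVD truncated to the first `n_pcs` components, and the data are synthetic binary mixtures with added noise. All function and variable names are hypothetical.

```python
import numpy as np


def pcr_fit(X, y, n_pcs):
    """Fit PCR using the first n_pcs principal components of mean-centered X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    U, s, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    U_k, s_k, V_k = U[:, :n_pcs], s[:n_pcs], Vt[:n_pcs].T
    # Regression vector from the truncated decomposition: V S^-1 U^T y.
    beta = V_k @ ((U_k.T @ (y - y_mean)) / s_k)
    return x_mean, y_mean, beta


def pcr_predict(model, X_new):
    """Predict y for new spectra using a fitted PCR model."""
    x_mean, y_mean, beta = model
    return (np.asarray(X_new, dtype=float) - x_mean) @ beta + y_mean


def loo_press(X, y, n_pcs):
    """Leave each sample out once and sum the squared prediction errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    press = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        model = pcr_fit(X[keep], y[keep], n_pcs)
        residual = y[i] - pcr_predict(model, X[i:i + 1])[0]
        press += residual ** 2
    return press


# Synthetic training set: six mixture levels in triplicate, with added noise.
rng = np.random.default_rng(1)
wavenumber = np.linspace(650.0, 4000.0, 200)
pure_a = np.exp(-0.5 * ((wavenumber - 1000.0) / 40.0) ** 2)
pure_b = np.exp(-0.5 * ((wavenumber - 1700.0) / 40.0) ** 2)
y = np.repeat([0.0, 0.2, 0.4, 0.6, 0.8, 1.0], 3)
X = np.outer(y, pure_a) + np.outer(1.0 - y, pure_b)
X = X + rng.normal(scale=0.01, size=X.shape)

press_by_k = {k: loo_press(X, y, k) for k in range(1, 6)}
best_k = min(press_by_k, key=press_by_k.get)
model = pcr_fit(X, y, best_k)
predicted_pure_a = pcr_predict(model, pure_a[None, :])[0]
```

Selecting the number of PCs by the smallest PRESS, as done here with `min`, is the simplest form of the criterion; a statistical comparison of PRESS values would be an additional step not shown in this sketch.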
Through leave-one-out cross validation, a PRESS value is determined for a regression that only includes the first PC. A second PRESS value is determined for a regression that includes the first two PCs, a third PRESS value is determined for a regression that includes the first three PCs, and so on until all PCs are included or until a certain criterion is reached. These PRESS values are compared to determine the minimum value; the number of PCs for which a minimum PRESS exists is considered to be the optimal number of PCs to be used in the calibration (15). This is because the minimum PRESS also represents the point at which the error in the regression model is minimal. The PRESS values are compared through an F test at a user-defined confidence level to determine whether the values are significantly different from one another.

While PRESS values are useful, a more objective measure of error is standard error; in the case of validation, this is known as the standard error of validation (SEV) (15). The two are related using the following equation:

$$\mathrm{SEV} = \sqrt{\frac{\mathrm{PRESS}}{n}} \tag{2.12}$$

where n is the number of samples in the data set used in the validation. By normalizing to the number of samples in the data set, the error associated with the model is not biased by including too few or too many samples. In the cross-validation of a PCR model, an SEV value is determined for each PC. The comparison of SEV values results from the comparison of PRESS values described above, as SEV comparisons are typically within-sample comparisons between the validation errors associated with different PCs.

REFERENCES

1. Smith LM, Dobson CC. Absolute displacement measurements using modulation of the spectrum of white light in a Michelson interferometer. Applied Optics. 1989;28(16):3339-42.
2. Smith BC. Fundamentals of Fourier Transform Infrared Spectroscopy, Second Edition: CRC Press, 2011.
3. Skoog DA, Holler FJ, Crouch SR.
Principles of Instrumental Analysis: Thomson Brooks/Cole, 2007.
4. PerkinElmer. FT-IR Spectroscopy Attenuated Total Reflectance (ATR). 2005.
5. Axelrod D. Cell-substrate contacts illuminated by total internal reflection fluorescence. The Journal of Cell Biology. 1981;89(1):141-5.
6. PerkinElmer. Spectrum One FT-IR Spectroscopy. 2004.
7. Savitzky A, Golay MJ. Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry. 1964;36(8):1627-39.
8. Tong H. Fitting a smooth moving average to noisy data (Corresp.). IEEE Transactions on Information Theory. 1976;22(4):493-6.
9. Barnes R, Dhanoa M, Lister SJ. Standard normal variate transformation and de-trending of near-infrared diffuse reflectance spectra. Applied Spectroscopy. 1989;43(5):772-7.
10. Isaksson T, Næs T. The effect of multiplicative scatter correction (MSC) and linearity improvement in NIR spectroscopy. Applied Spectroscopy. 1988;42(7):1273-84.
11. Massy WF. Principal Components Regression in Exploratory Statistical Research. Journal of the American Statistical Association. 1965;60(309):234-56.
12. Wold S, Esbensen K, Geladi P. Principal component analysis. Chemometrics and Intelligent Laboratory Systems. 1987;2(1-3):37-52.
13. Lathauwer LD, Moor BD, Vandewalle J. A Multilinear Singular Value Decomposition. SIAM Journal on Matrix Analysis and Applications. 2000;21(4):1253-78.
14. Sutter JM, Kalivas JH, Lang PM. Which principal components to utilize for principal component regression. Journal of Chemometrics. 1992;6(4):217-25.
15. Varmuza K, Filzmoser P. Introduction to Multivariate Statistical Analysis in Chemometrics: CRC Press, 2009.
16. Xie Y-L, Kalivas JH. Local prediction models by principal component regression. Analytica Chimica Acta. 1997;348(1-3):29-38.
Chapter 3 Materials and Method

3.1 Sample Preparation and Collection

Amphetamine sulfate and methamphetamine hydrochloride were obtained from Sigma-Aldrich (St. Louis, MO). Caffeine was purchased from Eastman Chemical (Kingsport, TN). Training set mixtures containing amphetamine and caffeine at different concentrations were prepared by first weighing out the appropriate mass of both compounds (Table 3.1) and homogenizing with a mortar and pestle. The total mass of each mixture was 10 mg. The samples were stored in capped vials at room temperature.

Additional sample mixtures (denoted as Test Set 1 and Test Set 2) were prepared and used to validate the model. Test Set 1 mixtures were prepared in the same manner and contained the same concentrations by mass that were used in the training set (Table 3.2). Test Set 2 mixtures were prepared in the same manner at concentrations not included in the training set (Table 3.3). Training and test set mixtures containing methamphetamine and caffeine were prepared in a similar manner, except replacing amphetamine with methamphetamine.

Each training set sample mixture was analyzed by ATR-FTIR five times in triplicate over a five-month period to account for variation in the instrument and ambient conditions over a period of time. The samples were not re-mixed or homogenized again after preparation. A small amount of each mixture (approximately 1 mg) was taken out of the vial in each sampling and then replaced after analysis. Replicate measurements were performed by sampling from the same vial. Test set mixtures were sampled and analyzed in a similar manner six times in triplicate over a two-month period.

[…] methamphetamine and mixing with an amount of caffeine (Table 3.4). The total mass varied from sample to sample. The mixtures were then homogenized with a mortar and pestle prior to […]

Table 3.1. Training set sample mixtures containing amphetamine and caffeine.
Concentration (%w-%w)    Amphetamine (mg)    Caffeine (mg)
00-100                   0                   10
20-80                    2                   8
40-60                    4                   6
60-40                    6                   4
80-20                    8                   2
100-00                   10                  0

Table 3.2. Test Set 1 mixtures containing amphetamine and caffeine.

Concentration (%w-%w)    Amphetamine (mg)    Caffeine (mg)
20-80                    2                   8
40-60                    4                   6
60-40                    6                   4
80-20                    8                   2

Table 3.3. Test Set 2 mixtures containing amphetamine and caffeine.

Concentration (%w-%w)    Amphetamine (mg)    Caffeine (mg)
10-90                    1                   9
30-70                    3                   7
50-50                    5                   5
70-30                    7                   3
90-10                    9                   1

Table 3.4.

Amphetamine (mg)    Caffeine (mg)    Methamphetamine (mg)
0.8                 5                0
2.3                 5.8              0
0                   4.7              3.0

3.2 Instrument Parameters

Spectra were collected using a Spectrum One FTIR with a Universal ATR Accessory (Perkin Elmer, Waltham, MA). The ATR crystal was a diamond/ZnSe one-bounce crystal. The scan range was 4000–650 cm⁻¹, and raw spectra were obtained by averaging four scans in transmittance mode. The pressure of the anvil against the sample was set to 80 units of pressure. Prior to data collection, a system suitability check was performed on the instrument in order to assess noise, throughput, and contamination levels. Minimal levels of each, below the threshold values, were observed, indicating that the instrument was in good condition. A background scan was also performed prior to data collection. The ATR crystal was cleaned with acetone after each replicate. A background scan was performed after every two samples were analyzed in triplicate.

3.3 Data Pretreatment

Raw spectra were first converted to absorbance mode through the instrument software (Spectrum v.5.0.1, Perkin Elmer, Waltham, MA). The data pretreatment investigation was only performed using spectra of the first collection of amphetamine-caffeine mixtures. Baseline correction and smoothing were performed using the appropriate functions available in the instrument software. The baseline region was first identified through a first-derivative plot and a quadratic function was then fit to the baseline region.
This function was subsequently subtracted from the raw spectrum. Baseline-corrected spectra were then smoothed using a combination of the Savitzky-Golay and the Block Averaging algorithms to ensure that the baseline regions were more heavily smoothed than the peaks. Standard normal variate normalization was performed in Microsoft Excel (Microsoft, Redmond, WA). Principal components analysis and principal components regression were performed using Pirouette v.4.0 (Infometrix, Bothell, WA). All output data were exported to Microsoft Excel for further processing.

The effect of the data pretreatment procedures was visually assessed in the spectra and quantified in the PCA scores plot. Changes in the PCA scores plot as a result of the data pretreatment procedures were quantified by determining the average percent change in the clustering (PCC) of replicates between the PCA scores of the raw spectra of the first collection of samples (i.e. 6 samples in triplicate) and those of the spectra after each pretreatment (1). For each sample, the average scores on PC 1 and PC 2 were calculated along with the variance. The variance for both PC 1 and PC 2 was then summed, as variance is additive for independent and normally distributed data. The standard deviation accounting for both PCs was then calculated. This procedure was repeated for all six samples in the untreated data and then following each data pretreatment. The percent change in the standard deviation of the scores for each sample was calculated and averaged across all samples. The average PCC for each pretreatment was assessed to determine the optimal pretreatment based on the PCC value with the largest magnitude. Improvement in clustering due to pretreatment is indicated by a negative PCC value; the optimal set of data pretreatments is indicated by the largest negative PCC value.
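The PCC calculation described above can be sketched as follows. This is one possible reading of the procedure, written with NumPy, and several details are assumptions: the sample variance (n - 1 denominator) is used for the replicate scores, the percent change is expressed relative to the spread in the untreated data, and the scores shown are invented numbers used only to make the arithmetic easy to follow.

```python
import numpy as np


def replicate_spread(scores):
    """Combined standard deviation of replicate scores on PC 1 and PC 2.

    scores: array of shape (n_replicates, 2) holding PC 1 and PC 2 scores.
    """
    variances = np.var(np.asarray(scores, dtype=float), axis=0, ddof=1)
    return float(np.sqrt(variances.sum()))  # variances summed, then square root


def average_pcc(raw_scores_by_sample, treated_scores_by_sample):
    """Average percent change in replicate spread after a pretreatment."""
    changes = []
    for raw, treated in zip(raw_scores_by_sample, treated_scores_by_sample):
        sd_raw = replicate_spread(raw)
        sd_treated = replicate_spread(treated)
        changes.append(100.0 * (sd_treated - sd_raw) / sd_raw)
    return float(np.mean(changes))


# Hypothetical PC 1 / PC 2 scores for two samples, three replicates each.
raw_scores = [
    [[1.0, 0.0], [3.0, 0.0], [2.0, 0.0]],
    [[0.0, 0.0], [0.0, 2.0], [0.0, 4.0]],
]
treated_scores = [
    [[1.0, 0.0], [2.0, 0.0], [1.5, 0.0]],
    [[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]],
]

pcc = average_pcc(raw_scores, treated_scores)
```

In this invented example the replicate spread of each sample is halved by the pretreatment, so the average PCC is negative, which corresponds to tighter clustering of replicates.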
3.4 Example Spectra of Components in Sample Mixtures

The pretreated spectra of caffeine, amphetamine, and methamphetamine from the first collection of sample mixtures are shown in Figures 3.1–3.3, respectively. The spectrum of caffeine (Figure 3.1) displays two small peaks at 3115 and 2950 cm⁻¹ that correspond to CH stretches of the methyl groups in caffeine. The two intense peaks at 1700 and 1600 cm⁻¹ correspond to the C=O stretches from the two amide groups in the molecule. The peaks ranging from approximately 1600–1400 cm⁻¹ correspond to ring stretches of both the pyrimidine and imidazole components of the molecule. The sharp peaks at 1250 cm⁻¹ and between 1100 and 900 cm⁻¹ correspond to various C-N stretches, and finally, the peak at 700 cm⁻¹ results from CH out-of-plane bends.

Figure 3.1. Average spectrum of caffeine from the first collection of amphetamine-caffeine mixtures after baseline correction, smoothing, and standard normal variate normalization.

The spectrum of amphetamine is shown in Figure 3.2 and displays intense peaks that range from 3150–2600 cm⁻¹ and correspond to CH stretches. Other characteristic peaks include those ranging from 1650–1300 cm⁻¹ that correspond to C-C stretches in the benzene ring, as well as the intense peaks between 1200 and 900 cm⁻¹, which correspond to NH bends. The two peaks between 750 and 650 cm⁻¹ correspond to CH out-of-plane bends.

Figure 3.2. Average spectrum of amphetamine from the first collection of amphetamine-caffeine mixtures after baseline correction, smoothing, and standard normal variate normalization.

The spectrum of methamphetamine is displayed in Figure 3.3. Characteristic peaks ranging from 3150–2300 cm⁻¹ correspond to CH stretches. Similar to amphetamine, the peaks at 1650–1300 cm⁻¹ correspond to C-C stretches in the benzene ring.
The peaks ranging from 1200 to 900 cm⁻¹ are less intense in the spectrum of methamphetamine compared to amphetamine due to the weaker dipole moment in the NH bend for methamphetamine, as the amine is a secondary amine rather than a primary amine. The two peaks between 750 and 650 cm⁻¹ correspond to CH out-of-plane bends. While similarities between the spectra of amphetamine and methamphetamine are observed, absorbance in the higher wavenumber region (i.e. 3150–2300 cm⁻¹) is more intense for methamphetamine.

Figure 3.3. Average spectrum of methamphetamine from the first collection of methamphetamine-caffeine samples after baseline correction, smoothing, and standard normal variate normalization.

REFERENCES

1. McIlroy JW. Effects of data pretreatment on the multivariate statistical analysis of chemically complex samples [M.S.]. Ann Arbor: Michigan State University, 2014.

Chapter 4 Results and Discussion

4.1 Effects of Data Pretreatment

4.1.1 Baseline Correction

The effect of each data pretreatment procedure was visually inspected in spectra and quantified in the PCA scores plot. Figure 4.1 shows the effect of baseline correction in the replicate spectra of caffeine. In the raw spectra, a sloping baseline is observed at the higher wavenumber region (4000–3150 cm⁻¹) (Figure 4.1a). The rise in baseline does not originate from chemical sources; instead, it is the result of water vapor absorption from the atmosphere during sample collection. After baseline correction, the sloping baseline is substantially reduced in all replicates (Figure 4.1b). However, other non-sample sources of variance are present in the spectra even after baseline correction, indicating that further pretreatment is necessary.

Figure 4.1. The effect of baseline correction in spectra as compared to raw spectra. (a) Raw absorbance spectra of three replicates of caffeine.
(b) Baseline-corrected spectra of the caffeine sample.

Figure 4.2 displays the effect of baseline correction in the PC 1 vs. PC 2 scores plots. The total variance accounted for by PCs 1 and 2 in the scores plot of the raw spectra (Figure 4.2a) is 98.2%. PC 1 separates samples based upon amphetamine content. Samples with higher caffeine content and lower amphetamine content are positioned more positively on PC 1; caffeine samples are positioned most positively on the axis. Samples with higher amphetamine content and lower caffeine content are positioned more negatively on PC 1; replicates of amphetamine are positioned most negatively on PC 1. The other sample mixtures are positioned along PC 1 according to their amphetamine and caffeine content. PC 2 separates replicates within each of the six samples. In this PCA scores plot, the replicates for each mixture span a large space in both dimensions, which is not expected for replicates that contain the same chemical information. Ideally, replicates of the same mixture should overlap since they are chemically the same; however, the large spread among replicates indicates that non-sample sources of variance, especially differences in peak heights, dominate PC 2.

Figure 4.2. The effect of baseline correction observed in the PCA scores plot. (a) PCA scores plot of PC 1 vs. PC 2 of the raw spectra consisting of all replicates of the six samples. (b) PCA scores plot for the baseline-corrected spectra.

Associated loadings plots for PC 1 and PC 2 corresponding to the raw spectra are shown in Figure 4.3. From the loadings plot for PC 1 (Figure 4.3a), the dominant source of variance is the difference in amphetamine and caffeine content among the samples. In the loadings plot, all peaks that are present in amphetamine (e.g. peaks between 3150 and 2600 cm⁻¹) have a negative weighting while peaks that correspond to caffeine (e.g. two peaks in the 1700–1600 cm⁻¹ region) are weighted positively.
Therefore, samples in PC 1 are separated based on concentration, where samples with a higher amphetamine content are positioned more negatively on PC 1 while those samples with a higher caffeine content are positioned more positively on this PC. This is somewhat reflected in the scores plot; however, the replicates in each sample are spread over a wide range along the PC 1 axis and samples even overlap on this axis (i.e. 60% amphetamine and 80% amphetamine). Positioning of the samples on PC 2 can be explained in a similar manner; however, all of the loadings in PC 2 (Figure 4.3b) are in the positive direction, indicating that the sample separation is based upon peak heights and intensities. The PC 2 loadings plot looks quite similar to an absorbance spectrum of the mixtures, with peaks corresponding to both amphetamine and caffeine (Figures 3.2 and 3.1, respectively). Spectra of samples that contain the majority of these peaks at higher intensities are positioned more positively on PC 2 in the scores plot (e.g. all replicates of 40% amphetamine). Those samples with spectra displaying these peaks at lower intensities or slightly different peak patterns are positioned more negatively on PC 2 (e.g. caffeine and amphetamine samples). The positive positioning of the 40% amphetamine mixtures is due to all replicates of the mixture containing the pattern shown in the PC 2 loadings. Other mixtures with a higher amphetamine content display less intense caffeine peaks, and thus deviate from the pattern, which accounts for their more negative positioning on PC 2 (Figure 4.2b). Similarly, those samples with higher caffeine content display less intense amphetamine peaks and also deviate from this pattern, resulting in their negative positioning on PC 2.

Figure 4.3. PC 1 and PC 2 loadings plots corresponding to the PCA scores plot of raw spectra. (a) PC 1 loadings plot associated with PCA scores plot of raw spectra.
(b) PC 2 loadings plot associated with PCA scores plot of raw spectra.

Some mixtures also have replicates that are positioned both negatively and positively on PC 2 (e.g. 60% amphetamine). Based on the PC 2 loadings plot, the replicates in these mixtures vary in peak intensities, where spectra containing the characteristic amphetamine and caffeine peaks at higher intensities are positioned more positively on PC 2 and spectra with these peaks at lower intensities are positioned more negatively. This is apparent in the mean-centered spectra of the replicates of 60% amphetamine samples shown in Figure 4.4. Of the three replicate spectra, one spectrum displays less intense peaks than the other two replicate spectra, resulting in a negatively oriented mean-centered spectrum. The PC 2 score of a sample is determined by summing, over all wavenumbers, the products of the PC 2 loadings and the mean-centered spectrum. Because the mean-centered spectrum of the first replicate is negatively oriented and the PC 2 loadings are positively oriented, the score for this replicate on PC 2 is negative. From the scores plots, it can be observed that the separation of samples on PC 2 is due to differences in peak heights within replicates as well as peak patterns among samples. While the differentiation of samples based upon peak patterns and peak heights arises from chemical differences, the separation of sample replicates on PC 2 due to peak heights is attributed to non-sample sources of variance since replicates are chemically the same.

Figure 4.4. Mean-centered spectra of replicates of 60% amphetamine samples after baseline correction.

After the application of baseline correction, the PCA scores plot (Figure 4.2b) does not display substantial change from the scores plot based on the raw spectra (Figure 4.2a). The associated PC 1 and PC 2 loadings plots (Figure A.1) are also similar to the loadings plots observed for the raw spectra.
The average percent change in the clustering (PCC) of replicates determined for baseline correction was -3.7%. The negative value indicates improved clustering of the replicates after baseline correction. Despite slight variance reduction with baseline correction, the small magnitude of the percent change supports the conclusion that other sources of non-sample variance are present and that baseline correction alone is not effective at removing all non-sample variance.

4.1.2 Smoothing

After baseline correction, noise in the baseline region was still observed in the spectra (4000–3150 cm⁻¹ region in Figure 4.1b). A smoothing function was applied in conjunction with baseline correction to assess its efficacy in non-sample variance reduction. Figure 4.5 displays spectra in which smoothing was applied to the baseline-corrected spectra, along with the associated PCA scores plot. Baseline-corrected and smoothed spectra (example shown in Figure 4.5a) display noise reduction in the baseline regions and minimal signal reduction, indicating the removal of non-sample variance while maintaining the chemical information. Slightly closer clustering of the replicates was observed in the PCA scores plot (Figure 4.5b) after baseline correction and smoothing. However, the PCC value of -5.1% relative to the raw spectra indicates that sloping baselines and noise are not the major sources of non-sample variance. It can be concluded from the small PCC value and the large spread in the clustering of replicates that other non-sample sources of variance of a greater magnitude are present and that further pretreatment beyond baseline correction and smoothing is necessary.

Figure 4.5. The effect of applying an automated smoothing function that incorporates a Savitzky-Golay smooth as well as block averaging on different regions of the spectrum. (a) Baseline-corrected and smoothed spectra of three replicates of the caffeine sample.
(b) PCA scores plot displaying the samples after baseline correction and smoothing.

4.1.3 Standard Normal Variate Normalization

A noticeable feature across all spectra was the difference in intensity among the replicates. For example, the first replicate of the caffeine sample had a peak intensity of 0.19 arbitrary units (AU) at 1650 cm⁻¹ while the second replicate of the same sample had a peak intensity of 0.28 AU at the same wavenumber. The peak pattern and the peak ratios within the spectrum remained the same, but the overall peak intensities differed; essentially, the area under the curve varied between replicates of the same sample. Standard normal variate (SNV) normalization was applied to each spectrum in order to reduce the differences observed in spectra and the PCA scores plots due to peak area variation. Figure 4.6 displays the effects of SNV normalization in combination with baseline correction and smoothing. Replicate overlay is greatly improved in spectra after SNV normalization (Figure 4.6a) as compared to the overlay observed in raw spectra (Figure 4.1a), indicating that SNV normalization is effective at minimizing this type of non-sample variance.

Figure 4.6. The effects of SNV normalization in conjunction with baseline correction and smoothing. (a) Spectra of replicates of the caffeine sample after baseline correction, smoothing, and SNV normalization. (b) PCA scores plot for spectra after baseline correction, smoothing, and SNV normalization.

The effect of SNV normalization can also be observed in the PCA scores plot, where the clustering of replicates is substantially improved (Figure 4.6b) after the series of baseline correction, smoothing, and normalization, especially on PC 2. This indicates that the dominant source of non-sample variance is instrument response as a result of differences in ATR crystal coverage by the sample, which leads to differences in peak intensities and areas.
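The way SNV normalization removes an overall intensity difference between replicates can be illustrated with a minimal Python sketch. This is not the Excel procedure used in this work: the five-point "spectra" are hypothetical, the second replicate is assumed to be an exact multiple of the first (scaled by 0.28/0.19 to mimic the caffeine example above), and the use of the sample standard deviation is an assumption.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: center a spectrum on its own mean and
    divide by its own standard deviation (sample SD assumed here)."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(a - mean) / sd for a in spectrum]


# Hypothetical absorbance values for one replicate; the maximum of 0.19
# mirrors the peak intensity quoted in the text.
rep1 = [0.02, 0.05, 0.19, 0.07, 0.03]

# Second replicate with the same peak pattern and ratios but a larger
# overall intensity (0.28 at the same position).
scale = 0.28 / 0.19
rep2 = [a * scale for a in rep1]

snv1 = snv(rep1)
snv2 = snv(rep2)

# Largest point-by-point difference between the normalized replicates.
max_diff = max(abs(a - b) for a, b in zip(snv1, snv2))
print(f"Largest difference after SNV: {max_diff:.2e}")
```

Because each spectrum is scaled only by its own mean and standard deviation, a purely multiplicative difference between replicates cancels and the normalized replicates coincide to within rounding error. Real replicates also differ by noise and baseline, so the overlay would be close rather than exact.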
After reducing non-sample variance, the variance among the samples accounted for by PC 1 is 92.3%, while PC 2 accounts for 6.6%. Of the pretreatments investigated, this series of pretreatment procedures yielded the greatest improvement in the clustering of replicates (-91.5%). The differentiation of samples on both PCs is clearly displayed in the scores plot (Figure 4.6b). Based on the PC 1 loadings plot (Figure 4.7a), the samples are separated by amphetamine and caffeine content, similar to the separation on PC 1 observed for the scores plot of raw spectra (Figure 4.2a). However, PC 1 accounts for more of this variation after SNV normalization (92.3% compared to 69.7%), meaning that the variables associated with PC 1 are more dominant with the reduction in non-sample variance by SNV normalization. The PC 2 loadings plot (Figure 4.7b) appears to highlight differences in peak heights among the mixtures. Thus, those samples with more intense peaks across the entire wavenumber region are positioned more positively (e.g. 40% amphetamine or 60% amphetamine) whereas those samples that do not display this peak pattern are positioned more negatively on PC 2 (e.g. caffeine and amphetamine). The overall peak patterns of these loadings plots and those of the raw spectra (Figure 4.3) are similar; the major differences are the weightings on the more dominant peaks (e.g. amphetamine peaks between 3150 and 2600 cm⁻¹ and the caffeine peaks between 1700 and 1600 cm⁻¹). The weightings are greater for the peaks in the loadings plots corresponding to the normalized spectra for both PCs, indicating that the contributions of these variables to the positioning of samples on the scores plot are greater than for raw spectra. Also, the replicates of the samples in the scores plot cluster well, and it is apparent that spread along the PC 2 axis for replicates is substantially reduced.
While the distinction of samples on PC 2 is not necessarily meaningful, the variance highlighted by PC 2 still originates from chemical differences among the samples.

Figure 4.7. Loadings plots for (a) PC 1 and (b) PC 2 associated with PCA scores plot of spectra after applying baseline correction, smoothing, and SNV normalization.

It is apparent that sequential application of pretreatments is more effective than applying any single data pretreatment, as the PCC after applying a sequence of data pretreatments is greater in magnitude than the PCC after applying any pretreatment individually (Table 4.1).

Table 4.1. PCC values after applying data pretreatments.

Data Pretreatment                        PCC Value (%)
Baseline Correction                      -3.7
Smoothing                                -1.3
Standard Normal Variate Normalization    -89.6
All                                      -91.5

4.1.4 Multiplicative Scatter Correction

Scatter correction was applied to spectra in conjunction with baseline correction and smoothing in order to reduce any effects due to light scatter and particle size differences, which are not inherent to the sample. Multiplicative scatter correction was not applied to SNV normalized spectra as the two pretreatments are complementary (1). The effects of applying a scatter correction were observed in the spectra and PCA scores plots (Figure 4.8). An example of substantial improvement in replicate overlay is shown in Figure 4.8a, where the overlay appears to span both the baseline and peak regions. Taking into account other pretreatment procedures in conjunction with multiplicative scatter correction, visual assessment of spectra indicates comparable performance to SNV normalization (Figure 4.8a compared with Figure 4.6a). However, from the scores plot, it is evident that the clustering of replicates is not as close as that observed in the scores plot where the spectra were pretreated using SNV normalization (PCC of -86.6% compared to PCC of -91.5% for SNV normalization) (Figure 4.8b compared to Figure 4.6b).
The non-sample variance is not minimized as well in PC 2 using scatter correction as compared to SNV normalization. This is observed in the PC 2 loadings plot (Figure 4.9b), where the spectral characteristics that are highlighted are those that describe replicate variation and differences in noise among replicates. While it is evident that non-sample sources of variance are minimized with other pretreatment procedures, scatter correction is not as efficient as SNV normalization. This may be due to the way in which scatter correction is performed on the data as compared to SNV normalization. Scatter correction is data set-dependent, whereas SNV normalization is data set-independent. As a result, each spectrum is treated individually by normalization, whereas the degree of scatter correction to each spectrum is dependent on the data set. Based upon the inherent difference in the way these two procedures are applied, the replicates are more likely to overlay after normalization as opposed to scatter correction.

Figure 4.8. The effect of multiplicative scatter correction on spectra and PCA scores plot. (a) Spectra of replicates of caffeine after applying baseline correction, smoothing, and multiplicative scatter correction. (b) PCA scores plot of spectra after pretreatment with baseline correction, smoothing, and multiplicative scatter correction.

Figure 4.9. (a) PC 1 and (b) PC 2 loadings plots associated with scores plot of spectra after applying baseline correction, smoothing, and multiplicative scatter correction.

4.1.5 Optimal Sequence of Data Pretreatment

The optimal set of data pretreatments was determined to be baseline correction, smoothing, and SNV normalization. This sequence offered a PCC of -91.5%, which was the largest improvement in clustering of all pretreatment sequences investigated.
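The data set dependence of scatter correction noted in Section 4.1.4 can be illustrated with a short Python sketch. It assumes the common formulation of multiplicative scatter correction in which each spectrum is regressed against the mean spectrum of the data set; whether the software used in this work follows exactly this formulation is not stated, and the four-point "spectra" and function names below are hypothetical.

```python
import numpy as np


def snv(spectra):
    """SNV: each row is scaled using only its own mean and standard deviation."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / sd


def msc(spectra):
    """MSC (assumed form): regress each row on the data set mean spectrum,
    then remove the fitted intercept and slope."""
    spectra = np.asarray(spectra, dtype=float)
    reference = spectra.mean(axis=0)  # depends on every spectrum in the set
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(reference, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected


# Hypothetical spectra: `a` is the spectrum of interest; `b` and `c` are
# two different companions that change the composition of the data set.
a = [0.10, 0.40, 0.90, 0.30]
b = [0.20, 0.50, 1.20, 0.35]
c = [0.05, 0.80, 0.30, 0.60]

# Is the pretreated version of `a` the same in both data sets?
snv_same = bool(np.allclose(snv([a, b])[0], snv([a, c])[0]))
msc_same = bool(np.allclose(msc([a, b])[0], msc([a, c])[0]))
print("SNV result independent of data set:", snv_same)
print("MSC result independent of data set:", msc_same)
```

In this sketch the SNV-treated spectrum is identical regardless of which other spectra accompany it, whereas the MSC-treated spectrum changes with the reference, which is consistent with the distinction drawn in the text.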
The application of this set of pretreatment procedures results in PCs 1 and 2 accounting for a total variance of 98.9%, with both PCs representing variables that are chemically meaningful. Sloping baselines and noise in the baseline regions are substantially reduced, and replicate spectra overlay more closely. Thus, all spectral data were baseline-corrected, smoothed, and SNV normalized before being used to develop and test the PCR model.

4.2 PCA

PCA using the singular value decomposition (SVD) algorithm was performed on pretreated spectra of the amphetamine-caffeine and methamphetamine-caffeine mixtures designated as the training set. The first two PCs account for 94.0% of the total variance (Figure 4.10a). Sample replicates clustered closely and distinctly from samples of different concentrations. Samples containing only caffeine are positioned most positively on PC 1, and slightly negatively on PC 2. Samples containing 100% amphetamine and methamphetamine are positioned negatively on PC 1, but differ on PC 2. Amphetamine samples are positioned negatively on PC 2, whereas methamphetamine samples are positioned positively on PC 2.

The positioning of all samples on this scores plot can be explained with the PC 1 and PC 2 loadings plots (Figures 4.10b and 4.10c). The PC 1 loadings plot (Figure 4.10b) displays characteristic caffeine peaks (i.e. 1700–1600 cm⁻¹ in Figure 3.1) that are weighted positively while those that correspond to both amphetamine and methamphetamine (i.e. 3150–2600 cm⁻¹ in Figure 3.2 and 3150–2300 cm⁻¹ in Figure 3.3) are weighted in the negative direction. The peaks in the higher wavenumber region corresponding to methamphetamine (3150–2300 cm⁻¹) are not as prominent as those for amphetamine, but overtones of the two controlled substances can be observed in this region.
Based upon the loadings for PC 1, samples with higher amphetamine and methamphetamine content are positioned more negatively on PC 1, while those with higher caffeine content are positioned more positively. PC 1 separates samples based upon controlled substance and caffeine concentration.

The loadings plot for PC 2 (Figure 4.10c) shows distinctive methamphetamine peaks in the 3150–2300 cm⁻¹ region (due to CH stretches) weighted positively while those peaks in the 1200–900 cm⁻¹ region (as a result of NH bends), which are attributed to amphetamine, are weighted negatively. Accordingly, all methamphetamine samples are positioned positively on PC 2, with samples containing higher methamphetamine content positioned more positively. In contrast, amphetamine mixtures are positioned negatively on PC 2, with samples containing higher amphetamine concentrations positioned more negatively. PC 2 separates samples based on controlled substance content.

Figure 4.10. PCA applied to training set. (a) PCA scores plot of training set with both sets of mixtures. (b) PC 1 loadings plot. (c) PC 2 loadings plot.

From the scores plot, a larger spread in the amphetamine mixtures as opposed to the methamphetamine samples can be observed on PC 1, especially for the 40% amphetamine samples. This may be due to slight differences in amphetamine peak intensities among the replicates, where replicates with slightly lower peak intensities for amphetamine are positioned more positively and replicates with lower peak intensities corresponding to caffeine are positioned more negatively. As more amphetamine than methamphetamine characteristics are highlighted in PC 1 (Figure 4.10b), the spread among replicates is more apparent in the amphetamine mixtures.
Normalization should have addressed replicate variation as a result of differences in peak intensities; the manifestation of this variation in the scores plot indicates that perhaps the data pretreatment procedures applied to the spectra did not fully remove this non-sample source of variance. However, the focus of this research was not on data pretreatments for spectral data; as close clustering of the replicates is observed in the scores plot for the majority of the samples, and samples of different concentrations are distinguished, the set of data pretreatments applied to the spectra was deemed sufficient for the PCR model.

4.3 PC Selection for MLR

An internal validation, more specifically leave-one-out cross validation (LOO CV), was performed on the training set to determine the optimal number of PCs for multiple linear regression. The validation was performed on the entire set of samples, but with separate results for the amphetamine and methamphetamine regressions. A total of 9 PCs was selected with which to perform the cross validation; the total variance accounted for by this number of PCs was 99.8%. The optimal number of PCs for the regression model is determined as the smallest number of PCs for which the standard error of validation (SEV) is not significantly different from the SEV corresponding to the maximum number of PCs selected. Thus, in this cross validation, the optimal number of PCs to retain in the model is that for which the corresponding SEV is not significantly different from the SEV for 9 PCs. This method of selecting the appropriate number -

The SEV was plotted as a function of the number of PCs included in the model for both amphetamine and methamphetamine (Figure 4.11). Methamphetamine initially shows greater error than amphetamine (34.8% compared to 24.4%) when only PC 1 is included in the regression. The SEV is quickly reduced with the inclusion of PC 2 to 4.2% and 5.4% for methamphetamine and amphetamine, respectively.
With the addition of subsequent PCs up to 9 PCs, slightly greater error is found with amphetamine mixtures. For example, with 3 PCs in the model, the SEV for amphetamine is 5.2% compared to 3.0% for methamphetamine. Also, the addition of subsequent PCs to the regression does not substantially reduce the SEV for either set of samples. Eight PCs were determined to be optimal for the amphetamine regression because the SEV for including 8 PCs was not significantly different from the SEV at 9 PCs (i.e. 2.9% at both points) at the 95% confidence level. For methamphetamine samples, 9 PCs were determined to be optimal since the inclusion of fewer than 9 PCs yielded SEV values that were significantly greater than the SEV at 9 PCs (1.9%) at the 95% confidence level. When a total of 10 PCs was selected for cross validation, the optimal number of PCs to include in the methamphetamine regression was still 9 PCs; the same result was observed when the maximum number of PCs was increased to 11.

Figure 4.11. The standard error of validation plotted as a function of the number of PCs included in the validation for amphetamine (black) and methamphetamine (red) mixtures.

Even though the SEV may reach a local minimum or be statistically equivalent at the 95% confidence level - PC selection may lead to the regression model being overfit. An overfit model is one that is specific to the training set used in the development and cannot be used to accurately predict other datasets. This is because of the inherent nature of PCA, where the PCs attempt to account for all variance observed among the samples. As higher PCs often account for very little variance, the information that is highlighted by these PCs may not be chemically meaningful. This is shown in Figure A.2, where the loadings plots for PC 8 (Figure A.2a) and PC 9 (Figure A.2b) highlight replicate and inter-sample variance rather than chemically meaningful variance.
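How the SEV varies with the number of PCs retained can be sketched as follows. This is a minimal illustration under stated assumptions, not the Pirouette implementation: the spectra are synthetic (three hypothetical Gaussian-band components named `drug_a`, `drug_b`, and `diluent`), the SEV is taken as the root-mean-square of the leave-one-out prediction errors, and the significance test used to compare SEV values in this work is not reproduced.

```python
import numpy as np


def pcr_predict(X_train, y_train, X_new, n_pcs):
    """Fit PCR on the training rows and predict the response for X_new."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    # Rows of vt are the PC loadings of the mean-centered training spectra.
    _, _, vt = np.linalg.svd(X_train - x_mean, full_matrices=False)
    loadings = vt[:n_pcs].T
    scores = (X_train - x_mean) @ loadings
    # Multiple linear regression of concentration on the retained scores.
    coef, *_ = np.linalg.lstsq(scores, y_train - y_mean, rcond=None)
    return y_mean + (X_new - x_mean) @ loadings @ coef


def loo_sev(X, y, max_pcs):
    """Leave-one-out SEV for models with 1..max_pcs PCs."""
    n = len(y)
    sev = []
    for k in range(1, max_pcs + 1):
        errors = []
        for i in range(n):
            keep = np.arange(n) != i  # leave sample i out
            pred = pcr_predict(X[keep], y[keep], X[i:i + 1], k)
            errors.append(pred[0] - y[i])
        sev.append(float(np.sqrt(np.mean(np.square(errors)))))
    return sev


# Synthetic binary mixtures (all values hypothetical).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)


def band(center):
    return np.exp(-((x - center) / 0.05) ** 2)


drug_a = band(0.20) + 0.5 * band(0.70)
drug_b = band(0.35) + 0.5 * band(0.70)
diluent = band(0.50) + band(0.85)

spectra, conc_a = [], []
for drug, is_a in ((drug_a, True), (drug_b, False)):
    for frac in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
        for _ in range(3):  # triplicate
            noise = rng.normal(0.0, 0.01, x.size)
            spectra.append(frac * drug + (1.0 - frac) * diluent + noise)
            conc_a.append(100.0 * frac if is_a else 0.0)

X = np.array(spectra)
y = np.array(conc_a)  # percent of drug A in each mixture
sev = loo_sev(X, y, max_pcs=4)
for k, value in enumerate(sev, start=1):
    print(f"{k} PC(s): SEV = {value:.2f}%")
```

In this synthetic case the error drops sharply once the second PC is included, because two PCs are needed to distinguish the two drugs and the diluent; later PCs mainly describe noise. The same qualitative pattern is described for the training set in the text, although the numerical values here have no connection to the measured data.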
- of PC selection is to include variance as a criterion. That is, the optimal number of PCs to retain in the model is the number that describes a user-defined percent of the total variance. In this work, the criterion was the number of PCs that accounted for 95% of the total variance, as this percentage accounts for the majority of variance described in PCA. Based upon the total variance plotted as a function of PCs (Figure 4.12), 2 PCs account for 94.0% and 3 PCs account for 98.6% of the total variance. Thus, 3 PCs were selected as the maximum number of PCs to include in the LOO cross validation.

Figure 4.12. Plot of total variance as a function of the number of PCs. Red dashed line indicates 95% variance.

Using this method, 2 PCs were optimal for amphetamine (SEV = 5.4%). This is a logical result as there exist only two variables in the training set. The first variable is the difference in controlled substance in the mixtures; methamphetamine mixtures are different from amphetamine mixtures. The second variable is the constraint of varying the diluent caffeine in the amphetamine and methamphetamine mixtures so that the total amount in each mixture was 10 mg. Also, as previously discussed, PC 2 accounts for samples differing in controlled substance content; thus, PC 2 is chemically meaningful and should be included in the regression.

With the methamphetamine regression, 3 PCs were determined to be optimal. PC 3 accounts for 4.6% of the total variance. However, in examining the PC 3 loadings plot (Figure 4.13), the loadings are all weighted negatively. As previously described, this peak pattern indicates that the differentiation of samples on PC 3 is based upon differences in peak heights. It is apparent that these differences affect methamphetamine mixtures more than the amphetamine mixtures due to the characteristic peaks in the 3150–2300 cm⁻¹ region corresponding to methamphetamine.
Similar differentiation of mixtures based on peak heights was observed for amphetamine samples on PC 1 (Figure 4.10b). However, this variance was greater in the amphetamine samples and thus affected the samples on PC 1. Because this same variance affected methamphetamine to a lesser degree, it is only present in PC 3. To avoid multicollinearity, where subsequent PCs describe variation in the samples that is observed in the earlier PCs, PC 3 was not included in the methamphetamine regression. Thus, regression for both amphetamine and methamphetamine was performed with 2 PCs.

Figure 4.13. PC 3 loadings plot associated with pretreated sample mixtures in the training set.

4.4 Model Performance

4.4.1 Multiple Linear Regression

Multiple linear regression was performed in a single analysis using only 2 PCs for both of the regressions; two calibration curves were generated (Figure 4.14). The predicted concentrations of the training set are plotted as a function of the known concentrations. Ideally, all samples in the training sets should be localized in 20% increments along the y = x line, with a slope of 1 and a y-intercept value of 0; however, some deviations from the line do exist.

Figure 4.14. Calibration curves generated in (a) amphetamine regression and (b) methamphetamine regression using the training set.

The most noticeable deviations of the data points from the calibration curves are those on the y-axis. For the amphetamine regression, the caffeine samples and the methamphetamine mixtures range in predicted concentration from -10% to 10%. As these samples do not contain amphetamine, their concentrations should be 0% amphetamine. However, this range in concentration is observed due to the differences in the spectral pattern among the samples with respect to the pattern observed in the amphetamine regression vector (Figure 4.15a).
The regression vectors determined in the calibrations are shown in Figure 4.15 and indicate the magnitude of variable contributions to the calibration curve, similar to a loadings plot. For the amphetamine regression, the vector (Figure 4.15a) displays characteristic methamphetamine peaks (3150–2300 cm⁻¹) weighted negatively, caffeine peaks between 1700 and 1600 cm⁻¹ also weighted negatively, and peaks characteristic of amphetamine (1200–900 cm⁻¹) weighted positively. As this regression was performed to quantify the amount of amphetamine in mixtures, it is logical that the peaks characteristic of amphetamine would be weighted positively. A positive weighting in this vector indicates a positive contribution to the overall calibration curve; variables that are weighted negatively influence the calibration curve so that the samples with these peaks have lower predicted concentrations for that regression. As the methamphetamine mixtures do not contain amphetamine as a component, the known concentration for these samples is 0% amphetamine. Characteristic caffeine peaks are also weighted negatively since the binary mixtures with higher caffeine content have lower amphetamine concentrations. The caffeine and methamphetamine mixtures that have concentration values ranging from -10% to 10% display different intensities of these peaks in their spectra, resulting in the different concentration values along the y-axis. For example, the peak intensities at 3150–2300 cm⁻¹ are higher for 100% methamphetamine than for 60% methamphetamine or 80% methamphetamine, which results in the 100% methamphetamine having more negative predicted concentrations. As negative concentrations cannot exist, the predicted concentrations of caffeine and methamphetamine should be taken as 0%. Amphetamine peaks in the regression vector are weighted positively since more intense amphetamine peaks indicate a higher concentration of amphetamine.
It is unusual, however, that the amphetamine peaks in the 3150–2600 cm⁻¹ region are not contributing to the amphetamine regression; this may be due to the more intense methamphetamine absorptions in this region that overshadow the peaks characteristic of amphetamine. Nevertheless, characteristic peaks of all components in the mixtures are present in the amphetamine regression vector as expected, indicating substantial contribution to the regression of the samples.

Figure 4.15. Regression vectors for (a) amphetamine regression and (b) methamphetamine regression.

Deviations of predicted concentrations from the calibration curve are also present in the methamphetamine regression (Figure 4.14b), where the predicted concentrations of caffeine and the amphetamine mixtures range from -10% to 10% methamphetamine even though the mixtures do not contain methamphetamine. The range in predicted concentrations can be explained in a similar manner to the amphetamine regression using the methamphetamine regression vector (Figure 4.15b). The contributions of the variables in the methamphetamine regression vector are more straightforward; characteristic methamphetamine peaks in the 3150–2300 cm⁻¹ region are weighted positively and characteristic amphetamine peaks in the 1200–900 cm⁻¹ region are weighted negatively. Peaks characteristic of caffeine are not apparent in this regression vector, indicating that the most influential components of this regression are the intensities of the methamphetamine and amphetamine peaks. This is expected, as the pattern observed in this regression vector resembles that in the PC 2 loadings plot (Figure 4.10c). The methamphetamine regression was performed using only PCs 1 and 2.
It is logical that the methamphetamine regression vector would display variable contributions similar to the PC 2 loadings plot, as the variance accounted for by PC 2 allows the sample mixtures to be differentiated based upon controlled substance content. Also, the PC 2 scores for caffeine (Figure 4.10a) are close to zero and thus do not contribute heavily to the variance. As such, all caffeine and amphetamine mixtures have predicted concentrations close to 0% methamphetamine.

The deviations from the calibration curve are quantified as the root-mean-square error (RMSE). The error associated with the internal validation is described by the root-mean-square error of validation (RMSEV), which is equivalent to SEV. A good model is one with a minimized RMSEV. Through LOO CV, the RMSEV for the amphetamine regression using 2 PCs was 5.4%, whereas the methamphetamine regression with 2 PCs yielded a validation error of 4.2%. The lower RMSEV for the methamphetamine regression can be explained with reference to the calibration curves in Figure 4.14. It is apparent that the replicates of each sample mixture are clustered more closely for the methamphetamine regression (Figure 4.14b) than for the amphetamine regression (Figure 4.14a). The spread in the replicates is more apparent for the amphetamine mixtures because PC 1 accounts for differences in the peak heights of the replicates in amphetamine mixtures (Figure 4.10a). See Section 4.2 for a detailed explanation. Despite the deviations from the amphetamine calibration curve, the correlation coefficient was 0.988, which indicates good linearity between the known and predicted concentrations. It can be further concluded that the X and Y matrices, that is, the set of spectra and the known concentration values, display good linearity.
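The internal validation described above can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions, not the procedure as run in this work: RMSEV is taken here as the square root of PRESS divided by the number of samples (other conventions use a different denominator), the spectra are simulated single-band mixtures with added random noise, and all names are hypothetical.

```python
import numpy as np

def pcr_predict(X_train, y_train, X_new, n_pcs):
    """Fit PCR on the training data and predict concentrations for X_new."""
    x_mean, y_mean = X_train.mean(axis=0), y_train.mean()
    _, _, Vt = np.linalg.svd(X_train - x_mean, full_matrices=False)
    P = Vt[:n_pcs].T
    T = (X_train - x_mean) @ P
    q, *_ = np.linalg.lstsq(T, y_train - y_mean, rcond=None)
    # New spectra are centered with the training mean before projection.
    return ((X_new - x_mean) @ P) @ q + y_mean

def loo_cv(X, y, n_pcs):
    """Leave-one-out cross-validation; returns (RMSEV, PRESS)."""
    n = len(y)
    residuals = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        y_hat = pcr_predict(X[keep], y[keep], X[i:i + 1], n_pcs)
        residuals[i] = y_hat[0] - y[i]
    press = float(np.sum(residuals ** 2))
    return float(np.sqrt(press / n)), press

# Simulated data: hypothetical bands, two replicates per mixture.
rng = np.random.default_rng(0)
wavenumber = np.linspace(4000, 650, 200)

def band(center, width):
    return np.exp(-0.5 * ((wavenumber - center) / width) ** 2)

amphetamine, methamphetamine, caffeine = band(1050, 60), band(2700, 150), band(1650, 30)
f = np.repeat([0.2, 0.4, 0.6, 0.8, 1.0], 2)
X_clean = np.vstack([
    np.outer(f, amphetamine) + np.outer(1 - f, caffeine),
    np.outer(f, methamphetamine) + np.outer(1 - f, caffeine),
])
X_noisy = X_clean + rng.normal(scale=0.02, size=X_clean.shape)
y = np.concatenate([100 * f, np.zeros_like(f)])   # % amphetamine

rmsev_clean, _ = loo_cv(X_clean, y, n_pcs=2)
rmsev_noisy, press_noisy = loo_cv(X_noisy, y, n_pcs=2)
print(f"RMSEV without noise: {rmsev_clean:.2e} %")
print(f"RMSEV with noise:    {rmsev_noisy:.2f} %")
```

Each sample is predicted by a model that never saw it, so spectral variation among replicates appears directly as a larger PRESS and therefore a larger RMSEV.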
The correlation coefficient for the methamphetamine regression was 0.993, which is not surprising, given that the majority of the data points are well positioned on the calibration curve.

4.4.2 Test Set 1

Model performance was first assessed using Test Set 1 (Table 3.2). The model was used to predict the concentrations of the controlled substances in Test Set 1. The PCA scores of the mixtures in Test Set 1 were calculated and projected onto the PCA scores plot for the training set (Figure 4.16). Test Set 1 contains mixtures of amphetamine and caffeine and of methamphetamine and caffeine that have the same concentrations by weight as those in the training set. Ideally, the scores of the samples in Test Set 1 should completely overlay the scores of the corresponding samples in the training set. This is observed for the majority of the samples (Figure 4.16), with the exception of all 80% methamphetamine replicates and some of the 40% methamphetamine and 80% amphetamine replicates. The 80% methamphetamine samples in Test Set 1 have PC 1 scores that are more negative than the corresponding scores in the training set. Visual comparison of the average of the 80% methamphetamine spectra in Test Set 1 against that of the training set (Figure A.3) indicates that the test set mixtures have less intense peaks characteristic of caffeine than the corresponding mixtures in the training set, leading to more negative PC 1 scores. This also indicates that the sample preparation process is not readily reproducible; even though the two sets of mixtures have the same composition, differences in spectral profiles are still apparent, especially using multivariate statistics.

Figure 4.16. PCA scores plot of training set (filled-in circles) with Test Set 1 plotted.
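Projecting test-set spectra onto the training-set scores plot amounts to centering the test spectra with the training-set mean and multiplying by the training-set loadings. The sketch below illustrates this with random placeholder data; it assumes that both sets have received the same pretreatment, and the function names are hypothetical.

```python
import numpy as np

def fit_pca(X_train, n_pcs=2):
    """Return the training-set mean spectrum and the first n_pcs loadings."""
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_pcs].T

def project(X, mean, loadings):
    """Scores of X in the PC space defined by the training set."""
    return (X - mean) @ loadings

# Placeholder data: 12 training "spectra" and 4 test "spectra" prepared to
# match the first four training samples, with small preparation differences.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(12, 50))
X_test = X_train[:4] + rng.normal(scale=0.05, size=(4, 50))

mean, loadings = fit_pca(X_train, n_pcs=2)
train_scores = project(X_train, mean, loadings)
test_scores = project(X_test, mean, loadings)

# Distance between each test sample and its training counterpart in the
# PC 1 / PC 2 plane; zero would mean a perfect overlay.
offsets = np.linalg.norm(test_scores - train_scores[:4], axis=1)
print(offsets)
```

The test set is never used to recompute the mean or the loadings, so any offset between a test sample and its training counterpart reflects a difference in the spectra themselves.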
The concentrations of amphetamine and methamphetamine in Test Set 1 were predicted using the amphetamine and methamphetamine regression equations (Figures 4.17a and 4.17b), and prediction errors, or RMSEPs, were determined. RMSEPs were determined similarly to RMSEV, except using the known and predicted concentrations of the test set. The RMSEP for Test Set 1 in the amphetamine regression was 3.8%. This value is lower than the validation error for the training set (5.4%), which is unusual, given that the model should theoretically be able to predict concentrations of samples used in its development more accurately than those of samples in an external validation. However, the predicted concentrations of the samples in Test Set 1 overlay the training set data points used to generate the calibration curve, so a low prediction error is not surprising. Further, the PRESS contributions for 100% amphetamine and 100% methamphetamine in the training set are 22.2% and 28.4%, respectively, which are the highest of all the sample mixtures. This indicates that these samples contribute most to the RMSEV, and therefore, the RMSEV is inflated. As Test Set 1 does not contain 100% amphetamine and 100% methamphetamine, it is reasonable that the RMSEP values are lower than the RMSEV for the amphetamine regression. The RMSEP of 3.8% for the amphetamine regression indicates good prediction ability of the PCR model.

Figure 4.17. Calibration curves of training set with Test Set 1 plotted for the (a) amphetamine regression and (b) methamphetamine regression. See Figure 4.14 for slope, intercept, and R² values.

The prediction ability of the methamphetamine regression was then assessed using Test Set 1. From the calibration curve (Figure 4.17b), a slight curvature is observed in the predicted concentration of the test set mixtures as a function of measured concentration.
The predicted concentrations of the test set mixtures with lower methamphetamine content are lower than expected, especially those for the 40% methamphetamine, and the predicted concentration of 80% methamphetamine is higher than expected. Given the scores of these mixtures on the PCA scores plot (Figure 4.16), this trend is not surprising. The close clustering of the replicates on the calibration curve indicates good precision despite the slightly higher prediction error of 6.9%. Unlike the amphetamine regression, this RMSEP value is higher than the RMSEV value for the methamphetamine regression (4.2%), which is expected since the mixtures in Test Set 1 are not the same mixtures used in the training set. As stated previously, this is indicative of the non-reproducible nature of sample preparation.

4.4.3 Test Set 2

The scores of Test Set 2 mixtures (Table 3.3) were calculated and projected onto the PCA scores plot for the training set (Figure 4.18). Since the concentrations of these mixtures were not included in the training set, the scores of these mixtures are expected to lie between the scores of the training set samples. For example, the scores of the replicates of the 30% amphetamine should ideally be positioned between the scores of the 20% amphetamine mixtures and those of the 40% amphetamine mixtures. This is somewhat reflected in the scores plot, where the 30% amphetamine replicates lie between 20% and 40% amphetamine. However, the scores of these replicates are more positive on PC 1 than expected, and lie closer to the 20% amphetamine mixtures with some overlap. A comparison of the average spectrum of the 30% amphetamine replicates and the average spectra of the 20% and 40% amphetamine mixtures (Figure A.4) was performed. Visual assessment of the spectra indicates that the peak intensities for those peaks that heavily influence the positioning on PC 1 (i.e.
3150–2300 cm⁻¹ and 1200–900 cm⁻¹) are more similar to those of the 20% amphetamine mixtures than to those of the 40% amphetamine replicates. This observation is not surprising, as solid sample preparation is neither exact nor reproducible. Thus, the 30% amphetamine replicates overlay some of the 20% amphetamine mixtures in the training set. With the exception of 50% methamphetamine, 10% amphetamine, and 30% amphetamine, the majority of the scores for Test Set 2 mixtures lie in the expected positions on PCs 1 and 2.

Figure 4.18. Scores plot of training set (filled-in circles) with Test Set 2 plotted.

Model performance was then assessed with Test Set 2 (Figures 4.19a and 4.19b). The amphetamine regression for Test Set 2 does not yield predicted concentrations that are ideal. The positioning of the Test Set 2 sample data points on the calibration curve suggests that the calibration data are curved; however, this is not the case (Figure 4.19a). The predicted concentrations of the 10% and 30% amphetamine mixtures in Test Set 2 are lower than expected, while those of the 50% and 70% amphetamine mixtures in the test set are higher than expected. This trend is consistent with that observed on the PCA scores plot (Figure 4.18). Despite the slight curvature and deviations from the regression vector, the RMSEP for this test set was 4.6%, which is lower than the RMSEV value for the amphetamine regression (5.4%), and indicates that the model has good prediction ability.

Figure 4.19. Calibration curve of training set with Test Set 2 for the (a) amphetamine regression and (b) methamphetamine regression. See Figure 4.14 for slope, intercept, and R² values.

The prediction ability of the methamphetamine regression in the PCR model was evaluated. A slight curvature is also observed for the Test Set 2 mixtures plotted on the calibration curve (Figure 4.19b).
The samples that deviate from the calibration curve are the 30%, 50%, and 90% methamphetamine mixtures. Using the model, the predicted concentrations for the 30% and 50% methamphetamine mixtures are lower than expected, and the predicted concentration of the 90% methamphetamine samples is higher than expected. This observation is analogous to the trend observed in the PCA scores plot (Figure 4.18) and can be explained by examining the corresponding spectra. The prediction error associated with Test Set 2 is 5.9%, which is only slightly higher than the validation error for the methamphetamine regression (4.2%), but is lower than the prediction error in the methamphetamine regression for Test Set 1 (6.9%). This is unusual, given that the mixtures in Test Set 2 are those not included in the training set, as compared to the mixtures in Test Set 1. However, the lower RMSEP value can be explained by the deviations of the Test Set 2 sample mixtures from the calibration curve. The predicted concentrations for two of the mixtures (i.e. 10% and 70% methamphetamine) lie on the calibration curve and those of the other mixtures are close to the calibration curve. The higher RMSEP value for the methamphetamine regression in Test Set 1 is likely due to the large deviation in predicted concentrations for the 80% methamphetamine. The slightly higher prediction error as compared to validation error is expected, and the low RMSEP for Test Set 2 indicates good prediction ability of the model.

After external validation of the model using the two test sets, the model was applied to [...] samples [...] as determined by the amphetamine regression are shown in Table 4.2.

Table 4.2. [...] regression.

Blind Sample    Predicted Concentration (%)    Known Concentration (%)
1  A            20                             14
   B            21
2  A            30                             28
   B            24
3  A            −5                             0
   B            −6
RMSEP (%)       5.3

In the amphetamine regression, negative predicted concentration values are again observed.
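The RMSEP reported in Table 4.2 can be reproduced from the tabulated values. The short sketch below assumes that RMSEP is the square root of the mean squared difference between predicted and known concentrations over all six replicate measurements, and that both replicates of a sample share the known concentration listed for that sample; the function name is illustrative.

```python
import math

# Values transcribed from Table 4.2 (amphetamine regression), in percent.
predicted = [20, 21, 30, 24, -5, -6]   # samples 1A, 1B, 2A, 2B, 3A, 3B
known = [14, 14, 28, 28, 0, 0]

def rmsep(predicted_values, known_values):
    """Root-mean-square error of prediction over all measurements."""
    squared_errors = [(p - k) ** 2 for p, k in zip(predicted_values, known_values)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

rmsep_amphetamine = rmsep(predicted, known)
print(f"RMSEP = {rmsep_amphetamine:.1f} %")
```

With the rounded values in the table, this convention gives 5.3%, in agreement with the tabulated RMSEP.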
[...] methamphetamine mixtures in the training set, the deviation of the predicted concentrations of [...] predicted concentration. As concentration values below 0% are physically impossible, these values should be taken as 0% amphetamine. [...] the two samples were correctly identified as containing amphetamine based upon the predicted [...] this sample is a binary mixture of methamphetamine and caffeine. The known [...] amphetamine regression for the two replicates are 20% and 21%, respectively. The average percent error determined for this sample is 47%, which is a substantial amount. The large percent error associated with this sample may be due to the difficulty of homogenizing small samples, as the total mass of the sample was only 5.8 mg. On the other hand, the average percent error [...] measured amphetamine content in both mixtures are not high (i.e. 14% and 28% amphetamine, respectively), and the calculation of percent error has a tendency to bias samples with small expected values. To avoid bias from percent error calculations, RMSEP was used as a measure of accuracy in the regression models. For the amphetamine regression, the prediction error was 5.3%, which is comparable to the validation error of 5.4% associated with this calibration curve, indicating that the model is able to correctly identify and accurately quantify the amount of amphetamine in different samples.

[...] 4.3 [...] Samples 1 and 2 did not contain any methamphetamine, the expected concentration values are 0% methamphetamine; the negative concentration values (e.g. −2% methamphetamine) [...] quantified, as the percent error for the two replicates are 3% each. The RMSEP for the set of [...] associated with the methamphetamine calibration curve. The low prediction error may arise from the [...] the methamphetamine regression. As the majority of the values used to calculate RMSEP are [...] regression is small.
However, it is apparent that the methamphetamine calibration curve is able to identify the presence of methamphetamine as well as quantify methamphetamine content with high accuracy.

Table 4.3. [...] methamphetamine regression.

Blind Sample    Predicted Concentration (%)    Known Concentration (%)
1  A            −2                             0
   B            −2
2  A            −2                             0
   B            −1
3  A            38                             39
   B            41
RMSEP (%)       1.7

4.4.5 Summary

An internal validation was performed on the PCR model that was developed for predicting the concentrations of amphetamine and methamphetamine in sample mixtures. Validation errors of 5.4% and 4.2% for amphetamine and methamphetamine, respectively, were obtained. Good linearity was observed for both curves, with correlation coefficients of 0.988 and 0.993, respectively. The performance of the model was evaluated using two test sets: one with mixtures at the same concentrations as those used in the training set, and another with concentrations not used in the training set. Based on the prediction errors generated (between 3.8% and 6.9% for both test sets on both curves), the prediction ability of the model was acceptable. The applicability of the model to identify and quantify controlled substances in a single analysis [...] assessed using prediction error. The low prediction errors for these samples indicate that the PCR model is able to accurately identify and predict the concentrations of amphetamine and methamphetamine in sample mixtures.

APPENDIX

Figure A.1. Loadings plots for (a) PC 1 and (b) PC 2 corresponding to the baseline-corrected spectra of the first collection of amphetamine mixtures.

Figure A.2. Loadings plots for (a) PC 8 and (b) PC 9 corresponding to the pretreated spectra of the training set mixtures.

Figure A.3. Average spectra of 80% methamphetamine mixtures in training set and in Test Set 1.

Figure A.4. Average spectra of 20% amphetamine and 40% amphetamine from the training set as well as the average spectrum of 30% amphetamine from Test Set 2.
Chapter 5 Conclusions and Future Work

5.1 Conclusions

A PCR model was developed to identify and quantify controlled substances in simulated samples with a single analysis. Prior to model development, the optimal set of data pretreatment procedures was determined by visually examining spectra and quantifying the improvement in the clustering of replicates in the PCA scores. After PCA was performed on the training set, the PCs [...] Model performance was evaluated using an internal validation method as well as an external validation with two test sets that had similar controlled substance and caffeine content by quantifying validation and prediction errors. Not surprisingly, the model performed well and was able to [...] samples to ascertain its ability to identify and quantify controlled substances in the mixtures. Correct identification and low prediction errors were observed, indicating good model performance.

Conventionally, forensic analysts have used ATR-FTIR for preliminary screening of controlled substances in submitted samples despite its ability to provide definitive identification. Instead, analysts have preferred gas chromatography-mass spectrometry for identification of controlled substances, as both separation and identification of components are achieved in a single analysis. However, it is advantageous for forensic practitioners to utilize multivariate statistics in conjunction with ATR-FTIR for identification and quantification of multiple controlled substances in a single analysis without separation.
5.2 Future Work

Despite good model performance, the model was developed using simple binary mixtures; it is necessary to broaden the scope of the model to accurately identify and quantify a wider range of controlled substances in the more complex mixtures that are typically observed in a forensic setting. More complex samples, such as ternary mixtures that include other common adulterants or mixtures that include multiple controlled substances, should be investigated. Furthermore, more controlled substance mixtures should be included in the model, such as mixtures that contain cocaine, heroin, morphine, and so on, to develop a more comprehensive model that will allow identification and quantification for a larger sample set.

Part II. Development of Mass Defect Filters for the Classification of Novel Synthetic Designer Drugs

Chapter 6 Introduction

Synthetic designer drugs are psychoactive analogs of traditional controlled substances (e.g. amphetamine, marijuana, etc.). The abuse of these drugs has increased in the U.S. since 2009, with more than 95,000 arrests annually for sale, manufacture, or possession of synthetic drugs (1). Over 200 synthetic designer drugs have been encountered by U.S. law enforcement since their emergence (2), leading forensic laboratories to receive more questioned samples that contain them for analysis and identification. Synthetic drugs are widely available commercially [...] (2). The primary concern with these drugs is the frequency with which novel synthetic drugs emerge on the illicit market once more established analogs are regulated under the Controlled Substances Act. A multitude of analogs can be synthesized quickly and frequently with only minor differences in compound structure (2). Because of the turnover frequency in the synthesis and emergence of these drugs, the identification of novel compounds in forensic laboratories is crucial.
Current forensic analysis of controlled substances utilizes gas chromatography-mass spectrometry with a single quadrupole mass analyzer (GC-QMS), and the technique is considered the gold standard for definitive identification (3). The chemical information of the compound is obtained in the form of a mass spectrum, which plots the ion abundances as a function of mass-to-charge (m/z) ratio, and the patterns of the ions provide unique information from which the structure of the compound can be determined. The ions observed have unit mass resolution, meaning that only nominal mass is obtained. Definitive identification of controlled substances also requires a comparison of the mass spectrum of the unknown in a sample to that of a reference standard (3). Often, the appropriate reference standard is selected based upon an initial comparison of the mass spectrum of the unknown to spectra in a database, whether it is one generated in-house or a commercially available database. The mass spectrum of the reference standard is then generated using the same analytical parameters as that for the unknown, and the mass spectra of the two are compared. This is typically sufficient for the analysis of traditional controlled substances, as the mass spectra of traditional controlled substances are well characterized and available in databases. However, due to the turnover rate of novel synthetic drugs, reference standards may not be readily available for comparison, nor would their mass spectra be found in a database, thus requiring analysts to elucidate the structure of the synthetic designer drug using only the mass spectrum.

Complicating the structural elucidation process is the structural similarity among classes of synthetic designer drugs along with the variety of compounds found within each class. Among the most popular synthetic designer drugs are synthetic cannabinoids, synthetic phenethylamines, and synthetic cathinones (2).
The classes of interest in this research are the latter two compound classes, which have core structures as shown in Figure 6.1. The most distinctive difference between the two classes is the presence of a carbonyl functional group positioned on the beta-carbon to the amine group; the structures are otherwise extremely similar to one another. Furthermore, these compounds can differ in the identity and the position of the substituents (2), but these differences may not be captured in a chemical profile obtained by GC-QMS, which limits the utility of current methods of analysis for definitive identification of synthetic designer drugs.

Figure 6.1. Core structure of (a) phenethylamine and (b) cathinone with possible substitution sites designated with Rn.

The challenges of using conventional low-resolution GC-MS for the analysis of synthetic designer drugs are illustrated with Figure 6.2, which displays mass spectra of two cathinones obtained using GC-QMS. The two cathinones are α-pyrrolidinopropiophenone (α-PPP) and 3-methyl-α-pyrrolidinopropiophenone (3-methyl-α-PPP). It is evident that the two compounds contain the core cathinone structure, and the only difference between the two structures is the methyl group on the benzene ring for 3-methyl-α-PPP. As stated previously, this example is common in synthetic designer drugs, where compounds differ only in the identity and the position of the substituents (2). Despite this difference, the mass spectra are highly similar; neither exhibits a molecular ion (expected at m/z 203 for α-PPP and m/z 217 for 3-methyl-α-PPP), and the two compounds have a base peak at m/z 98. Furthermore, few ions are observed in both mass spectra, which limits the chemical information obtained from the spectra. As a result, definitive identification of these compounds using the corresponding mass spectra obtained using GC-QMS is not likely.

Figure 6.2.
Mass spectrum obtained using GC-QMS of (a) α-pyrrolidinopropiophenone (α-PPP) and of (b) 3-methyl-α-pyrrolidinopropiophenone (3-methyl-α-PPP).

Other analytical methods for the identification of synthetic designer drugs have been investigated for research purposes. In these studies, high-resolution mass spectrometry (HRMS), and in particular, liquid chromatography-mass spectrometry (LC-MS), is used to provide chemical information necessary for definitive identification (4, 5). Unlike GC-QMS, HRMS is able to measure the m/z ratio of an ion accurately to the millidalton and sub-millidalton place. In doing so, the exact mass of a compound can be obtained. The exact mass information is then used to determine the chemical formula, from which definitive identification of the compound is possible. However, these studies have used HRMS in combination with other techniques, such as Fourier transform infrared (FTIR) spectroscopy and nuclear magnetic resonance (NMR) spectroscopy, which are techniques that allow for unequivocal identification of compounds. It is apparent that no single technique is sufficient for definitive identification of novel synthetic designer drugs. The challenge for forensic analysts in using multiple techniques for structural elucidation is the lack of access to the instrumentation, as well as the time-consuming nature of the elucidation process itself. Therefore, it is advantageous to utilize other tools to prioritize the analysis of synthetic designer drugs in questioned samples. A simple and rapid method for classification to compound class can be considered, as the identification of novel compounds is facilitated once structural class is determined. Classification to structural class is a preliminary step to unequivocal identification of novel synthetic designer drugs.
Zuba outlined a classification scheme to distinguish between different classes of synthetic designer drugs based upon mass spectral data obtained via GC-QMS; however, it was concluded in this study that HRMS and other complementary techniques are needed for further analysis (6).

Alternatively, a technique that has been investigated for classification to compound class is mass defect filtering. Mass defect refers to the fractional portion of the exact mass. Filtering using mass defect is a screening tool that is applied post-data acquisition to obtain more chemical information, such as compound class. This technique necessitates the use of HRMS, as the mass defect is only obtained from exact, or accurate, mass. Grabenauer et al. used a combination of HRMS and mass defect filtering in order to classify a group of synthetic cannabinoids in sample mixtures (7). It was determined that the majority of the compounds in the class contained an indole core structure and had mass defect values within the range 0.135–0.235 Da. This filter was then input into the data acquisition software and applied to chromatographic and mass spectral data of sample mixtures containing synthetic cannabinoids. After application of the filter, components in the sample mixtures that had mass defect values within the range were observed, while components with mass defect values outside of the specified range were removed. Using this technique, successful classification based upon mass defect was possible. Furthermore, co-eluting chromatographic peaks were resolved and additional synthetic cannabinoids previously masked in the analysis were identified with this technique (7). From this study, it is apparent that mass defect filtering is a powerful tool. However, this technique has not been applied to other classes of synthetic designer drugs, namely the phenethylamine and cathinone classes, which show high structural similarity.
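The mechanics of an absolute mass defect filter can be sketched in a few lines. The example below is illustrative only and makes several assumptions: monoisotopic masses of the neutral molecules are used rather than the ions measured in any particular study, the mass defect is taken as the fractional portion of the exact mass (adequate for compounds whose defect lies between 0 and 1 Da), and the molecular formulas C13H17NO and C14H19NO are assumed for the two cathinones discussed earlier. The 0.135–0.235 Da window is the one quoted above for synthetic cannabinoids and is applied here purely to show how a window is used.

```python
import math

# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC_MASS = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}

def exact_mass(formula):
    """Monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC_MASS[element] * count for element, count in formula.items())

def mass_defect(mass):
    """Fractional portion of the exact mass."""
    return mass - math.floor(mass)

def passes_filter(mass, low, high):
    """True if the mass defect lies inside the window [low, high]."""
    return low <= mass_defect(mass) <= high

# Assumed molecular formulas of the two cathinones named in the text.
compounds = {
    "alpha-PPP": {"C": 13, "H": 17, "N": 1, "O": 1},
    "3-methyl-alpha-PPP": {"C": 14, "H": 19, "N": 1, "O": 1},
}
WINDOW = (0.135, 0.235)  # Da, window quoted in the text for synthetic cannabinoids

for name, formula in compounds.items():
    m = exact_mass(formula)
    print(f"{name}: exact mass {m:.4f} Da, mass defect {mass_defect(m):.4f} Da, "
          f"inside window: {passes_filter(m, *WINDOW)}")
```

Because the window was reported for a different compound class, the True or False values printed here carry no chemical meaning; they only show that a filter is a simple interval test on a quantity available from accurate mass alone.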
Other types of mass defects have been shown to be useful for classification purposes. Kendrick mass defect (KMD) is used to identify compounds in a class that span a wide range of masses. Its original purpose was to characterize compounds with a wide range of masses to a class known as a homologous series (8). This is not feasible with mass defect, as the range of masses in a homologous series can span an order of magnitude or more, meaning that the mass defect values associated with this range of masses are vastly different. KMD can be used since it is characteristic of a series of compounds differing only in the number of methylene (CH₂) groups (e.g. butane, pentane, hexane, etc.). Hughey et al. utilized KMD to group compounds found in petroleum crude oil into different classes, types, and alkylation series, thus simplifying the identification of compounds in the mixture (9). This type of mass defect is potentially useful for discrimination of phenethylamines and cathinones for classification across a large mass range.

A third type of mass defect reported to be useful for classification purposes is relative mass defect (RMD). This scale is based upon mass defect, but is normalized to the exact mass of the compound and rescaled to a parts per million (ppm) range. The RMD value of a compound is indicative of its fractional hydrogen content, as hydrogen contributes substantially to mass defect. A high RMD value (e.g. 800 ppm) is indicative of a compound with a high hydrogen content, termed hydrogen-rich, whereas a low RMD value (e.g. 200 ppm) is indicative of a compound with a low hydrogen content but larger oxygen content, termed oxygen-rich (10). Using RMD, compounds can be rapidly screened by applying a filter characteristic of a structural class to identify compounds for further investigation. Stagliano et al.
utilized this technique to filter out all compounds not of interest and retain all compounds consistent with a lipid, which was the class of interest, with RMD values that ranged from 600 to 1000 ppm (10). With this technique, not only are similar compounds classified together, but the patterns of fragment ions across an m/z range can also be utilized to group similar compounds to a class (11). Thus, this type of mass defect filter is potentially useful for classification of phenethylamines and cathinones, not only using the molecular ions of these compounds, but also including fragment ion information.

From the three different types of mass defects, it is evident that different chemical information in a compound is probed by each mass defect filter. It may be advantageous to utilize all three types of mass defect filter in a classification scheme. This would allow better discrimination between two classes of synthetic designer drugs that show high structural similarity, thus facilitating identification of novel synthetic drugs.

The objective in this research was to devise methods to overcome the limitations in the analysis of synthetic designer drugs, and in particular, those in the phenethylamine and cathinone classes. To that end, the first goal was to investigate the utility of high-resolution mass spectral data for forensic practitioners to use for comparison to chemical profiles of unknown compounds. All reported HRMS analyses in conjunction with mass defect filtering thus far have utilized liquid chromatography-mass spectrometry (LC-MS); however, the instrumentation associated with this analytical method is typically not available to forensic practitioners.
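As a brief aside on the Kendrick and relative mass defects defined earlier in this chapter, both quantities can be computed directly from an accurate mass. The sketch below is illustrative and rests on assumptions: the Kendrick mass rescales CH2 from 14.01565 to exactly 14.00000, the KMD is taken as the nominal (rounded) Kendrick mass minus the Kendrick mass (sign and rounding conventions vary between authors), the RMD is the fractional mass defect divided by the exact mass and expressed in ppm, and the two input masses are monoisotopic masses assumed for neutral C13H17NO and C14H19NO, which differ by one CH2 unit.

```python
import math

def kendrick_mass(mass):
    """Rescale a mass so that CH2 (14.01565 Da) weighs exactly 14.00000."""
    return mass * 14.00000 / 14.01565

def kendrick_mass_defect(mass):
    """Nominal Kendrick mass minus Kendrick mass (one common convention)."""
    km = kendrick_mass(mass)
    return round(km) - km

def relative_mass_defect_ppm(mass):
    """Fractional mass defect normalized to the exact mass, in ppm."""
    return (mass - math.floor(mass)) / mass * 1e6

# Assumed monoisotopic masses (Da) of two compounds differing by one CH2.
mass_c13h17no = 203.13101
mass_c14h19no = 217.14666

for label, mass in [("C13H17NO", mass_c13h17no), ("C14H19NO", mass_c14h19no)]:
    print(f"{label}: KMD = {kendrick_mass_defect(mass):.4f}, "
          f"RMD = {relative_mass_defect_ppm(mass):.0f} ppm")
```

The two masses differ by 14.01565 Da, so their Kendrick mass defects coincide even though their absolute mass defects do not, which is the property that makes KMD useful across a homologous series. Their RMD values differ only modestly, consistent with RMD reflecting fractional hydrogen content rather than size.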
Furthermore, a crucial difference between LC-MS and GC-MS is the ionization method associated with each technique; the soft ionization used in LC-MS produces minimal fragmentation, while the hard ionization in GC-MS allows for a multitude of fragment ions to be generated in a mass spectrum. As the chemical and structural information provided by the fragment ions is crucial for identification of novel compounds, a technique that incorporates HRMS and the hard ionization needed to generate fragment ions is advantageous. Gas chromatography-mass spectrometry with a time-of-flight mass analyzer (GC-TOFMS) is ideal for this research, as HRMS data are obtained, which provide the accurate mass information necessary for classification via mass defect filtering. Further, the hard ionization method in GC-TOFMS provides chemical information from fragment ions that is similar to the information obtained with the low-resolution mass spectrometers currently used in forensic laboratories. The utility of HRMS was investigated by analyzing phenethylamine and cathinone reference standards using GC-QMS and GC-TOFMS to determine whether HRMS data would be comparable to mass spectra obtained by low-resolution mass spectrometry. Substantial similarity in mass spectra obtained by these two techniques would indicate the applicability of HRMS in assisting practitioners in the identification of novel synthetic drugs via comparison to mass spectral data generated using low-resolution mass spectrometry.

The second goal was to develop mass defect filters using absolute, Kendrick, and relative mass defects to allow for rapid classification of novel drugs to the phenethylamine or cathinone classes. These filters were developed using sets of phenethylamine and cathinone reference standards. Filter performance was then evaluated using test sets consisting of other phenethylamine and cathinone standards.
The success of this research will result in tools that can then be incorporated into a classification scheme and find application in prioritizing the analysis of different questioned samples in forensic laboratories. By incorporating mass defect filters into the classification scheme, chemical information of unknown compounds can quickly be obtained and a preliminary classification to compound class can be made, which will allow for further resources to be directed towards identification.

REFERENCES

1. FBI. Crime in the U.S. 2013: Estimated Number of Arrests. FBI; 2015 [updated 2015; cited]; Available from: https://www.fbi.gov/about-us/cjis/ucr/crime-in-the-u.s/2013/crime-in-the-u.s.-2013/tables/table-29/table_29_estimated_number_of_arrests_united_states_2013.xls.
2. DEA. 2014 National Drug Threat Assessment Summary; 2015.
3. SWGDRUG. SWGDRUG Recommendations Version 7-0.
4. Uchiyama N, Shimokawa Y, Kawamura M, Kikura-Hanajiri R, Hakamatsuka T. Chemical analysis of a benzofuran derivative, 2-(2-ethylaminopropyl)benzofuran (2-EAPB), eight synthetic cannabinoids, five cathinone derivatives, and five other designer drugs newly detected in illegal products. Forensic Toxicol. 2014;32(2):266-81.
5. - dimethoxy-3,4-dimethyl- - phenethylamine (2C-G) A new designer drug. Drug Testing and Analysis. 2013;5(7):549-59.
6. mass spectrometric methods. TrAC Trends in Analytical Chemistry. 2012;32:15-30.
7. Grabenauer M, Krol WL, Wiley JL, Thomas BF. Analysis of Synthetic Cannabinoids Using High-Resolution Mass Spectrometry and Mass Defect Filtering: Implications for Nontargeted Screening of Designer Drugs. Analytical Chemistry. 2012;84(13):5574-81.
8. Kendrick E. A Mass Scale Based on CH2 = 14.0000 for High Resolution Mass Spectrometry of Organic Compounds. Analytical Chemistry. 1963;35(13):2146-54.
9. Hughey CA, Hendrickson CL, Rodgers RP, Marshall AG, Qian K. Kendrick Mass Defect - Resolution Broadband Mass Spectra. Analytical Chemistry. 2001;73(19):4676-81.
10. Stagliano MC, DeKeyser JG, Omiecinski CJ, Jones AD. Bioassay-directed fractionation for discovery of bioactive neutral lipids guided by relative mass defect filtering and multiplexed collision-induced dissociation. Rapid Communications in Mass Spectrometry. 2010;24(24):3578-84.
11. Ekanayaka EP, Celiz MD, Jones AD. Relative mass defect filtering of mass spectra: a path to discovery of plant specialized metabolites. Plant Physiology. 2015;167(4):1221-32.

Chapter 7 Theory

7.1 Gas Chromatography-Mass Spectrometry (GC-MS)

Gas chromatography-mass spectrometry (GC-MS) enables both separation and identification of compounds in complex mixtures in a single analysis by coupling a gas chromatograph, which performs the separation, to a mass spectrometer, which serves as the detector (1). This technique is widely used in forensic chemistry for the analysis of questioned drug samples, ignitable liquids and fire debris, and other types of evidence where the separation and subsequent identification of compounds in complex mixtures is necessary.

7.1.1 Chromatography

The two most important components of chromatography are the mobile and stationary phases. The physical state of the mobile phase defines the type of chromatography that is used (1). A gaseous mobile phase is indicative of gas chromatography, while liquid chromatography utilizes a liquid mobile phase. Separation of compounds can be performed by introducing the solution of analytes into the mobile phase and allowing the mobile phase to flow across the stationary phase. This is typically done in a column, where the stationary phase is located inside the column and the mobile phase flows through the column at a certain rate.
As the analytes travel through the column, interactions between the analytes and the stationary phase occur. The extent of these interactions differs among analytes, which leads to physical separation of the analytes in the same sample as they travel through the column.

7.1.2 Gas Chromatography

In GC, the mobile phase is the inert carrier gas and the stationary phase is a thin film coated on the inside wall of the capillary column. In order for the analytes to be separated, the sample must first be loaded onto the column. Compounds in solution are first introduced into the gas chromatograph (see Figure 7.1) via a syringe into a hot injection port (e.g. at 250 °C), where they are quickly volatilized (1). A steady flow of the carrier gas, such as helium (e.g. at a rate of 1 mL/min), into the injection port moves the gaseous compounds onto the column.

The analytes spend a certain amount of time in the column depending on the conditions of the mobile and stationary phases as well as the retention mechanism (1). Retention of the compounds in the column in GC is based upon the vapor pressure of the analyte as well as partitioning into the stationary phase. Compounds with higher vapor pressures are less retained in the column than those with lower vapor pressures, and therefore elute in less time. More interaction between the analytes and the stationary phase leads to longer retention in the column, while analytes that do not have strong interactions with the stationary phase are less retained and will elute from the column in a shorter amount of time (1). The retention time of a compound is defined as the time taken to leave the column and reach the detector, measured from when the sample is injected. The retention time and abundance of each analyte can then be plotted in a chromatogram.

Figure 7.1. Schematic of a gas chromatograph attached to a detector.
The separation of compounds by gas chromatography can be performed either at a fixed temperature (isothermal) or by using a temperature gradient (temperature programming) (1). Using the former method, the temperature is constant throughout the analysis. Analytes with high vapor pressures and boiling points below the fixed temperature used in the analysis will elute rapidly from the column, and the likelihood of co-elution with unretained species is high. On the other hand, analytes with high boiling points are more likely to remain in the column, leading to increased analysis times. With temperature programming, a temperature ramp at a constant rate can be used, and this method is considered to be advantageous for analytes that span a range of boiling points. It is also a high-throughput method, as the analytes elute based on both boiling point and partitioning, allowing for reduced analysis times for a wider range of analytes (1).

7.1.3 Separation Efficiency

The efficiency of the separation can be defined by the resolution between the two closest-eluting compounds in the chromatogram as well as the overall peak shape of each compound. Optimized separations are those where the compounds of interest are well separated with narrow, Gaussian-like peaks (Figure 7.2) (1). These conditions can be influenced by diffusion, mass transfer, and equilibration between the two phases. All of these effects contribute to peak broadening and can affect the separation of compounds that elute close to one another.

Figure 7.2. Example chromatogram of a 4-component mixture, displaying ideal separation (i.e. Gaussian peak shapes and baseline-resolved peaks).

A thin film of polymer is typically used as the stationary phase in GC. This film is coated on the inside wall of the column, and compounds can diffuse into and out of the stationary phase depending on their affinity for the polymer (1).
Typical polymers that are used as stationary phases include silicones with various functional groups attached, such as methyl and phenyl groups. The different functional groups possess different polarities that influence the types of interactions that can occur with the analytes and, subsequently, affect retention time. For example, a more polar stationary phase (e.g. polyethylene glycol) will have more interactions with a polar analyte, whereas a more nonpolar stationary phase (e.g. polymethylsiloxane) will have fewer interactions with the same analyte, and the analyte will elute from the column in a shorter amount of time (1). In order to ensure that all of the analytes are able to reach the stationary phase, especially the compounds that are traveling along the center of the column, the diameter of the column must be small; generally, capillary columns have internal diameters in the sub-millimeter range.

Diffusion can also increase band broadening due to analyte molecules partitioning into and out of the stationary phase at slightly different rates. Ideally, all molecules of the same analyte should travel through the column as a narrow band; however, depending on the extent of diffusion for different analytes, the analytes may not travel in a narrow band towards the end of the column (1). This is due to the large diffusion coefficients for analytes in the gas phase; as the analytes travel through the column, longitudinal diffusion occurs, which allows the band of analytes to widen, resulting in broadened peaks. Band broadening due to diffusion can be reduced by using a shorter column so that the extent of longitudinal diffusion is reduced as the analytes travel through the column (1).

Another method to reduce band broadening due to diffusion is to increase the flow rate of the mobile phase.
An increase in flow rate results in reduced retention of analytes in the column, as there is less time for the analytes to diffuse and form wider bands. A caveat to increasing the flow rate, however, is the limited mass transfer that occurs with faster mobile phase flow. Mass transfer of analytes between the two phases occurs naturally due to the need for the analytes in the two-phase system to reach dynamic equilibrium (1). However, if the flow rate is increased, the analytes do not have the time needed to reach equilibrium, and therefore the band of analytes becomes broadened as the molecules move into and out of the two phases. Resistance to mass transfer is dependent on the analyte. The analytes most affected by mass transfer are those with a stronger affinity for the stationary phase; as more of these molecules diffuse into and out of the stationary phase, the analyte travels through the column at a slower rate and its molecules are more likely to be spread over a wider range (1).

Band broadening due to resistance to mass transfer can also be compounded by increased film thickness of the stationary phase. Although thicker films allow a larger sample load, they limit the diffusion rates of the analytes, since the analytes require more time to leave the stationary phase, which can lead to broader peaks (1). Therefore, it may be advantageous to use columns with thin stationary phase films; common film thicknesses are in the sub-micrometer range.

Another point of concern in chromatography is the amount of sample injected into the column (2). The issue of sample overloading was briefly discussed above in conjunction with film thickness. As the separation of analytes is dependent on the partitioning of the analytes into the stationary phase, the thickness of the film is a concern. The film contains stationary phase polymers that are available for interactions with analytes.
If a large volume of sample is loaded onto the column, the analytes may not all be able to partition into the stationary phase film, and as such, some of the analytes may be left to flow through with the mobile phase, resulting in earlier elution from the column (2). This phenomenon manifests itself in chromatograms as fronting, where the peak is broadened at the front, leading to an earlier retention time, and rises gradually to the apex (Figure 7.3). While thicker films do minimize fronting, peak broadening can occur because analytes require more time to diffuse into and out of the thick films. Therefore, it is necessary to reduce the amount of sample loaded onto the column; a split injection is typically used to eliminate fronting (2). This type of injection utilizes a split valve that is located in the injection port. When the syringe injects the sample into the liner in the injection port, a pre-determined portion of the sample is moved to the split valve and eventually flows to waste, while the other portion of the sample is introduced onto the column. By doing so, the likelihood of sample overload is reduced (2). Typical split ratios in these injections range from 25:1 to 100:1; the former indicates that 1 part of the sample is introduced onto the column while the other 25 parts flow to waste, and the latter indicates that 1 part of the sample is loaded onto the column while the other 100 parts flow to waste.

Figure 7.3. Example of a peak showing fronting.

7.1.4 Mass Spectrometry

Once the separated analytes elute from the column, they are introduced into a detector through a heated transfer line, which interfaces the gas chromatograph and the mass spectrometer. In GC-MS, the detector is a mass spectrometer, with which each analyte can be detected and identified via mass information. Since the mass spectrometer can only analyze ions, the analytes from the gas chromatograph must first be ionized.
Ions of each analyte are created in the ion source (Figure 7.4) (3). In the case of electron ionization, which is the ionization method most often used in GC-MS, ions are created once fast-moving electrons have accelerated past the analyte molecules. The electrons are continuously generated in the ion source with 70 eV of energy. Some of the kinetic energy of the electrons is then imparted to the analyte, and an electron is ejected, creating a positively charged radical. However, the molecule can still have excess energy after ejecting an electron, and this manifests as vibrational energy. With these vibrations in the molecule, the weakest bonds are likely to break in order to reduce the instability of the molecule, leading to the formation of various fragment ions that are more stable (3). The ions are then directed to the mass analyzer by a positively charged repeller located at one end of the ion source and a negatively charged focusing lens positioned between the ion source and the mass analyzer.

Figure 7.4. Schematic of an ion source for electron ionization.

The mass analyzer in the mass spectrometer is the region in which the ions are distinguished based upon mass-to-charge information. The accuracy with which the mass-to-charge (m/z) ratio of the ions is measured is dependent on the resolution of the mass analyzer (3). Discussions on mass resolution and the mechanism by which mass information is obtained by the mass analyzer are detailed in Section 7.1.4.1. Once the mass-to-charge information of the ions is known, the ions travel to a detector in order to measure the quantity of each ion at the specific mass-to-charge ratio. Common detectors in mass spectrometers include the electron multiplier and the microchannel plate (MCP). As the name suggests, an electron multiplier detects and quantifies electrons as well as increases, by a fixed number, the electrons that are detected (4).
Through a continuous dynode that is connected to a power source, the ions that come through the mass analyzer collide against the walls of the dynode, and each collision results in the emission of a secondary electron. This electron can then go on to collide against the walls of another region of the dynode, which generates between 1 and 3 electrons in a process called secondary emission (4). This process occurs continuously until the electrons that are generated for each ion are on the order of $2^{6}$. The electrons are passed through a resistor and are then converted into a voltage, and the magnitude of the voltage is indicative of the abundance of each ion.

The microchannel plate utilizes a similar mechanism to the electron multiplier; however, there are many smaller dynodes located inside the channels on a plate, which allow more ions to be detected simultaneously (5). An MCP detector usually contains over a million channels, all of which have diameters in the single-micron range and contain dynodes. Secondary electron emission occurs once ions collide against the walls of the dynode and continues until approximately $10^{6}$ electrons are generated for each ion (6). Similar to the electron multiplier, the current from the electron flow is converted into a voltage, which corresponds to the abundance of an ion. Due to the high gain associated with the MCP detector, the saturation limit for ion detection is $1 \times 10^{4}$ counts (7). Once the abundance of an ion with a certain mass-to-charge ratio surpasses this limit, the MCP enters detector dead time and is not able to accurately count the subsequent ions that reach the detector. This is particularly detrimental in high-resolution mass spectrometers that depend upon time to determine the mass-to-charge ratio of the ions. Further discussion of high-resolution mass spectrometers is detailed in Section 7.1.4.3.
Saturation of the detector manifests itself in the mass spectrum as peaks shifted to the left, or to a lower mass-to-charge ratio (7). When detector dead time occurs, ions that reach the detector during this time are not counted, regardless of their mass-to-charge ratio, and the recovery from this state may not be rapid enough to count ions of a different mass-to-charge ratio, depending on how similar the mass-to-charge ratio of the new ion is to that of the previous ion (7). This leads to lower mass accuracy, which is not desired in high-resolution mass spectrometers.

A technique that is used to overcome detector saturation at $10^{4}$ counts is dynamic range extension (6). Dynamic range is defined as the range of abundances for which the signal count is linear. The lower limit of the dynamic range is the detection limit and the upper limit is the saturation limit. A universal detector generally possesses a wide dynamic range in order to analyze samples in which a large range of abundances is present (8). The electron multiplier has a wide dynamic range that spans approximately 7 orders of magnitude, whereas the dynamic range of the microchannel plate is 4 orders of magnitude. With dynamic range extension, however, the saturation limit can be increased in order to preserve linearity in signal counting at the higher abundances (6). This can be done by replacing saturated data with unsaturated data that are then corrected with a magnification factor (6). When dynamic range extension is selected, the instrument automates the scanning so that the analysis parameters alternate between those of a normal scan and those for dynamic range extension. The replacement of saturated data only occurs for an ion when its abundance is above the saturation limit in the spectrum.
The magnification factor is applied by changing the potentials at various lenses in the region of the mass spectrometer between the ion source and the mass analyzer (6). Adjustments in the potential result in the defocusing of the ion beam that passes through the lens, which leads to decreased ion intensity. The adjusted potential is related to a magnification factor in the lens (6). For example, if the magnification factor for dynamic range extension were set to 40, the lens potential would be adjusted by an amount that corresponds to a 40-fold reduction in ion intensity. The defocused ion beam would then be sent to the mass analyzer and reach the MCP detector, resulting in a signal count. This signal count would then be corrected with the magnification factor. Using this technique, the saturation limit of the MCP detector can be extended from $1 \times 10^{4}$ counts to $4 \times 10^{5}$ counts (6).

7.1.4.1 Resolution in Mass Spectrometry

Resolution in mass spectrometry is characterized by the ability of the mass analyzer to distinguish between ions of similar mass-to-charge ratio (3). The resolution of a mass spectrometer can be determined by using the following equation (Eq. 7.1) in conjunction with two adjacent peaks in a mass spectrum (Figure 7.5):

$$R = \frac{M}{\Delta m} \qquad (7.1)$$

where $M$ is the mass of the first peak and $\Delta m$ is the difference between the masses of adjacent peaks that are resolved (3). Higher resolution values are desirable, as they indicate better discrimination between two adjacent peaks. In low-resolution mass spectrometers, the mass analyzer is typically only able to distinguish ions that differ by 1 atomic mass unit, or 1 Da. Low-resolution mass spectrometers typically have resolution values on the order of $10^{2}$, whereas high-resolution mass spectrometers can have resolution values ranging from $10^{3}$ to $10^{5}$.

Figure 7.5. Example mass spectrum illustrating mass resolution.
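As a small illustration of the resolution definition, the sketch below evaluates R = M / Δm for two cases: a pair of peaks one mass unit apart, and a pair of ions that share nominal mass 175 Da (exact masses 175.0997 Da and 175.1361 Da, the pair used as an example in Section 7.2.1). The function name `resolving_power` is hypothetical and chosen only for this example; it is a minimal sketch of the arithmetic, not a description of how any instrument reports resolution.

```python
def resolving_power(mass_first: float, mass_second: float) -> float:
    """Return R = M / delta_m for two adjacent, just-resolved peaks.

    M is the mass of the first peak and delta_m is the spacing to its neighbor.
    """
    delta_m = abs(mass_second - mass_first)
    if delta_m == 0:
        raise ValueError("the two peaks must differ in mass")
    return mass_first / delta_m


# Unit resolution: separating m/z 175 from m/z 176.
unit_r = resolving_power(175.0, 176.0)

# Separating two ions of the same nominal mass (values taken from Section 7.2.1).
isobar_r = resolving_power(175.0997, 175.1361)

print(f"1 Da apart:          R = {unit_r:.0f}")
print(f"same nominal mass:   R = {isobar_r:.0f}")
```

Under these assumptions, the second case requires a resolution in the thousands, which is why ions of the same nominal mass cannot be separated by an analyzer with unit resolution.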
7.1.4.2 Low-Resolution Mass Spectrometry

An example of a mass analyzer with low, or unit, resolution is the quadrupole. This mass analyzer consists of four conducting rods that run parallel to the ion path and are arranged in the configuration shown in Figure 7.6 (3). The charges on the top and bottom rods are the same, and they are different from those on the right and left rods. The rods are connected to an alternating current (AC) power source that produces radio frequency (RF) voltages as well as to a direct current (DC) power source (3). For example, at one moment in time, the top and bottom rods will be positively charged while the right and left rods will be negatively charged. In the next moment, these charges will change so that the right and left rods will be positively charged while the top and bottom rods will be negatively charged. This configuration and the alternating charges cause the ions to take on a circular path as they travel through the rods.

Furthermore, the quadrupole can selectively filter out ions based on the RF and DC voltages applied. An ion of a specific mass-to-charge ratio will have a stable trajectory and pass through at particular RF and DC voltages, while all other ions will collide with the rods, be neutralized and pumped away, and not be able to pass through (3). Therefore, at a specific moment in time with certain RF and DC voltages, only ions of a certain mass-to-charge ratio will be allowed through, and in the next moment, a change in the RF and DC voltages will allow ions of a different mass-to-charge ratio to have a stable trajectory. By scanning the RF and DC voltages at a fixed rate, all ions in the scan range can pass through the quadrupole at different moments in time, resulting in the separation of ions based upon mass-to-charge (m/z) ratio (3). A scan range can span over one order of magnitude, depending on the upper and lower limits of the RF and DC voltages that can be applied.
Typically, the ions created via electron ionization are singly charged; therefore, the ions that are separated by the quadrupole are resolved based upon their mass.

Figure 7.6. Schematic of a quadrupole with blue and red lines indicating two possible trajectories of ions at the same moment in time. The red line depicts the path of an ion that is neutralized in a collision with one of the rods, while the blue line displays the path of an ion that has a stable trajectory through the quadrupole and travels to the detector.

7.1.4.3 High-Resolution Mass Spectrometry

An example of a high-resolution mass analyzer is the time-of-flight mass analyzer (3). Unlike the quadrupole, the time-of-flight analyzer is able to resolve ions that differ by 1 mDa, with resolution between 4000 and 9000. The accurate mass of ions can be obtained, with mass error of 10 ppm or less. Ions leaving the ion source are focused into a narrow band as they travel through a series of lenses that also decelerate the ions. After this, a pusher to which a voltage can be applied accelerates the ions, since the applied voltage is converted into kinetic energy, and the ions then travel through a field-free region (3). The amount of kinetic energy applied to all ions is the same, and due to the relationships depicted by Eq. 7.2 and 7.3, the speed at which the ions travel in the flight tube is dependent on their mass (Eq. 7.4).

$$E = zeV \qquad (7.2)$$

$$E = \tfrac{1}{2} m v^{2} \qquad (7.3)$$

$$v = \sqrt{\frac{2zeV}{m}} \qquad (7.4)$$

where $E$ is kinetic energy, $z$ is the charge of the ion, $e$ is the charge of an electron, $V$ is the applied voltage, $m$ is the mass of the ion, and $v$ is the velocity of the ion (3). Another important relationship is:

$$v = \frac{d}{t} \qquad (7.5)$$

where $d$ is the distance traveled by the ion, or the flight tube length, and $t$ is the flight time. Substitution of Eq. 7.5 into Eq. 7.4 will yield

$$\frac{m}{z} = \frac{2eVt^{2}}{d^{2}} \qquad (7.6)$$

which relates the m/z ratio of an ion to its flight time ($t$) (3).
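To make the dependence of flight time on m/z concrete, the following sketch evaluates the standard time-of-flight relation m/z = 2eVt²/d² in both directions. The accelerating voltage (5000 V) and flight length (1.0 m) are assumed values chosen only for illustration and are not the parameters of the instrument used in this work; the function names are likewise hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19    # elementary charge, C
U_TO_KG = 1.66053906660e-27   # one atomic mass unit, kg

ACCEL_VOLTAGE = 5000.0        # V, assumed for illustration only
FLIGHT_LENGTH = 1.0           # m, assumed for illustration only


def flight_time(mz, voltage=ACCEL_VOLTAGE, length=FLIGHT_LENGTH):
    """Flight time in seconds for an ion of the given m/z (u per charge)."""
    mass_per_charge = mz * U_TO_KG  # kg per elementary charge
    return length * math.sqrt(mass_per_charge / (2.0 * E_CHARGE * voltage))


def mz_from_flight_time(t, voltage=ACCEL_VOLTAGE, length=FLIGHT_LENGTH):
    """Invert the relation: m/z (u per charge) from a flight time in seconds."""
    return 2.0 * E_CHARGE * voltage * t ** 2 / (length ** 2 * U_TO_KG)


t_175 = flight_time(175.0)
t_700 = flight_time(700.0)

print(f"m/z 175: {t_175 * 1e6:.2f} microseconds")
print(f"m/z 700: {t_700 * 1e6:.2f} microseconds")
print(f"recovered m/z: {mz_from_flight_time(t_175):.4f}")
```

Because flight time scales with the square root of m/z, an ion with four times the m/z takes twice as long to reach the detector under the same assumed conditions.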
Since mass information is conveyed only through the flight time, the time-of-flight mass analyzer theoretically does not have an upper m/z ratio limit; all ions can be detected regardless of size as long as the analysis time is sufficiently long (3). This is vastly different from the quadrupole, which is limited by the high voltages that must be applied in order to allow larger ions through to the detector.

In a conventional time-of-flight mass analyzer, the flight path is linear; the ions travel as a tight band, or packet, in a linear path from the pusher to the detector, and the entire region is field-free. Ideally, all ions of the same m/z ratio should travel at the same rate and have the same flight time. However, there may be differences in the kinetic energy imparted to the ions, which would affect the speed at which ions with the same m/z ratio travel and, subsequently, the flight time (3). Differences in the kinetic energy of the ions can arise due to the positioning of the ions at the pusher (Figure 7.7). Ions of the same m/z ratio located at different points at the pusher can have slightly different kinetic energies: ions positioned closer to the pusher may have more energy imparted to them than ions positioned farther away from the pusher in the packet of ions. The positioning of ions in a packet may also influence the flight times of ions of the same m/z ratio (3). The ions closest to the pusher, as opposed to the ions closest to the entrance of the flight tube, may have a slightly longer flight path. This leads to a longer flight time than for ions that were located closer to the flight tube entrance prior to being accelerated by the pusher.

To minimize flight time differences among ions of the same m/z ratio, the flight path can be altered so that the ions are partially deflected, as shown in Figure 7.7 (9).
Deflection of the ions is achieved through the presence of a reflectron, which is a region where an electric field is induced. The purpose of the reflectron at one end of the flight tube is to correct for differences in flight time for ions of the same m/z ratio. Ions that are traveling at a higher velocity penetrate more deeply into the reflectron than ions traveling more slowly (9). Therefore, when the ions are deflected towards the detector, the ions that penetrate more deeply into the reflectron travel a slightly longer distance than ions that originally traveled more slowly, and the result is that these ions travel at the same rate after leaving the reflectron region and reach the detector at the same time (9).

Another method of reducing flight time differences among ions of the same m/z ratio is to place the flight tube orthogonal to the ion path leading from the source (10). This is also depicted in Figure 7.7, where the ion path from the ion source creates a 90° angle with the flight tube. The orthogonality minimizes the differences in ion path length through the flight tube, thereby reducing positional differences that may result in ions of the same m/z ratio having different flight times (10).

Figure 7.7. Diagram of an orthogonal acceleration-time-of-flight (oa-TOF) mass analyzer in a high-resolution mass spectrometer. The flight tube is indicated by the dashed lines. The blue path indicates the trajectory of a packet of ions from the ion source, through deflection by both the pusher and the reflectron, to the detector.

Once the ions of an analyte are detected and tabulated into a mass spectrum, the accurate masses of the ions can be obtained. A method of quantitatively assessing the mass accuracy of these accurate masses is needed.
High mass accuracy corresponds to low mass error, which is given in parts per million (ppm) using the following equation (3):

$$\text{mass error (ppm)} = \frac{\text{accurate mass} - \text{exact mass}}{\text{exact mass}} \times 10^{6} \qquad (7.7)$$

where the accurate mass is the measured mass of a compound and the exact mass is the theoretical value given the exact masses of the elements that comprise the compound. Thus, the mass accuracy of all ions in high-resolution mass spectra can be evaluated once their chemical composition is known.

7.2 Mass Defect

Mass defect is the difference between the measured mass and the sum of the masses of the components of an element or compound. The mass of an element is defined as the mass of its atom, which includes protons, neutrons, and electrons, using atomic mass units (u). The measured mass of an element is its exact atomic mass, while another method to determine the atomic mass is to sum the masses of the protons, neutrons, and electrons found in the element. The difference in mass exists due to the nuclear binding energy that is associated with each element, which is the amount of energy required to break apart the atom into its separate components: protons, neutrons, and electrons (11). This is illustrated in the following example:

$$8\,e^{-} + 8\,p^{+} + 8\,n^{0} \rightarrow {}^{16}\mathrm{O} \qquad (7.8)$$

where ${}^{16}\mathrm{O}$ represents the most abundant isotope of elemental oxygen, $e^{-}$ is an electron, $p^{+}$ is a proton, and $n^{0}$ is a neutron. The measured mass of oxygen-16 is 15.9949 u, but the mass derived from the sum of its protons, neutrons, and electrons is 16.1320 u. Therefore, the mass defect associated with oxygen-16 is 0.1371 u. The binding energy is related to mass defect through Einstein's equation:

$$E_{b} = \Delta m \, c^{2} \qquad (7.9)$$

where $E_{b}$ is the binding energy, $\Delta m$ is the mass defect, and $c$ is the speed of light. In this equation, the mass defect is represented in kilograms rather than atomic mass units. The conversion factor between the two units is 1 u = $1.6605 \times 10^{-27}$ kg. The atomic mass unit is defined as one-twelfth of the mass of carbon-12 in kilograms, which results in carbon-12 having an exact mass of 12.0000 u (11).

The exact mass measurement can only be performed using a high-resolution mass spectrometer. Exact masses of elements can be summed to give the exact mass of a compound with a certain composition. Thus, exact mass measurements are useful in ascertaining the elemental compositions of molecules with a high degree of certainty. This is particularly valuable when encountering unknown compounds. High-resolution mass spectrometers are able to measure exact masses and resolve those that differ by $10^{-3}$ u or more. Once the elemental composition and subsequent formula of a compound are determined, the identification of an unknown is facilitated.

Although the concept of mass defect originates from binding energies and exact masses of subatomic particles, different types of mass defects have been created using other scales, and these are discussed below.

7.2.1 Absolute Mass Defect

Similar to the definition of mass defect, the absolute mass defect of an element is the difference between the exact mass and the calculated mass from the individual components. However, in this case, the calculated mass is the nominal mass of the element. Nominal mass is commonly used to express the mass of each element or compound, and approximates the mass of a proton and of a neutron to be 1 u (or Da) each, with the mass of an electron assumed to be negligible. Thus, the nominal mass of an element such as oxygen is 16 Da, since it contains 8 protons and 8 neutrons. The exact mass of oxygen-16 is 15.9949 Da. Thus, the absolute mass defect of oxygen-16 is -0.0051 Da. Whereas all mass defect values calculated using the exact masses of protons and neutrons are positive, negative mass defects exist when using the proton and neutron mass approximation in absolute mass defect. In fact, all elements with atomic number greater than or equal to 8 have negative absolute mass defects, while all other elements have positive absolute mass defects with the exception of carbon.
The absolute mass defect of carbon-12 is 0.0000 Da due to its exact mass of 12.0000 Da, as discussed above, which is equivalent to its nominal mass. The absolute mass defects of compounds are calculated as the sums of the absolute mass defects of the elemental components. The largest contributor to absolute mass defect is hydrogen, which has the highest positive absolute mass defect of 0.0078 Da. Therefore, compounds whose mass defects are largely positive are hydrogen-rich. Those molecules with negative mass defects are hydrogen-deficient; in organic molecules, the compounds are thought to be oxygen-rich, since one of the main contributors to negative mass defect is oxygen. These relationships can hint at the elemental composition of molecules, from which chemical information can then be obtained. For example, two compounds with nominal mass 175 Da can have slightly different elemental compositions (Figure 7.8):

Figure 7.8. Structures of (a) 4-APB and (b) 5-APDI, which have elemental formulae C11H13NO and C12H17N, respectively.

The two structures have different exact masses (175.0997 Da and 175.1361 Da, respectively). Their absolute mass defects are 0.0997 Da and 0.1361 Da, respectively. Even if the molecular structures are unknown, chemical information can still be obtained using absolute mass defect. In this case, the more positive mass defect of 0.1361 Da suggests that 5-APDI is more hydrogen-rich than 4-APB, which has an absolute mass defect of 0.0997 Da. The lower mass defect value of 4-APB can also suggest that the compound is more oxygen-rich. Both are found to be true in a comparison of the elemental formulae, where the presence of an oxygen atom in 4-APB leads to a smaller mass defect value. In summary, chemical information can be obtained using absolute mass defect.
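The 4-APB and 5-APDI comparison above can be reproduced from the elemental formulae. The sketch below is illustrative only: the monoisotopic masses are standard rounded values supplied here as assumptions (the text does not list them), and the dictionary and function names are hypothetical.

```python
# Monoisotopic masses (u) of the most abundant isotopes. These are standard
# rounded values assumed for illustration; they are not listed in the text.
MONOISOTOPIC = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16}


def exact_mass(formula):
    """Sum of monoisotopic masses for a formula given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in formula.items())


def nominal_mass(formula):
    """Sum of integer (nominal) masses for the same formula."""
    return sum(NOMINAL[el] * n for el, n in formula.items())


def absolute_mass_defect(formula):
    """Exact mass minus nominal mass, in Da."""
    return exact_mass(formula) - nominal_mass(formula)


apb = {"C": 11, "H": 13, "N": 1, "O": 1}   # 4-APB, C11H13NO
apdi = {"C": 12, "H": 17, "N": 1}          # 5-APDI, C12H17N

defect_apb = absolute_mass_defect(apb)
defect_apdi = absolute_mass_defect(apdi)

for name, formula in (("4-APB", apb), ("5-APDI", apdi)):
    print(name, nominal_mass(formula),
          f"{exact_mass(formula):.4f}", f"{absolute_mass_defect(formula):.4f}")
```

Both compounds share a nominal mass of 175 Da, so the difference in absolute mass defect comes entirely from exchanging one oxygen atom for CH4.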
7.2.2 Kendrick Mass Defect

Kendrick mass defect is the difference between the Kendrick exact mass and the Kendrick nominal mass, and is a specific type of mass defect that is standardized to the methylene (CH2) functional group (12). Kendrick exact mass is calculated in the following manner:

\[ \text{Kendrick exact mass} = \text{exact mass} \times \frac{14.00000}{14.01565} \tag{7.10} \]

where the exact mass is multiplied by the ratio of the nominal mass of methylene to its exact mass. Kendrick nominal mass is obtained by rounding the Kendrick exact mass to the nearest integer, and the difference between the two values is the Kendrick mass defect (KMD). This mass defect scale is particularly useful in identifying homologous series of compounds, such as those compounds that differ by CH2 units (e.g. butane, pentane, hexane, etc.) (13). Since the KMD is normalized to the methylene unit, all compounds that differ only by the number of CH2 units will have the same KMD value. This is suitable for identification of compounds in a series that spans a wide range of masses (13). The Kendrick mass scale can also be adjusted so that the masses of the compounds of interest are normalized to other functional groups, if the methylene group is not as influential in the homologous series.

7.2.3 Relative Mass Defect

Relative mass defect (RMD) is another mass defect scale that normalizes to a compound; however, unlike Kendrick mass defect, all absolute mass defect values are normalized to the exact mass (14). RMD is calculated as:

\[ \text{RMD (ppm)} = \frac{\text{absolute mass defect}}{\text{exact mass}} \times 10^{6} \tag{7.11} \]

The RMD of a compound is scaled by $10^{6}$ in order to obtain an integer value that can be expressed in parts per million (ppm). The range of RMD values can span from $10^{2}$ to $10^{3}$ ppm, and is indicative of hydrogen richness or deficiency (14). This concept is similar to that of absolute mass defect; however, only positive RMD values exist. Those compounds with a large RMD are hydrogen-rich while those with low RMD values are hydrogen-deficient.
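The Kendrick and relative mass defect scales in Equations 7.10 and 7.11 can be sketched as follows. This is a hedged illustration, not the procedure used in this work: the function names are hypothetical, the alkane masses are computed here from an assumed hydrogen mass of 1.007825 u, and the sign convention (Kendrick exact mass minus Kendrick nominal mass) follows the definition given in Section 7.2.2.

```python
CH2_NOMINAL = 14.00000
CH2_EXACT = 14.01565   # 12.00000 + 2 * 1.007825 (assumed hydrogen mass)


def kendrick_mass(exact_mass):
    """Rescale an exact mass so that CH2 weighs exactly 14 (Equation 7.10)."""
    return exact_mass * CH2_NOMINAL / CH2_EXACT


def kendrick_mass_defect(exact_mass):
    """Kendrick exact mass minus Kendrick nominal mass (nearest integer)."""
    km = kendrick_mass(exact_mass)
    return km - round(km)


def relative_mass_defect_ppm(exact_mass):
    """Absolute mass defect divided by exact mass, in ppm (Equation 7.11).

    The nominal mass is taken as the nearest integer, which is only valid
    while the absolute mass defect stays below 0.5 Da.
    """
    return (exact_mass - round(exact_mass)) / exact_mass * 1e6


# Homologous series differing only in CH2 units (masses computed, not measured).
alkanes = {"butane": 58.07825, "pentane": 72.09390, "hexane": 86.10955}
kmd = {name: kendrick_mass_defect(m) for name, m in alkanes.items()}

# Exact masses of 4-APB and 5-APDI as given in Section 7.2.1.
rmd_apb = relative_mass_defect_ppm(175.0997)
rmd_apdi = relative_mass_defect_ppm(175.1361)

print({name: round(value, 4) for name, value in kmd.items()})
print(round(rmd_apb), round(rmd_apdi))
```

The three alkanes share one KMD value even though their masses span almost 30 Da, whereas the RMD values of the two isobaric compounds differ because of their different hydrogen content.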
This mass defect scale is useful in filtering and assigning molecules to various compound classes because compounds in the same class have similar RMD values. Also, the relative mass defect scale is independent of the linear relationship between mass and mass defect that is associated with large molecules (14). Absolute mass defect increases with mass due to the large contribution from the hydrogen atoms. This is undesirable in a filter because large molecules may fall outside of an absolute mass defect filter as a result of having a large mass rather than dissimilar class characteristics. Normalization to exact mass removes this linear relationship; therefore, RMD is advantageous for classification.

REFERENCES

1. Skoog DA, Holler FJ, Crouch SR. Principles of Instrumental Analysis: Thomson Brooks/Cole, 2007.
2. Grob K. Split and Splitless Injection for Quantitative Gas Chromatography: Concepts, Processes, Practical Guidelines, Sources of Error: Wiley, 2008.
3. Watson JT, Sparkman OD. Introduction to Mass Spectrometry: Instrumentation, Applications, and Strategies for Data Interpretation: John Wiley & Sons, 2007.
4. Goodrich GW, Wiley WC. Continuous channel electron multiplier. Review of Scientific Instruments. 1962;33:761-2.
5. Wiza JL. Microchannel plate detectors. Nuclear Instruments and Methods. 1979;162(1):587-601.
6. Waters. Waters Micromass LCT Premier Mass Spectrometer Operator's Guide. 2004.
7. Dynamic Range Extension. Personal communication with MSU Analytical Chemistry faculty member; 2014.
8. Harris DC. Quantitative Chemical Analysis: W. H. Freeman, 2010.
9. Mamyrin B, Karataev V, Shmikk D, Zagulin V. The mass-reflectron, a new non-magnetic time-of-flight mass spectrometer with high resolution. Zh Eksp Teor Fiz. 1973;64:82-9.
10. Dawson J, Guilhaus M. Orthogonal acceleration time of flight mass spectrometer.
Rapid Communications in Mass Spectrometry. 1989;3(5):155-9.
11. Wapstra AH, Gove NB. 1971 Atomic Mass Evaluation. Part I. Atomic Mass Table. 1971.
12. Kendrick E. A Mass Scale Based on CH2 = 14.0000 for High Resolution Mass Spectrometry of Organic Compounds. Analytical Chemistry. 1963;35(13):2146-54.
13. Hughey CA, Hendrickson CL, Rodgers RP, Marshall AG, Qian K. Kendrick Mass Defect Spectrum: A Compact Visual Analysis for Ultrahigh-Resolution Broadband Mass Spectra. Analytical Chemistry. 2001;73(19):4676-81.
14. Stagliano MC, DeKeyser JG, Omiecinski CJ, Jones AD. Bioassay-directed fractionation for discovery of bioactive neutral lipids guided by relative mass defect filtering and multiplexed collision-induced dissociation. Rapid Communications in Mass Spectrometry. 2010;24(24):3578-84.

Chapter 8 Materials and Method

8.1 Sample Preparation

Eight phenethylamine and eight cathinone standards were purchased from Cayman Chemical Co. (Ann Arbor, MI). Five standards for each class were designated as the training sets and the remaining three standards from each class were the test sets (Figures 8.1 and 8.2). Standards in the training sets were used to develop the mass defect filters. Standards in the test sets were used to evaluate the efficacy of the developed filters. One milligram of each standard was dissolved in one milliliter of methanol (Sigma-Aldrich, St. Louis, MO) for analysis.

Figure 8.1. Structures of phenethylamines (a–h) used in this research. Common abbreviations for each compound and designation as training or test set are given in parentheses.

Figure 8.2. Structures of cathinones (a–h) used in this research. Common abbreviations and designation as training or test set are given in parentheses.
8.2 Instrument Parameters

The standard solutions were analyzed by both GC-QMS and GC-TOFMS; the training set standard solutions were analyzed in replicate (n = 5) while the test set standard solutions were analyzed in triplicate. The GC-QMS consisted of an Agilent 7890A gas chromatograph coupled to an Agilent 5975C single quadrupole mass spectrometer with an Agilent G4513A injector (Agilent Tech., Santa Clara, CA). The column was coated with a (5% phenyl)-95% methylpolysiloxane stationary phase film with dimensions of 30 m × 0.25 mm × 0.25 µm (Agilent J&W DB-5ms, Agilent Tech., Santa Clara, CA). The injector temperature was set to 210 °C and a 50:1 split injection was used. The injection volume was 1 µL. The carrier gas was ultra-high purity helium at a nominal 1 mL/min flow rate. The oven temperature program was as follows: 50 °C for 1 min, 15 °C/min to 280 °C with a final hold of 2 min. The transfer line temperature was 280 °C. Electron ionization at 70 eV was used; the source was kept at 180 °C while the mass analyzer was held at 130 °C. The scan range was 35–300 u and the rate was 5.19 scans s^-1.

The GC-TOFMS instrument was a Waters Micromass GCT Premier (Waters, Milford, MA), which consists of an Agilent 6890N gas chromatograph coupled to a Waters GCT mass spectrometer with an Agilent 7683B autosampler. Similar instrument parameters and the same column type and stationary phase as for the GC-QMS were used, with the following exceptions: a splitless injection, where the purge flow was 50 mL/min at 1 min; a 1.3 mL/min flow rate; a scan rate of 5.00 scans s^-1; a low-mass cutoff of 40 Da to reduce transmission of background ions; and dynamic range extension. To ensure high mass accuracy, there was a constant infusion of the calibrant, perfluorotributylamine (PFTBA), during each analysis.
Because of the constant infusion of calibrant, the baseline was at a higher intensity than that without the use of calibrant. Thus, splitless injection on the GC-TOFMS was performed in order to obtain chromatographic peaks that were three orders of magnitude more intense than the baseline.

8.3 Data Processing

Low-resolution mass spectral data were obtained after GC-QMS analysis by taking a single scan at the apex of the peak in the total ion chromatogram and subtracting from it a scan at 17.00 minutes. The scan at 17.00 minutes represented the baseline conditions and contained common background ions such as those at m/z 281, 207, and 73. These originated from the stationary phase, and the scan did not contain ions originating from any of the reference standards. All mass spectra were then exported to Microsoft Excel (Microsoft, Albuquerque, NM) for further processing.

High-resolution mass spectra were generated by taking a single scan within the peak in the total ion chromatogram and subtracting from it a single scan in the baseline region immediately before the peak using MassLynx v.4.1 (Waters, Milford, MA). The single scan in the baseline region before the chromatographic peak represented the current baseline condition and contained background ions at m/z 281, 207, and 73, as well as ions from the calibrant at m/z 218, 131, and 69. The baseline region did not contain ions originating from the reference standards. The mass accuracy of the ions in the background-subtracted mass spectra was assessed using the elemental composition algorithm in MassLynx. The algorithm tabulated the accurate masses of these ions against a list of possible elemental formulae with exact masses and an associated mass accuracy in ppm. The number of possible elemental formulae was restricted by a tolerance of 50 ppm in mass accuracy and by the number of carbon, hydrogen, nitrogen, and oxygen atoms.
Acceptable mass spectra were those where the majority of the ions, especially those characteristic of the reference standards, displayed a mass accuracy of ≤ 20 ppm. These mass spectra were then exported to Microsoft Excel for processing.

Ion threshold values were 1% of the base peak for phenethylamines and 0.5% of the base peak for cathinones. These threshold values were applied to mass spectral data obtained by GC-QMS and GC-TOFMS. The lower threshold value was applied to the mass spectra of cathinones due to the low number of fragment ions present at an abundance ≥ 1% of the base peak. However, an even lower threshold value (e.g. 0.1% of the base peak) was not selected in order to avoid including background ions that were present even after background subtraction. Average mass spectra were generated for each standard by averaging both the mass-to-charge ratios and the relative abundances across all replicates of each standard. Ions not common to all replicates were removed in order to reduce interference from background ions.

8.4 Ion Selection for Mass Defect Filters

For the development of the mass defect filters, only high-resolution mass spectral data were used. The phenethylamine and cathinone mass defect filters were developed using ions from the mass spectra of the respective training set standards. The molecular ion of each standard in the training set was first identified and used to develop filters specific to molecular ions. Every other ion in the mass spectrum was then considered in the development of fragment ion filters. Fragment ion selection for specific mass defect filters and filter development are detailed below.

8.4.1 Absolute Mass Defect Filters

For the phenethylamine training set, the absolute mass defect for the molecular ion in each spectrum was calculated by taking the difference between the accurate mass and the nominal mass of the ion.
This was done for the five replicates of three of the phenethylamine training set standards, as the compounds 5-MAPB and 5-MAPDB did not exhibit molecular ions and thus were not included in the molecular ion filter. The average absolute mass defect value for each of the standards was then determined from the replicate values. The filter centroid was established by averaging the absolute mass defect values of the standards. The tolerance, or filter window, was based on the lowest confidence interval that encompassed the observed absolute mass defect values, including those of the replicates. As no molecular ions were observed in the high-resolution mass spectra of the cathinone standards, the molecular ion filter was only developed for the phenethylamine class.

Fragment ion filters for phenethylamines were developed by selecting the fragment ions that were common to all five standards in the training set. The absolute mass defects of these fragment ions were then determined as described above; the average mass defect for each standard was determined, and the filter centroid was defined as the average mass defect across the five standards. The filter window was defined as described above. Only one fragment ion was found to be common to all five standards in the phenethylamine training set; the fragment ion filter was developed at m/z 77. Fragment ion filters for cathinones were developed similarly, with filters at m/z 56, 77, and 91.

The absolute mass defect values for the molecular ions and the common fragment ions in the average mass spectra of the test sets were calculated as described above. The molecular ion filter was assessed with the phenethylamine test set while the fragment ion filters were evaluated with both the phenethylamine and cathinone test sets based on the number of true positives, false positives, and false negatives.
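The filter evaluation described above amounts to checking whether an ion's absolute mass defect falls within the window around the centroid and tallying the outcomes by known class membership. The sketch below illustrates this under stated assumptions: the centroid and window are the values reported in Table 9.1, whereas the four candidate ions and their class labels are hypothetical values invented purely to exercise the logic, and the function names are not from this work.

```python
def absolute_mass_defect_mda(accurate_mass):
    """Accurate mass minus nominal mass (nearest integer), in mDa."""
    return (accurate_mass - round(accurate_mass)) * 1000.0


def in_filter(defect_mda, centroid_mda, window_mda):
    """True if the mass defect lies within centroid +/- window."""
    return abs(defect_mda - centroid_mda) <= window_mda


def evaluate_filter(ions, centroid_mda, window_mda):
    """Tally outcomes for ions given as (accurate_mass, is_class_member)."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for accurate_mass, is_member in ions:
        passed = in_filter(absolute_mass_defect_mda(accurate_mass),
                           centroid_mda, window_mda)
        if passed:
            counts["TP" if is_member else "FP"] += 1
        else:
            counts["FN" if is_member else "TN"] += 1
    return counts


# Molecular ion filter for phenethylamines from Table 9.1 (mDa).
CENTROID_MDA = 122.2
WINDOW_MDA = 35.8

# Hypothetical candidate ions: (accurate mass in Da, true class membership).
hypothetical_ions = [
    (195.1250, True),    # class member inside the window
    (209.1700, True),    # class member outside the window
    (207.1300, False),   # non-member inside the window
    (180.0200, False),   # non-member outside the window
]

counts = evaluate_filter(hypothetical_ions, CENTROID_MDA, WINDOW_MDA)
print(counts)
```

A narrower window reduces false positives at the cost of more false negatives, which is the trade-off behind choosing the lowest confidence interval that still encompasses the training data.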
8.4.2 Kendrick Mass Defect Filters

For the phenethylamine training set, Kendrick mass defect (KMD) values were first calculated for the ions in the average mass spectrum for each standard. The molecular ion for each standard was identified. Since the KMD values of compounds in a homologous series are theoretically equivalent, each molecular ion filter was established with compounds that were in a homologous series. The filter window was determined as the lowest confidence interval that encompassed the observed KMD values, including those of replicates.

All other KMD values were used in the fragment ion filters for phenethylamines. Since KMD allowed for normalization to the methylene unit, ion selection for a filter was based upon a difference only in the number of CH2 functional groups. This was determined based on proximity of KMD values and the exact mass of each fragment ion in order to prevent other ions not in a homologous series from being used to develop the filter. Each filter centroid was defined as the average of the KMD values for the set of ions selected. The filter window was defined in a similar manner to that for absolute mass defect filters. Fragment ion filters for cathinones were developed using a similar process.

For the test sets, only the KMD values from the ions in the average mass spectra for each standard were used. The molecular ion filter was evaluated using the phenethylamine test set and the fragment ion filters for both classes were assessed using both test sets based on the number of true positives, false positives, and false negatives.

8.4.3 Relative Mass Defect Filters and Profiles

For the phenethylamine training set, relative mass defect (RMD) values were first calculated for all ions in the average mass spectrum for each standard. Molecular ion RMD values were then used to develop the molecular ion filter. The filter centroid and tolerance were defined as discussed above.
Rather than fragment ion filters, a fragment ion profile was generated for the phenethylamines using RMD values of fragment ions. The profile is a plot of the fragment ion RMD values as a function of m/z. Ion selection for the profile was based upon ion abundance for a designated number of ions in a spectrum. The ions in each average mass spectrum were sorted by decreasing ion abundance. Then, the total number of ions for each standard was tabulated. The standard with the fewest ions was 5-MAPB, with a total of 17 ions. Thus, for the phenethylamine training set, the first 17 ions from each standard were selected. Ion selection was performed in this manner in order to avoid biasing the profile towards standards that exhibited a larger number of fragment ions.

An RMD profile was generated for the cathinones using a similar process. The compound that exhibited the fewest ions was 2-methyl MC, with a total of 14 ions. Thus, the first 14 ions from each standard were selected for the RMD profile.

The molecular ion filter was assessed using the phenethylamine test set, again based upon the number of true positives, false positives, and false negatives. RMD values were only calculated using the ions in the average mass spectra of the test sets. The patterns in the RMD profiles for both classes generated with the training sets were compared and contrasted. RMD profiles for the test sets were then generated using the same number of ions as in the profiles for the training sets (i.e. 17 ions from each phenethylamine test set standard and 14 ions from each cathinone test set standard). The pattern in the RMD profile of the phenethylamine test set was compared to the pattern of the RMD profile of the phenethylamine training set; a similar comparison was performed for the cathinone test set and the cathinone training set.
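The ion selection for the RMD profiles described in Section 8.4.3 can be sketched as below. This is an illustration under stated assumptions rather than the processing actually performed (which was done in Microsoft Excel): the two toy spectra, their abundances, and the function names are hypothetical, and the nominal mass is taken as the nearest integer.

```python
def relative_mass_defect_ppm(mz):
    """Absolute mass defect divided by m/z, scaled to ppm."""
    return (mz - round(mz)) / mz * 1e6


def build_rmd_profile(spectra):
    """Select the same number of most abundant ions from every standard.

    spectra maps a standard name to a list of (m/z, relative abundance).
    The number of ions kept equals the ion count of the sparsest spectrum,
    so that no standard dominates the profile.
    """
    n_ions = min(len(ions) for ions in spectra.values())
    profile = {}
    for name, ions in spectra.items():
        top = sorted(ions, key=lambda ion: ion[1], reverse=True)[:n_ions]
        profile[name] = [(mz, relative_mass_defect_ppm(mz)) for mz, _ in top]
    return n_ions, profile


# Hypothetical average spectra: (m/z, relative abundance in % of base peak).
toy_spectra = {
    "standard_A": [(77.0391, 42.0), (44.0500, 100.0), (131.0497, 30.0)],
    "standard_B": [(58.0657, 100.0), (91.0548, 12.0)],
}

n_ions, profile = build_rmd_profile(toy_spectra)
for name, points in profile.items():
    print(name, [(mz, round(rmd)) for mz, rmd in points])
```

Plotting the RMD values against m/z for all standards in a class would then give the profile that is compared between training and test sets.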
Chapter 9 Results and Discussion

9.1 Comparison of GC-QMS and GC-TOFMS Spectra for Phenethylamines

The similarity of mass spectra acquired by GC-QMS and GC-TOFMS was first investigated to determine the potential of adapting a classification scheme developed using high-resolution mass spectral data to low-resolution mass spectrometry. As forensic practitioners perform identifications of synthetic designer drugs using low-resolution mass spectral data, it is advantageous to be able to relate mass spectral data collected by both techniques in the event that no low-resolution mass spectral data for novel compounds are available. Similarity in the mass spectra is indicative of the ability to adapt a classification scheme from GC-TOFMS data to GC-QMS data.

Average mass spectra of all standards were generated and compared. Most of the phenethylamines exhibited molecular ions with both techniques, with the exception of 5-MAPB, 5-MAPDB, and 3,4-MDPA. Exemplar spectra for the phenethylamine class are discussed, with a comparison of spectra and proposed fragmentation pathways. All other spectra of the phenethylamine reference standards not discussed are displayed in Figures B.1–B.5.

In the mass spectrum of 2C-H (Figure 9.1), all of the characteristic ions observed in the mass spectrum acquired by low-resolution mass spectrometry (Figure 9.1a) were observed in the high-resolution mass spectrum (Figure 9.1b). The compound exhibits a molecular ion at m/z 181, a base peak ion at m/z 152, and characteristic ions at m/z 137, 121, 91, and 77. A proposed fragmentation pathway is displayed in Figure 9.2. The base peak ion is the result of a loss of CH2NH from the molecular ion on the amine portion of the molecule, and a subsequent loss of either of the two methoxy functional groups on the benzene ring results in a fragment ion at m/z 121. This ion can further fragment to give the ion at m/z 91, with chemical formula C7H7+.
The ion at m/z 137 is due to a loss of CH2CH2NH2 that is attached to the benzene ring from the molecular ion. From these two fragmentation pathways, subsequent losses of side chains on the benzene ring result in the fragment ion at m/z 77, which has a chemical formula of C6H5+.

With the exception of the base peak, the ions listed above were more abundant in the high-resolution mass spectrum (e.g. 21% relative abundance for m/z 121 in the TOFMS spectrum in comparison to 15% abundance in the QMS spectrum). While the splitless injection performed in the GC-TOFMS analysis partially contributes to this observation, another explanation is the higher sensitivity of the GC-TOFMS. Splitless injections result in higher abundances of all ions since more of the analyte is introduced into the instrument. However, the abundances of all ions in the mass spectra are normalized to the abundance of the base peak, and thus, differences between a splitless and a 50:1 split injection do not affect the relative abundances of the ions greatly.

Figure 9.1. Average mass spectrum of 2C-H acquired via (a) GC-QMS and (b) GC-TOFMS. Characteristic ions are labeled with associated mass accuracy for the high-resolution mass spectrum. The ion at m/z 162.0911 in the TOFMS spectrum corresponds to a fragment ion from a compound that results from interaction between 2C-H and solvent.

The mass accuracy of the ions listed above is also displayed in Figure 9.1b. All characteristic ions exhibit high mass accuracy, with error < 20 ppm. While the same pattern of ions is shown in the QMS and TOFMS spectra, the accurate masses of the ions are known in the TOFMS spectrum, which enables assignment of elemental formulae to the ions, providing more confidence in the identification of compounds.

Figure 9.2. Proposed fragmentation pathway for 2C-H.
Despite differences in abundance, the patterns and the ratios of the ions in the mass spectra are similar between the two techniques for the other reference standards that display a molecular ion (Figures B.1–B.2). The only exception to this is the high-resolution mass spectra for 4-APB and 6-APB. The low-resolution and high-resolution mass spectra for 4-APB are shown in Figure 9.3 with a proposed fragmentation pathway displayed in Figure 9.4. The mass spectra and fragmentation pathway corresponding to 6-APB are shown in Figure B.3.

The same characteristic ions are observed in the low- and high-resolution mass spectra of 4-APB; the molecular ion is at m/z 175, the base peak is present at m/z 44, with other fragment ions at m/z 131, 77, and 132. The ions at m/z 44 and 131 are the result of the alpha-beta cleavage of the carbon-carbon bond located two bonds away from the amine group. The ion at m/z 132 arises from the loss of C2H5N and the ion at m/z 77 corresponds to the benzylic ion with chemical formula C6H5+ that was present in the mass spectrum of 2C-H. In the mass spectrum acquired by GC-TOFMS, the mass accuracy of these ions is high, with error < 10 ppm.

The only difference between the two average mass spectra is the ion abundances relative to the base peak. It is apparent that the abundances of the characteristic ions and the molecular ion are substantially higher in the TOFMS spectrum than in the QMS spectrum (e.g. 42% abundance for m/z 77 in the TOFMS spectrum as compared to 17% in the QMS spectrum). The increase in relative ion abundances for all ions is indicative of signal reduction for the base peak ion. An explanation for this observation is that the use of a low-mass cutoff at 40 Da affected ion transmission at m/z 44.
The low-mass cutoff at this value was applied in order to prevent background ions originating from the ion source, such as water (18 Da), oxygen (32 Da), and nitrogen (28 Da), from being detected in the mass spectra. Due to the proximity of the low-mass cutoff value to the ion at m/z 44, the number of ions at m/z 44 that are transmitted through the mass spectrometer is reduced. As m/z 44 is the base peak ion for this compound, the reduction in the ion transmission affects the relative ion abundances of the other ions in the mass spectrum, but not the base peak abundance, as the ion abundances are normalized to that of the base peak.

Furthermore, the ratios of the characteristic ions remain the same in the two spectra. The ion at m/z 131 is 2.8 times more abundant than the ion at m/z 132 in the QMS spectrum while the ratio of these two ions in the TOFMS spectrum is 2.7:1. The m/z 131/m/z 77 ratio is 1.6:1 in the low-resolution mass spectrum as opposed to a 1.9:1 ratio in the high-resolution mass spectrum. Thus, the pattern of the characteristic ions in the mass spectrum is maintained despite the differences in relative ion abundance.

The fragment ions in the mass spectra of 6-APB (Figure B.3) are the same as those observed for 4-APB and the only difference is the m/z 131/m/z 132 ratio. Whereas the ratio is approximately 3:1 for 4-APB, the low-resolution mass spectrum of 6-APB displays a ratio of 1.3:1, indicating a higher abundance of the ion at m/z 132. This pattern is conserved in the high-resolution mass spectrum of 6-APB, where the ratio is 1.2:1. As with the high-resolution mass spectrum of 4-APB, the TOFMS spectrum of 6-APB shows higher relative ion abundances of the characteristic ions compared to the QMS spectrum. This is also due to the use of the low-mass cutoff at 40 Da, which reduces the transmission of the ion at m/z 44, resulting in higher relative ion abundances for the characteristic ions.
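The effect of base peak attenuation on a normalized spectrum can be demonstrated numerically. In the sketch below, the raw ion counts and the transmission factor of 0.4 are hypothetical values chosen only so that the unattenuated spectrum resembles the QMS relative abundances quoted above (17% for m/z 77 and a 2.8:1 ratio for m/z 131 to m/z 132); they are not measured data, and the function name is illustrative.

```python
def normalize_to_base_peak(counts):
    """Express every ion as a percentage of the most abundant ion."""
    base = max(counts.values())
    return {mz: 100.0 * value / base for mz, value in counts.items()}


# Hypothetical raw ion counts for a spectrum whose base peak is m/z 44.
raw_counts = {44: 1000.0, 131: 280.0, 132: 100.0, 77: 170.0}

# Assumed transmission loss near the low-mass cutoff: only m/z 44 is reduced.
TRANSMISSION_AT_44 = 0.4
attenuated_counts = dict(raw_counts)
attenuated_counts[44] = raw_counts[44] * TRANSMISSION_AT_44

before = normalize_to_base_peak(raw_counts)
after = normalize_to_base_peak(attenuated_counts)

ratio_before = before[131] / before[132]
ratio_after = after[131] / after[132]

print(before)
print(after)
print(ratio_before, ratio_after)
```

Because m/z 44 remains the base peak after attenuation, it is still reported as 100%, while every other ion rises by the same factor and the ratios between those ions are unchanged.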
Similar to the mass spectra of 4-APB, however, the ratios of the characteristic ions are still maintained.

Figure 9.3. Average mass spectrum of 4-APB obtained via (a) GC-QMS and (b) GC-TOFMS. Characteristic ions are labeled with associated mass accuracy in the high-resolution mass spectrum.

Figure 9.4. Proposed fragmentation pathway for 4-APB.

For the reference standards that do not display a molecular ion, the accurate masses of the fragment ions are especially valuable for structural elucidation. Mass spectral data obtained by GC-QMS and GC-TOFMS for 5-MAPB are displayed in Figure 9.5, with a proposed fragmentation pathway shown in Figure 9.6. In a comparison of the two mass spectra, it is apparent that the same characteristic ions are observed. The first fragment ion that is observed for 5-MAPB is at m/z 174, which indicates a loss of the methyl functional group on the alpha-carbon. The ions at m/z 58 and 131 are the results of the alpha-beta cleavage of the carbon-carbon bond that is located two bonds away from the amine group. Subsequent losses from both pathways lead to the formation of the benzylic ion at m/z 77.

Figure 9.5. Average mass spectrum of 5-MAPB obtained via (a) GC-QMS and (b) GC-TOFMS. Characteristic ions are labeled with associated mass accuracy in the high-resolution mass spectrum.

Figure 9.6. Proposed fragmentation pathway for 5-MAPB.

Some differences between the high-resolution and low-resolution mass spectra are observed for 5-MAPDB and 3,4-MDPA. In the low-resolution mass spectrum of 5-MAPDB (Figure B.4), a loss of the methyl functional group on the alpha-carbon yields the ion at m/z 174, which was observed in the fragmentation of 5-MAPB.
However, the first fragment ion observed in the high-resolution mass spectrum is at m/z 134, which corresponds to a loss of -CH2CH2NHCH2, resulting from abstraction of a hydrogen from the methyl functional group and subsequent cleavage of the carbon-carbon bond between the carbon on the benzene ring and the β-carbon (i.e. the carbon atom beta to the amine). Differences in instrument design, and most likely in path length from the source to the detector, between the GC-QMS and GC-TOFMS may lead to differences in the ions observed. While the TOF is typically the more sensitive instrument, the ion at m/z 176 may not have as long a mean free path as the ion at m/z 134, which would lead to the collision and subsequent degradation of the ion prior to reaching the detector.

As for 3,4-MDPA (Figure B.5), the first fragment ion observed is due to the loss of the propylamine functional group, and this is the same in spectra obtained by both techniques. Despite these small differences, which can be explained by differences in the geometry of the instrument design, similar fragment ions in the lower m/z range are still observed and more chemical information is obtained with HRMS.

In the comparison of high-resolution and low-resolution mass spectra from phenethylamine reference standards, it is apparent that the mass spectral data acquired via GC-TOFMS are comparable to the data obtained by GC-QMS in the fragment ions observed and the overall patterns in the mass spectra. This shows promise for adapting a classification scheme developed using HRMS to low-resolution mass spectra. However, the sensitivity of the GC-TOFMS is greater, as is evident from the higher ion abundances. Accurate masses of the ions in the high-resolution mass spectra are also obtained, with error ≤ 20 ppm, to which elemental formulae are assigned with high confidence.
In short, mass spectral data of phenethylamines acquired via GC-TOFMS are useful for the identification of novel synthetic designer drugs.

9.2 Comparison of GC-QMS and GC-TOFMS Spectra for Cathinones

While most phenethylamine reference standards investigated in this research displayed a molecular ion in the mass spectra, molecular ions were not observed in the mass spectra of the cathinone reference standards. This was found to be true in all of the low-resolution and high-resolution mass spectra, with the exception of the mass spectrum of 2-methoxy MC acquired via GC-QMS, where the molecular ion at m/z 193 was observed at an abundance of 0.7% with respect to the base peak. The average mass spectra of this compound obtained by both techniques are displayed in Figure 9.7, with a proposed fragmentation pathway shown in Figure 9.8.

While the molecular ion of 2-methoxy MC is present in the GC-QMS mass spectrum, more fragment ions and their accurate masses are displayed in the mass spectrum acquired by GC-TOFMS, which provides more chemical and structural information that is useful for definitive identification. The ion at m/z 191 is the result of the loss of two hydrogen atoms from the molecule, while the ions at m/z 58 and 135 arise from the alpha-beta cleavage of the carbon-carbon bond located two bonds away from the amine group. The m/z 160 ion is the result of a loss of CH5O, while the loss of both the oxygen in the carbonyl functional group and the methoxy group results in the ion at m/z 146. The ion at m/z 92 is obtained with a loss of C5H11O2, and subsequent losses in the majority of these fragment ions result in the benzylic ion at m/z 77. The relative ion abundances of these ions in the TOFMS spectrum are also higher than those observed in the QMS spectrum, further confirming that GC-TOFMS is a more sensitive technique.

Figure 9.7.
Average mass spectrum of 2-methoxy MC obtained via (a) GC-QMS and (b) GC-TOFMS. Characteristic ions are labeled with associated mass accuracy in the high-resolution mass spectrum.

Figure 9.8. Proposed fragmentation pathway for 2-methoxy MC.

Similar trends to the mass spectra of 2-methoxy MC were also observed for α-PPP (Figure 9.9) and the other cathinone reference standards used in this research (Figures B.6–B.11). Both the QMS and the TOFMS spectra of α-PPP (Figures 9.9a and 9.9b, respectively) display a base peak ion at m/z 98 and characteristic ions at m/z 56, 77, and 105. The main difference between the two spectra is the presence of ions in the higher m/z range in the TOFMS spectrum, namely the ions at m/z 172 and 201, whereas the majority of fragment ions in the low-resolution mass spectrum are those in the lower m/z range (i.e. below m/z 100). The ions observed in the TOFMS spectrum also display high mass accuracy, with error ≤ 5 ppm.

The proposed fragmentation pathway for α-PPP is shown in Figure 9.10. The ions at m/z 98 and 105 correspond to the fragment ions resulting from the α-cleavage of the compound, whereas the ion at m/z 56 corresponds to an ion with chemical composition C3H6N+, and the ion at m/z 77 is the benzylic ion with chemical formula C6H5+. The ion at m/z 172, observed only in the TOFMS spectrum, is due to the losses of the carbonyl oxygen and the methyl group on the α-carbon, and the ion at m/z 201 results from the loss of two hydrogens, likely from the carbons in the pyrrolidine functional group. The presence of more fragment ions in high-resolution mass spectra at greater relative abundances and with high mass accuracy indicates higher instrument sensitivity and is more useful in spectral interpretation for compound identification.

Figure 9.9. Average mass spectrum of α-PPP obtained via (a) GC-QMS and (b) GC-TOFMS. Characteristic ions are labeled with associated mass accuracy in the high-resolution mass spectrum.
Figure 9.10. Proposed fragmentation pathway for α-PPP.

Overall, the high-resolution mass spectra obtained by GC-TOFMS display similar, if not more, chemical information for the cathinone reference standards investigated, and thus demonstrate the utility of HRMS in assisting forensic practitioners in the identification of novel synthetic designer drugs.

9.3 Absolute Mass Defect

The potential of absolute mass defect filters to classify synthetic designer drugs to the phenethylamine and cathinone structural classes was investigated. The molecular ion filter for the phenethylamine class is shown below in Table 9.1. The experimental exact mass for the three standards displayed high mass accuracy, with error ranging between 2 and 5 ppm. The standard deviation in the mass defects of the replicates for each standard was observed to be small, indicating high precision in the accurate mass and, subsequently, the mass defect measurements. The filter window of ± 35.8 mDa was determined at the 82% confidence level so that the filter was as narrow as possible while still encompassing all of the absolute mass defects for all replicates of the three standards.

Table 9.1. Molecular ion filter for phenethylamines using absolute mass defect.

                                  4-APB        2C-P          2C-H
Theoretical Exact Mass (Da)       175.0997     223.1572      181.1103
Experimental Exact Mass (Da)      175.0989     223.1568      181.1109
Experimental Mass Defect (mDa)*   98.9 ± 0.8   156.8 ± 0.7   110.9 ± 0.6
Mass Defect Filter (mDa)          122.2 ± 35.8**

*Average mass defect ± standard deviation (n = 5).
**Confidence interval calculated at 82% confidence level (CL) based on n = 3.

Evaluation of the molecular ion filter is shown in Figure 9.11, where the phenethylamine standards in the training and test sets are plotted. Though the test set contains three reference standards, the compound 3,4-MDPA did not exhibit a molecular ion, and thus was not included in the evaluation of this filter.
Successful classification of phenethylamines in the test set was achieved using the molecular ion filter, in that the absolute mass defects of these standards were within the filter window. Even so, the filter window is large and similar to the ± 50 mDa tolerance that Grabenauer et al. used in their mass defect filter (1). A large filter window increases the possibility that compounds belonging to other structural classes would have mass defect values that fall within this filter. For example, a traditional controlled substance such as cocaine would be falsely classified as a phenethylamine with this filter. This is because the exact mass of cocaine is 303.1471 Da, and its absolute mass defect is 147.1 mDa, which is a value that falls within the molecular ion filter. However, the structure of cocaine does not resemble that of a phenethylamine (Figure B.12) and cocaine is not a member of the phenethylamine class. Thus, filters with increased specificity need to be investigated.

Figure 9.11. Molecular ion filter for the phenethylamine class using absolute mass defect (82% CL, n = 3). Error bars (smaller than the symbols) represent the standard deviation in the mass defect of the replicates (n = 5) for each standard in the training set. The filter was tested with the phenethylamine test set.

Fragment ion filters were developed for both classes using absolute mass defect to increase the specificity of the classification as well as to extract chemical information from compounds that do not exhibit molecular ions. The fragment ions in the filters were common to all the standards in the training set. The ion at m/z 77 was common among all five phenethylamine standards. This fragment ion is the benzylic ion with chemical formula C6H5+. A filter was developed at 39.5 ± 2.2 mDa (99.9% confidence level, n = 5) (Table 9.2). The filter window for fragment ions is substantially smaller than that for a molecular ion filter (2.2 mDa compared to 35.8 mDa).
This is because the filter is specific to this ion and its development is based only on the absolute mass defect value of m/z 77 rather than a range of mass defects, as is the case for the molecular ion filter. The possibility of false positives is reduced with a small filter window; however, the possibility of false negatives increases.

Table 9.2. Fragment ion filter at m/z 77 for phenethylamines using absolute mass defect.

                                  4-APB        5-MAPB       5-MAPDB      2C-P         2C-H
Theoretical Exact Mass (Da)       77.0391 (all standards)
Experimental Exact Mass (Da)      77.0397      77.0390      77.0387      77.0400      77.0400
Experimental Mass Defect (mDa)*   39.7 ± 0.4   39.0 ± 0.7   38.7 ± 0.6   40.0 ± 0.6   40.0 ± 0.3
Mass Defect Filter (mDa)          39.5 ± 2.2**

*Average mass defect ± standard deviation (n = 5).
**Confidence interval calculated at 99.9% confidence level (CL) based on n = 5.

The m/z 77 fragment ion filter was first assessed with the phenethylamine test set (Figure 9.12). Successful classification was achieved, as the three phenethylamine standards in the test set had absolute mass defect values at m/z 77 that were within the filter, and no false positives or negatives were observed. The absolute mass defect values at m/z 77 were the same for two of the standards, 6-APB and 3,4-MDPA, and thus the two corresponding data points are overlaid on the figure.

The m/z 77 fragment ion filter was then assessed using the cathinone test set (Figure 9.12). Only two of the cathinone test set standards, mephedrone and 2-methoxy MC, exhibited an ion at m/z 77. The absolute mass defect values of the fragment ions at m/z 77 for the two cathinone standards were within the filter, indicating that the two standards would be classified as phenethylamines using this filter alone. Clearly, this filter is not discriminatory between the two classes. However, this is expected, as the benzylic ion is observed in the core structures of both phenethylamine and cathinone, and is common to aromatic compounds.
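The window checks described for the molecular ion filter and the m/z 77 fragment ion filter can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in this work: the function names are hypothetical, the nominal mass is assumed to be the nearest integer to the exact mass, and the filter centroids and windows are simply copied from Tables 9.1 and 9.2.

```python
def absolute_mass_defect_mda(exact_mass_da):
    """Exact mass minus nominal mass, in mDa.

    Assumption: the nominal mass is taken as the nearest integer, which
    holds for the small positive mass defects discussed in this section.
    """
    nominal_mass = round(exact_mass_da)
    return (exact_mass_da - nominal_mass) * 1000.0


def within_filter(mass_defect_mda, centroid_mda, half_width_mda):
    """True if a mass defect lies inside centroid +/- half-width (inclusive)."""
    return abs(mass_defect_mda - centroid_mda) <= half_width_mda


# (centroid, half-width) in mDa, copied from Tables 9.1 and 9.2.
MOLECULAR_ION_FILTER = (122.2, 35.8)
MZ77_FILTER = (39.5, 2.2)

# Cocaine (exact mass 303.1471 Da) against the molecular ion filter.
cocaine_defect = absolute_mass_defect_mda(303.1471)
print(round(cocaine_defect, 1), within_filter(cocaine_defect, *MOLECULAR_ION_FILTER))

# Experimental m/z 77 ion of 4-APB (77.0397 Da) against the m/z 77 filter.
fragment_defect = absolute_mass_defect_mda(77.0397)
print(round(fragment_defect, 1), within_filter(fragment_defect, *MZ77_FILTER))
```

Under these assumptions the cocaine molecular ion falls inside the wide molecular ion window, which is the false positive discussed above, while the same value lies far outside the much narrower m/z 77 window.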
Despite the benzylic ion being common to aromatic compounds, the cathinone test set standard pyrovalerone did not exhibit this ion. A possible explanation for the absence of this ion is that the formation of the m/z 77 ion is not favored given the structure of the compound. It is likely that the methyl substituent on the benzene ring, located para to the carbonyl group, leads to the formation of the m/z 91 ion rather than the m/z 77 ion, as the difference between the two ions is a CH2 group. Given the abundance of the m/z 91 ion in the mass spectrum of pyrovalerone (18%), this explanation is reasonable. More detail on the m/z 91 ion is provided below.

Figure 9.12. Fragment ion filter for phenethylamines at m/z 77 using absolute mass defect (99.9% CL for n = 5). Error bars represent the standard deviation in the mass defect of the replicates (n = 5) for each standard in the training set. The filter was tested with the phenethylamine and cathinone test sets.

Fragment ion filters for the cathinone class were developed for ions at m/z 56, 77, and 91. The ion at m/z 56 has chemical formula C3H6N+, while the ion at m/z 91 is the tropylium ion with chemical formula C7H7+. The ion at m/z 77 is the benzylic ion that was common to the phenethylamine standards, discussed previously. As this ion is common to both classes, the m/z 77 fragment ion filter developed for the cathinones is not discussed. However, that is not to say that the m/z 77 filter is not useful; the presence of an ion at m/z 77 that falls within this absolute mass defect filter is indicative of an aromatic compound. This filter may be used as an initial filter to determine whether a novel compound is aromatic; it is merely not discriminatory between the two compound classes of interest.

The cathinone fragment ion filter developed for m/z 56 is centered at 49.8 ± 1.9 mDa (99.99% CL, n = 5) (Figure 9.13).
This fragment ion is from the aliphatic portion of the compound and typically arises from alpha-beta cleavage of the carbon-carbon bond that is located two bonds away from the amine. The structure of this fragment ion is shown in Figure B.13. Successful classification of the cathinone test set was achieved, as the absolute mass defect values of the m/z 56 ions for the three standards are well within the filter window.

The m/z 56 fragment ion filter was then assessed with the m/z 56 ions for the phenethylamine test set. Of the three standards, 2C-D did not exhibit a fragment ion at m/z 56, while the ion was present in 6-APB and 3,4-MDPA. The mass defect values for the m/z 56 ion in 6-APB and 3,4-MDPA were not within the filter window. At first, this appears to indicate that the filter is actually discriminatory, especially in light of the fact that the ion at m/z 56 is not exclusive to C3H6N+. An ion with chemical composition C4H8+ also exists with a nominal mass of 56 Da, but with an exact mass of 56.0626 Da. However, given the structure of the two test set compounds, the identity of the m/z 56 ion is likely the ion with the former chemical composition as opposed to the latter. Furthermore, both ions exhibited poor mass accuracy (43 ppm for 3,4-MDPA and 45 ppm for 6-APB). The poor mass accuracy may be the result of instrument drift; the exact mass was observed to be higher than expected, meaning that the flight time of these ions was longer. As described in Section 7.1.4.3, a multitude of possibilities exist for flight time deviations, including positional and kinetic energy differences. The poor mass accuracy observed in these ions may be attributed to these differences, and thus the m/z 56 filter does not provide discrimination between the phenethylamine and cathinone classes.

Figure 9.13. Fragment ion filter at m/z 56 using absolute mass defect for cathinones (99.99% CL for n = 5).
Error bars represent the standard deviation in the mass defect of the replicates (n = 5) for each standard in the training set. The filter was tested with the phenethylamine and cathinone test sets.

The cathinone fragment ion filter developed for m/z 91 was centered at 55.0 ± 2.4 mDa (99.998% CL for n = 5) (Figure 9.14). This fragment ion is the result of a rearrangement of the methylated benzylic ion to form a 7-member ring (Figure B.13). Successful classification was achieved for the cathinone test set; all absolute mass defect values of the m/z 91 ion for the test set were within 0.3 mDa of the filter centroid, indicating high mass accuracy. The filter was then assessed using the phenethylamine test set. Only two compounds displayed an ion at m/z 91: 6-APB and 2C-D. Similar to the ion filter at m/z 77, the mass defect values for the opposing test set were within the m/z 91 filter, which indicates that the filter is not useful for discrimination between the two classes. As the ion at m/z 91 is found in the core structures corresponding to both phenethylamines and cathinones, the absolute mass defect filter for this ion does not provide useful information towards the classification of a synthetic drug to either class.

Figure 9.14. Fragment ion filter at m/z 91 developed for the cathinones (99.998% CL, n = 5). Error bars represent the standard deviation in the mass defect of the replicates (n = 5) for each standard in the training set. The filter was tested with the phenethylamine and cathinone test sets.

Despite the presence of this ion in the core structures, m/z 91 was not identified as a common fragment ion for the phenethylamine class due to its absence in one of the phenethylamine standards; only 4 of the 5 phenethylamine training set standards displayed this ion.
An explanation for this observation is the difference in substituents on the core structure, where the substituent identity and position dictate the presence and abundance of the m/z 91 ion. For example, 5-MAPB does not exhibit the m/z 91 ion while 5-MAPDB does. In fact, ions at both m/z 77 and 91 are present in 5-MAPDB while only the m/z 77 ion is present in the mass spectrum of 5-MAPB. The difference between the two phenethylamines is the double bond located in the furan portion of the structure: in 5-MAPB the double bond is present, while it is absent in 5-MAPDB. The presence of this double bond results in the absence of the m/z 91 ion. This is likely due to the inability of the compound to form the tropylium ion after cleavage of this bond. Prior to rearrangement, the m/z 91 ion is that of a benzene ring with a methylene functional group. However, because only one hydrogen atom is located on each of the two carbons participating in the double bond, cleavage of this bond will not allow for two hydrogen atoms to be present on the carbon atom to form the methylene group. Thus, the identity of the substituent in this example dictates the formation of characteristic fragment ions.

In summary, a molecular ion filter and a fragment ion filter at m/z 77 were developed for the phenethylamine class, while fragment ion filters at m/z 56, 77, and 91 were developed for the cathinone class, all based on absolute mass defect. Successful classification of the phenethylamine test set was achieved with both of the phenethylamine filters. However, a wide filter window (± 35.8 mDa) was observed for the molecular ion filter, and the filter for m/z 77 is not discriminatory. Of the standards investigated, no molecular ions were observed for cathinones, suggesting that this feature may be a point of discrimination for the two classes.
Despite successful classification of the cathinone test set for the three fragment ion filters, these filters are not discriminatory. It is evident that the use of absolute mass defect filters alone is not feasible for classification to the phenethylamine and cathinone classes. However, these filters can be used at the early stages of a classification scheme to filter out compounds that belong to structural classes other than phenethylamines and cathinones. The wide window for the molecular ion filter is well suited to an initial screening that retains more compounds for further classification, thus reducing the possibility of missing potential phenethylamines. The common fragment ion filters for m/z 77 and m/z 91 can then be applied to determine whether the screened compounds are aromatic, and the fragment ion filter at m/z 56 to determine the presence of the nitrogen-containing aliphatic chain C3H6N+ in the interrogated compounds. Compounds with absolute mass defect values that fall within these filters are more likely to have structures similar to phenethylamines and cathinones.

An alternative to absolute mass defect filters is also necessary, as absolute mass defect has a limitation. A positive correlation exists between m/z and mass defect (Figure 9.15), which is a disadvantage for mass defect filtering. This correlation is due to the large contribution of hydrogen to absolute mass defect (7.8 mDa per hydrogen atom). For synthetic drugs with high molecular masses, the hydrogen content is substantial. Since hydrogen atoms are the most influential contributors to positive mass defect, the larger hydrogen content of the compound is reflected in an increase in absolute mass defect. This positive correlation is most likely to affect molecular ion filters, since compounds with masses outside the investigated range may have mass defect values that do not fall within the filter even though they are known to belong to the same structural class (i.e. false negatives).
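The hydrogen contribution described above can be illustrated with a short Python sketch. The elemental compositions of 2C-H (C10H15NO2) and 2C-P (C13H21NO2) are supplied here as assumptions; they reproduce the theoretical exact masses listed in Table 9.1, which are treated as neutral monoisotopic masses (electron mass neglected). The function names are hypothetical.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def exact_mass(composition):
    """Sum of monoisotopic masses for a dict of element counts."""
    return sum(MONOISOTOPIC_MASS[element] * count
               for element, count in composition.items())


def mass_defect_mda(composition):
    """Absolute mass defect (exact minus nearest-integer mass), in mDa."""
    mass = exact_mass(composition)
    return (mass - round(mass)) * 1000.0


def hydrogen_contribution_mda(composition):
    """Part of the mass defect carried by hydrogen alone, in mDa."""
    per_hydrogen = (MONOISOTOPIC_MASS["H"] - 1.0) * 1000.0  # about 7.8 mDa
    return composition.get("H", 0) * per_hydrogen


# Assumed compositions, chosen to match the exact masses in Table 9.1.
TWO_C_H = {"C": 10, "H": 15, "N": 1, "O": 2}
TWO_C_P = {"C": 13, "H": 21, "N": 1, "O": 2}

for name, composition in (("2C-H", TWO_C_H), ("2C-P", TWO_C_P)):
    print(name,
          round(exact_mass(composition), 4),
          round(mass_defect_mda(composition), 1),
          round(hydrogen_contribution_mda(composition), 1))
```

In this sketch the six additional hydrogen atoms of 2C-P account for essentially the entire increase in mass defect relative to 2C-H, since carbon contributes nothing and the nitrogen and oxygen terms are identical for the two compounds.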
Other types of mass defect filters need to be investigated for their potential to overcome this limitation in absolute mass defect.

Figure 9.15. Absolute mass defect values of synthetic designer drugs plotted as a function of their exact mass.

9.4 Kendrick Mass Defect

Kendrick mass defect (KMD) filters were investigated for their potential to overcome the limitations and non-specificity observed with the absolute mass defect filters. KMD is extremely useful for identifying homologous series, since members of a series have the same KMD. Not all compounds in a class will differ only by methylene (CH2) units. Therefore, the possibility of multiple KMD filters for the same class exists. For example, of the 5 phenethylamines in the training set, two homologous series can be identified. The first homologous series is that of the 2C-phenethylamines, which includes the compounds 2C-P and 2C-H, since their chemical formulae differ by C3H6, or 3 methylene groups. The second homologous series includes the compounds 4-APB and 5-MAPB; despite the different position of the furan functional group, the difference in chemical formulae is CH2. 5-MAPDB, on the other hand, does not fit into either of the two homologous series, since the compound is not a 2C-phenethylamine, nor is the compound purely an aminopropylbenzofuran (APB), since it contains two more hydrogen atoms than 5-MAPB.

Two KMD molecular ion filters for phenethylamines are expected based upon the homologous series present. However, no molecular ion was present in the mass spectrum of 5-MAPB, and therefore the filter was not developed for the homologous series containing 4-APB and 5-MAPB. More reference standards in this series would need to be analyzed in order to develop this KMD molecular ion filter. The molecular ion filter that was developed was for the 2C-phenethylamines at 91.8 ± 1.5 mDa (78% CL, n = 2) (Table 9.3).
Unlike the molecular ion filter developed using absolute mass defect, this filter window is extremely narrow. This is expected since the KMD values that comprise the filter are theoretically equivalent, and the filter centroid is not the average of values that span a wide mass defect range, as was observed for the molecular ion filter using absolute mass defect.

Table 9.3. Molecular ion filter for phenethylamines using Kendrick mass defect.

                          4-APB        2C-P         2C-H
Theoretical KMD (mDa)     95.8         92.0         92.0
Experimental KMD (mDa)*   96.6 ± 0.8   92.4 ± 0.7   91.3 ± 0.6
KMD Filter (mDa)          91.8 ± 1.5**

*Average mass defect ± standard deviation (n = 5).
**Confidence interval calculated at 78% confidence level (CL) based on n = 2.

The efficacy of the 2C-phenethylamine filter was then tested with the phenethylamine test set (Figure 9.16). Only two standards (2C-D and 6-APB) were used in the test set, as 3,4-MDPA did not exhibit a molecular ion. As 6-APB is a positional isomer of 4-APB and is not in the 2C-phenethylamine homologous series, while 2C-D is a 2C-phenethylamine, it is expected that only the KMD of 2C-D will fall within the filter. Successful classification of 2C-D was observed; the KMD value of 6-APB was not within the filter.

Figure 9.16. Molecular ion filter for 2C-phenethylamines using KMD (78% CL for n = 2). Error bars represent the standard deviation in the mass defect of the replicates (n = 5) for each standard in the training set. The filter was tested with the phenethylamine test set.

It is evident that increased specificity can be achieved with KMD molecular ion filters; these filters can be used to further subclassify within the phenethylamine structural class. This is extremely useful in a classification scheme; more structural information about a compound can be obtained with filters that can indicate subclass.
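The theoretical KMD values in Table 9.3 can be reproduced with a minimal Python sketch. It assumes the common CH2-based Kendrick convention, in which the exact mass is rescaled by 14.00000/14.01565 and the KMD is the nominal (nearest-integer) Kendrick mass minus the Kendrick mass; this convention is an assumption of the sketch, and the function names are hypothetical. The input masses are the theoretical exact masses from Table 9.1.

```python
CH2_NOMINAL_MASS = 14.0      # Da, nominal mass of CH2
CH2_EXACT_MASS = 14.01565    # Da, exact mass of CH2 (12C + 2 x 1H)


def kendrick_mass(exact_mass_da):
    """Rescale an exact mass so that CH2 weighs exactly 14 Da."""
    return exact_mass_da * CH2_NOMINAL_MASS / CH2_EXACT_MASS


def kendrick_mass_defect_mda(exact_mass_da):
    """Nominal Kendrick mass minus Kendrick mass, in mDa (assumed convention)."""
    km = kendrick_mass(exact_mass_da)
    return (round(km) - km) * 1000.0


def within_filter(kmd_mda, centroid_mda, half_width_mda):
    """True if a KMD lies inside centroid +/- half-width (inclusive)."""
    return abs(kmd_mda - centroid_mda) <= half_width_mda


# Theoretical exact masses (Da) from Table 9.1.
THEORETICAL_MASS = {"4-APB": 175.0997, "2C-P": 223.1572, "2C-H": 181.1103}
KMD_MOLECULAR_ION_FILTER = (91.8, 1.5)   # mDa, from Table 9.3

for name, mass in THEORETICAL_MASS.items():
    kmd = kendrick_mass_defect_mda(mass)
    print(name, round(kmd, 1), within_filter(kmd, *KMD_MOLECULAR_ION_FILTER))
```

Because 2C-P and 2C-H differ only by CH2 units, the rescaling gives them nearly the same KMD (the small residual difference comes from the four-decimal input masses), whereas 4-APB, and therefore its positional isomer 6-APB, lies several mDa outside the window.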
The narrow window in the KMD molecular ion filter is also advantageous since this reduces the possibility of false positives that are likely to occur with a wide window, as observed in the molecular ion filter developed using absolute mass defect.

Fragment ion filters based on Kendrick mass defect were then investigated for the phenethylamine and cathinone classes. Seven fragment ion filters were developed for phenethylamines (Table 9.4). Representative filters for the class, namely Filters 2, 4, and 6, are discussed. Filters 1 and 3 (Figures B.14 and B.15) are similar to Filter 2, Filter 5 (Figure B.16) is similar to Filter 4, and Filter 7 (Figure B.17) is similar to Filter 6.

Table 9.4. List of the ions included in each fragment ion filter for the phenethylamine class using Kendrick mass defect.

Filter   m/z Range   Chemical Formula Range
1        51–79       C4H3+ to C6H7+
2        63–105      C5H3+ to C8H9+
3        62–104      C5H2+ to C8H8+
4        75–131      C6H3+ to C10H11+
5        87–129      C7H3+ to C10H9+
6        138–194     C8H10O2+ to C12H18O2+
7        137–193     C8H9O2+ to C12H17O2+

Filter 2 ranges from C5H3+ to C8H9+ and includes the ions C6H5+ and C7H7+. The m/z range that the filter spans is 63–105 Da. Four of the five phenethylamine training set standards exhibited the m/z 63 ion; this ion was not observed for 2C-P. As expected, the m/z 77 ion was present in all five phenethylamine standards and, as stated previously, only 5-MAPB did not exhibit the m/z 91 ion. The m/z 105 ion was present in all standards with the exception of 5-MAPB and 2C-H. Based upon the chemical formulae of the ions included, it is apparent that fragment ion Filter 2 targets ions that have aromatic characteristics with a small aliphatic component (i.e. the CH2 group in m/z 91 and the C2H4 group in m/z 105). The filter is centered at 46.5 ± 2.5 mDa (99.9999998% CL, n = 16) (Figure 9.17).
Of the fragment ion filters developed for phenethylamines, this filter included the most ions in development. An extremely high confidence level was applied in order to encompass the KMD values of two of the ions used in filter development: the m/z 91 ion from 2C-H and the m/z 105 ion from 2C-P. The mass accuracy of the two ions was lower than expected, with errors of 23 ppm and 28 ppm, respectively, whereas the error for the other ions in the filter ranged between 1 and 11 ppm.

Figure 9.17. Fragment ion Filter 2 for phenethylamines using KMD (99.9999998% CL, n = 16), with test sets from both classes plotted.

Filter 2 was first evaluated with the phenethylamine test set. From the three test set standards, 10 ions were identified as having KMD values within the filter range. Successful classification was achieved for all ions in the phenethylamine test set that exhibited the fragment ions at m/z 63, 77, 91, and 105. No false positives or negatives were observed for the phenethylamine test set. Both 6-APB and 2C-D exhibited the four fragment ions, but only ions at m/z 77 and 105 were observed in 3,4-MDPA. Despite the absence of the ions at m/z 63 and 91, 3,4-MDPA would still be classified as a phenethylamine using this filter.

Filter 2 was then assessed with the cathinone test set. Nine ions from the three cathinone test set standards were identified as having KMD values within the filter range. 2-Methoxy MC exhibited ions at m/z 63, 77, and 91 that were within the filter window. The ions at m/z 91 and 105 for pyrovalerone were also within the filter window, and all four ions exhibited by mephedrone had KMD values in that range. However, the m/z 105 ion for 2-methoxy MC and the m/z 63 ion for pyrovalerone had KMD values that were outside the filter window, and more specifically, KMD values above the upper limit of the filter window. These are false negatives, as the ions have chemical formulae that correspond to the ions in the filter.
An explanation for this occurrence is the low mass accuracy for these ions; errors of 41 and 36 ppm for the m/z 63 and m/z 105 ions, respectively, are observed. As the filter window expands only to encompass ions with a maximum error of 28 ppm (the m/z 105 ion for 2C-P), it is not surprising that the two ions at m/z 63 and 105 from the cathinone test set are outside the filter.

Despite successful classification of all ions in the phenethylamine test set, this filter is non-discriminatory between the two classes, since the cathinone test set contained ions that fell within this filter. This filter contains ions at m/z 77 and 91, which, as discussed previously, are ions that are common to both classes and indicate that a compound is aromatic.

Fragment ion Filter 4 was developed for the phenethylamine class with ions ranging in chemical formulae from C6H3+ to C10H11+, which spans one of the largest m/z ranges in the phenethylamine fragment ion filters (75–131 Da). The filter includes ions with chemical formulae C7H5+, C8H7+, and C9H9+, with structures that contain 5- or 6-member rings with a small aliphatic component. 4-APB exhibits the ions at m/z 75, 89, and 103, while 5-MAPB displays only ions at m/z 89 and 103. The fragment ions in the higher m/z range (i.e. m/z 103, 117, and 131) are present in 2C-P, whereas only the m/z 89 ion is observed in 2C-H. Finally, the ions at m/z 103 and 117 are present in 5-MAPDB. This filter is centered at 60.0 ± 1.9 mDa (99.995% CL, n = 11) (Figure 9.18). A high confidence level was applied in order to encompass the m/z 117 ion for 5-MAPDB, which displayed an error of 14 ppm. While this value is acceptable for the analysis, the error displayed by the other ions is < 10 ppm, and thus the error in this m/z 117 ion is higher in comparison.

Filter 4 was first evaluated with the phenethylamine test set.
Six ions from the three standards were identified as having KMD values that were within the filter; 6-APB exhibited ions at m/z 75, 89, and 103, 2C-D exhibited ions at m/z 89 and 103, and 3,4-MDPA only exhibited the m/z 103 ion. Successful classification of these ions in the phenethylamine test set was achieved, and no false positives or negatives were observed.

The fragment ion filter was then assessed with the cathinone test set. Five ions from two of the test set standards were identified as having KMD values that were within this filter. The ions at m/z 75, 89, and 103 from 2-methoxy MC and the m/z 89 and 117 ions from mephedrone had KMD values that were within the filter. One false negative was identified in the cathinone test set, and this was the m/z 89 ion in pyrovalerone. Given the chemical formula corresponding to this ion, it is expected to have a KMD value within this filter; however, the associated error is 20 ppm, which is high in comparison to the error displayed for the phenethylamine training set standards (maximum error = 14 ppm for the m/z 117 ion from 5-MAPDB). While the ions in the phenethylamine test set were within the filter, the majority of cathinone ions were also found to be within this filter. This indicates the non-discriminatory nature of this filter for classification to either class.

Figure 9.18. Fragment ion Filter 4 for phenethylamines using KMD (99.995% CL, n = 11), with test sets from both classes plotted.

Thus far, the filters developed for the phenethylamine class only included ions that contain carbon and hydrogen atoms. Filter 6 was developed for ions that contain not only carbon and hydrogen atoms, but also two oxygen atoms. Fragment ion Filter 6 contains ions with chemical formulae that range from C8H10O2+ to C12H18O2+, including C9H12O2+, C10H14O2+, and C11H16O2+.
While the two filters previously discussed were developed for ions in the lower to middle m/z range, this filter spans a higher m/z range, from 138 to 194 Da. The compound 2C-H exhibits ions at m/z 138 and 152, while the ions at m/z 166, 180, and 194 are present in 2C-P. Not surprisingly, only 2C-P and 2C-H exhibit ions in this filter, as these are the only training set standards that have two oxygen atoms. It is apparent that this filter is specific to the 2C-phenethylamines. This fragment ion filter is centered at 86.9 ± 2.6 mDa (98% CL, n = 5) (Figure 9.19).

Figure 9.19. Fragment ion Filter 6 for phenethylamines (98% CL, n = 5) using KMD. Test sets from both classes are plotted.

Filter 6 was first evaluated with the phenethylamine test set. It is expected that only ions that are characteristic of 2C-phenethylamines would have KMD values within this filter. Six ions were identified from two standards in the test set as having KMD values within the filter. The ions at m/z 127 and 133 for 6-APB and the ions at m/z 152, 164, 166, and 196 for 2C-D were within the filter. Successful classification was achieved for 2C-D for the ions at m/z 152 and 166. However, the other four ions (all ions from 6-APB and m/z 164 and 196 for 2C-D) are false positives. This is because the filter was developed with KMD values for m/z 152 and 166, and the chemical formulae corresponding to the m/z 127, 133, 164, and 196 ions are C10H7+, C9H9O+, C10H14NO+, and C11H18NO2+, respectively, which are ions that are not in a homologous series with the ions in Filter 6.

Two possible explanations for the presence of the four false positives are the wide filter window and the reduced mass accuracy in these ions. The theoretical KMD value for this filter is 86.1 mDa, but the filter centroid is at 86.9 mDa.
While this is not a large difference, the filter window of ± 2.6 mDa means that the filter spans KMD values between 84.3 mDa and 89.5 mDa, and this range is large enough for other ions, such as the ion at m/z 127 (KMD of 86.5 mDa), to fall within it. The false positive at m/z 196 arises from the wide filter window as well as the reduced mass accuracy of this ion, since its theoretical KMD value is 85.3 mDa, but an error of 12 ppm shifted its KMD value to 87.5 mDa. The reduced mass accuracy in the ions at m/z 133 and 164 accounts for the remaining false positives, with errors of 27 and 52 ppm, respectively. The theoretical KMD values for these ions are 83.2 and 75.7 mDa, respectively, and these values are not within the range of the filter.

Fragment ion Filter 6 was then assessed with the cathinone test set. Since the filter is specific to 2C-phenethylamines, it is expected that no cathinone ions have KMD values that fall within the filter. However, four ions were identified from the cathinone test set as having KMD values within the filter. These ions are m/z 148 for 2-methoxy MC, m/z 141 and 199 for pyrovalerone, and m/z 162 for mephedrone. The chemical formulae that correspond to these ions are C9H10NO+, C11H9+, C14H17N+, and C10H12NO+, respectively, and these ions are clearly not in a homologous series with the ions in the filter. The wide filter window and the reduced mass accuracy are the primary reasons for these false positives. The KMD values observed for the ions at m/z 141 and m/z 199 are close to the lower limit of the filter. The KMD values for the ions at m/z 148 and m/z 162 are close to the filter centroid (87.5 and 87.4 mDa, respectively); however, the KMD values for the two ions were lowered from the theoretical value of 89.1 mDa as a result of reduced mass accuracy (errors of 11 and 10 ppm, respectively). It is apparent that even an error of 10 ppm is detrimental to KMD, especially for ions at higher m/z values.
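The scaling of a relative mass error with m/z can be made concrete with a brief Python sketch. The helper names are hypothetical, and the nominal m/z values are used only as approximate ion masses for illustration.

```python
def ppm_error(experimental_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (experimental_mz - theoretical_mz) / theoretical_mz * 1e6


def ppm_to_mda(error_ppm, mz):
    """Absolute mass shift (mDa) produced by a relative error (ppm) at a given m/z."""
    return error_ppm * 1e-6 * mz * 1000.0


# The same 10 ppm error is a larger absolute shift for a heavier ion.
for mz in (77, 148, 196):
    print(mz, round(ppm_to_mda(10, mz), 2), "mDa for 10 ppm")

# Approximate shift for the 12 ppm error reported for the m/z 196 ion.
print(round(ppm_to_mda(12, 196), 2), "mDa for 12 ppm at m/z 196")
```

Under these assumptions an error of 10 to 12 ppm above m/z 140 corresponds to a shift of roughly 1.5 to 2.4 mDa, which is of the same order as the ± 2.6 mDa window of Filter 6 and is therefore enough to move an ion into or out of the filter.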
It may seem that a mass accuracy of 10 ppm for larger ions (e.g. > 100 Da) is comparable to that for smaller ions (e.g. < 100 Da). However, because mass accuracy is determined by normalizing the difference between theoretical and experimental mass by the exact mass of the compound, larger ions that exhibit the same mass difference as smaller ions display less error and higher mass accuracy. For example, a mass difference of 1 mDa for ions at m/z 77.0391 and m/z 115.0997 corresponds to errors of 13 ppm and 9 ppm, respectively. Conversely, a given error in ppm corresponds to a larger absolute shift in mass, and hence in KMD, for larger ions. With a smaller filter window and higher mass accuracy, these ions would not fall within the filter, and the compounds in the cathinone test set would not be falsely classified as belonging to the phenethylamine class, and more specifically, to the 2C-phenethylamine class.

Successful classification for two of the ions from 2C-D was achieved with this filter. Hence, 2C-D was correctly classified as belonging to the phenethylamine class, and more specifically to the 2C-phenethylamine class. However, false positives for 2C-D and 6-APB were observed. Furthermore, false positives for the three cathinone test set standards were also observed. Clearly, refinement of the filter window and ions with higher mass accuracy are necessary. In contrast to Filters 2 and 4, the filter is theoretically discriminatory, as the ions in the cathinone test set are not in a homologous series with the ions in Filter 6.

Fragment ion filters for cathinones were then investigated in order to assess their potential to provide discrimination between the two classes (Table 9.5). Only Filters 2, 7, and 9 are discussed. Filters 1 and 3 (Figures B.18 and B.19) are similar to Filter 2, Filters 4–6 (Figures B.20–B.22) contain ions that were used in filters for phenethylamines (Filters 1–3, respectively) and are clearly not discriminatory, and Filter 8 (Figure B.23) is similar to Filter 7.
Chemical information may be obtained from the cathinone fragment ion filters that can provide classification to the cathinone class and discriminate between phenethylamines and cathinones.

Table 9.5. List of the ions included in each fragment ion filter for the cathinone class using Kendrick mass defect.

Filter   m/z Range   Chemical Formula Range
1        58–72       C3H8N+ to C4H10N+
2        42–98       C2H4N+ to C6H12N+
3        54–96       C3H4N+ to C6H10N+
4        51–79       C4H3+ to C6H7+
5        63–105      C5H3+ to C8H9+
6        62–118      C5H2+ to C9H10+
7        132–174     C9H10N+ to C12H16N+
8        130–186     C9H8N+ to C13H16N+
9        105–119     C7H5O+ to C8H7O+

Fragment ion Filter 2 for the cathinone class was developed for ions that ranged in chemical formulae from C2H4N+ to C6H12N+ in the lower m/z range (42–98 Da), and includes the ions C3H6N+ and C4H8N+. The ion at m/z 42 is present only in methcathinone, but the m/z 56 ion is exhibited by all the training set standards. The m/z 70 ion is present in 3-MEC, α-PPP, and 3-methyl PPP, and the m/z 98 ion is observed in α-PPP and 3-methyl PPP. These ions correspond to the aliphatic chain attached to the amine, and the increase in m/z ratio reflects the addition of CH2 groups to the chain. The filter is centered at 12.7 ± 0.6 mDa (99.99% CL, n = 11) (Figure 9.20). A small filter window is observed despite the high confidence level, with mass accuracy of the training set standards at the 10 ppm level.

Filter 2 was first evaluated with the cathinone test set. Six ions from the test set were identified as having KMD values that were within the filter. These ions are the m/z 56 ion for all three standards, and the ions at m/z 70, 98, and 126 for pyrovalerone. Successful classification of these ions was achieved, and it was observed that the homologous series continues past m/z 98 to incorporate the m/z 126 ion for pyrovalerone, which has chemical composition C8H16N+.
However, false negatives were found, as three ions with chemical formulae belonging to this homologous series had KMD values outside the filter. These are m/z 42 for 2-methoxy MC and mephedrone, and m/z 84 for pyrovalerone, with chemical formulae C2H4N+ and C5H10N+, respectively. The primary reason for the false negatives at m/z 42 for 2-methoxy MC and mephedrone is the reduced mass accuracy of these ions, with errors of 69 and 67 ppm, respectively. On the other hand, the false negative at m/z 84 for pyrovalerone is attributed to the narrow filter window (0.6 mDa). Further refinement of the filter window by analyzing more reference standards is needed in order to reduce false negatives.

Figure 9.20. Fragment ion Filter 2 for cathinones (99.99% CL, n = 11) using KMD. The cathinone test set is plotted. No ions from the phenethylamine test set were found to have KMD values within this filter.

Filter 2 was then assessed with the phenethylamine test set. No ions from the test set were found to have KMD values within the filter range, indicating that none of the phenethylamine test set standards are classified as a cathinone using this filter. Also, no false positives or negatives were observed.

Despite the three false negatives that were identified, successful classification of the cathinone test set ions was achieved. Furthermore, Filter 2 is discriminatory, as no ions in the phenethylamine test set were found to have KMD values that were within the filter. The ability of this filter to extend up to the mid-m/z range to incorporate the m/z 126 ion shows not only the specificity of the homologous series but also the utility of KMD filters for larger molecules in the homologous series that may exhibit fragment ions in the higher m/z range.
In such cases, the KMD filters are still able to correctly classify the ions, in contrast to the absolute mass defect filters, which are limited by the specified m/z range.

Fragment ion Filter 7 spans the m/z range from 132–174 Da, and is one of the filters developed with ions at the higher m/z range. The chemical formulae associated with this filter range from C9H10N+ to C12H16N+, and include the ion C11H14N+ at m/z 160. Methcathinone exhibits the m/z 132 ion, while 3-MEC and α-PPP display the ion at m/z 160, and the m/z 174 ion is present in 3-methyl PPP. The structures that correspond to these ions are based on the cathinone core structure without the carbonyl oxygen and the methyl group on the alpha-carbon. Additional CH2 groups, most likely on the amine group, result in the higher m/z ions in the filter. Filter 7 is centered at 66.3 ± 1.4 mDa (95% CL, n = 4) (Figure 9.21).

Fragment ion Filter 7 was first evaluated using the cathinone test set. Five ions from two of the test set standards were identified as having KMD values that were within the filter. The ions at m/z 74, 132, 146, and 147 present in 2-methoxy MC, and at m/z 174 in pyrovalerone, had KMD values within this filter. No ions for mephedrone were observed to have KMD values that fell within this filter. Of the five ions, the three ions at m/z 132 and 146 for 2-methoxy MC and m/z 174 for pyrovalerone were expected to have KMD values in this range, as the chemical formulae for these ions (C9H10N+, C10H12N+, and C12H16N+) are in a homologous series with the ions in Filter 7. However, the remaining two ions at m/z 74 and 147 for 2-methoxy MC are false positives, with chemical formulae C6H2+ and C10H13N+, respectively. Clearly, the two ions are not in a homologous series with the ions in Filter 7.
The false positive at m/z 74 is attributed to the proximity of the filter to other KMD values in conjunction with slightly reduced mass accuracy of the ion. The error associated with this ion is 9 ppm, and while this is an acceptable value, the reduction in mass accuracy led to a shift in KMD from 67.0 mDa to 66.3 mDa. The false positive at m/z 147 is mainly due to reduced mass accuracy, as the error associated with the ion is 46 ppm. Despite the two false positives for 2-methoxy MC, the compound exhibited two ions that were correctly within the filter, indicating that the compound is correctly classified as belonging to the cathinone class with this filter.

Figure 9.21. Fragment ion Filter 7 for cathinones (95% CL, n = 4) using KMD. Test sets from both classes are plotted.

Filter 7 was then evaluated with the phenethylamine test set. Three ions from two test set standards were identified as having KMD values within the filter: the ions at m/z 74 and 102 in 6-APB and the ion at m/z 136 in 2C-D. Using this fragment ion filter, both 6-APB and 2C-D would have been incorrectly classified as cathinones. However, these ions are false positives, as the corresponding formulae (C6H2+, C8H6+, and C9H12O+) are not in a homologous series with the ions in this filter. The reason that the ions at m/z 74 and 102 for 6-APB had KMD values within the filter is the same as that given above for the m/z 74 ion for 2-methoxy MC. Reduced mass accuracy for the ion at m/z 136 for 2C-D is observed (error of 30 ppm), resulting in a false positive.

Successful classification of the majority of the ions in the cathinone test set was achieved; however, two false positives were identified from the test set.
Furthermore, three false positives were identified from the phenethylamine test set, indicating that the filter window needs to be refined and higher mass accuracy for the ions in the higher m/z range is needed. Despite the false positives, the filter is technically discriminatory for the two classes, as all other ions in the phenethylamine test set did not have KMD values in this range, and most phenethylamine fragment ions at the higher m/z range do not contain a nitrogen atom.

Fragment ion Filter 9 was the only filter for cathinones that included ions composed of carbons, hydrogens, and one oxygen atom. The m/z range that the filter spans is from 105–119 Da, with corresponding chemical formulae C7H5O+ to C8H7O+. The structures of these two ions are based on the benzoyl ion, with the m/z 119 ion corresponding to the methylated benzoyl ion. It is important to note that there is another ion with nominal mass of 105 Da; however, that ion corresponds to C8H9+, which is a structure containing a benzene ring and an ethylene group, and was not used in the development of this filter. Using HRMS, these two ions are discriminated, as the benzoyl ion has an exact mass of 105.0340 Da while C8H9+ has an exact mass of 105.0704 Da. Subsequent references to m/z 105 in this discussion are to the ion with exact mass of 105.0340 Da. Methcathinone and α-PPP exhibit the ion at m/z 105, while the substituted compounds 3-MEC, 2-methyl MC, and 3-methyl PPP exhibit the m/z 119 ion. The filter is centered at 83.1 ± 0.5 mDa (98% CL, n = 5) (Figure 9.22). The filter has a narrow window, which increases the possibility of false negatives. While this is undesirable, filter windows are statistically determined based upon the confidence intervals associated with the KMD values for each ion in the filters.
Thus, the confidence intervals are different depending on the range of the KMD values, and the smallest confidence interval that encompasses the range of KMD values is used as the filter window in order to reduce false positives.

Filter 9 was first assessed with the cathinone test set. Only the m/z 119 ion for mephedrone was identified to have a KMD value that was within the filter. Successful classification of this ion was achieved; however, four other ions at m/z 105 for 2-methoxy MC and m/z 119 for 2-methoxy MC and pyrovalerone were expected to have KMD values within this range. These ions are false negatives, as their KMD values are outside the filter window. The primary reason for this occurrence is the narrow filter window, as the mass accuracy of these false negatives ranges from 7–12 ppm, which is acceptable. Furthermore, the ion at m/z 119 for mephedrone that was within the filter has a KMD value that is exactly the lower limit of the filter, indicating that refinement of the filter window is needed in order to reduce the number of false negatives.

Filter 9 was then evaluated using the phenethylamine test set. Only one ion was identified in the test set as having a KMD value within the filter; this was the m/z 167 ion in 2C-D, which corresponds to C10H15O2+ and is clearly not part of the homologous series in this filter. This false positive is attributed to the reduced mass accuracy of the ion (26 ppm), which resulted in a shift in KMD from 79.4 mDa to 83.7 mDa, and was thus within the filter. The m/z 119 ion, also in 2C-D, was a false negative due to the narrow filter window, as its mass accuracy is 14 ppm, which is acceptable. It is likely that the m/z 119 ion present in 2C-D corresponds to a structure that contains a benzene ring and a methoxy group, rather than the methylated benzoyl group, since 2C-D contains two methoxy groups and does not have a carbonyl functional group attached to the benzene ring.
Improved mass accuracy in the higher m/z range is needed in order to reduce the number of false positives, and a wider filter window is needed to minimize false negatives.

Figure 9.22. Fragment ion Filter 9 for cathinones (98% CL, n = 5) using KMD. Test sets from both classes are plotted.

Successful classification of mephedrone to the cathinone class using Filter 9 was achieved; further analysis of a wider range of standards to refine the filter window is needed in order to reduce the number of false negatives. Despite the false positive in the phenethylamine test set, this filter is still potentially useful for discrimination between the two classes, as the ions in the filter are based upon a structure that is found in the cathinone core structure but not in the phenethylamine structure. Despite the false positives from compounds that exhibit ions with structures that are similar to the benzoyl functional group, this filter is still useful in a classification scheme, since compounds with ions that have KMD values in this range are more likely to possess a benzoyl functional group than compounds with ions that have KMD values not within the filter.

In summary, the majority of the fragment ion filters for phenethylamines were developed with ions that contain only carbon and hydrogen. Most of the filters were developed for ions in the lower m/z range; however, two filters were established with ions at the higher m/z range. These two filters provide subclassification specificity within phenethylamines, and are highly discriminatory despite the presence of false positives. As briefly discussed above, the KMD values for ions at higher m/z ratios are more susceptible to reduced mass accuracy as compared to ions in the lower m/z range, despite acceptable error (i.e. below 20 ppm). The other filters for the phenethylamine class, however, do not provide discrimination between phenethylamines and cathinones.
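The sensitivity of KMD to mass error at higher m/z can be illustrated with a short calculation. The following is a minimal sketch rather than the procedure used in this research: it assumes the CH2-based Kendrick scale (exact mass multiplied by 14.00000/14.01565), takes KMD as the nominal (rounded) Kendrick mass minus the Kendrick mass, and ignores the mass of the electron, which appears consistent with the exact masses and KMD values quoted in this chapter. All function names are hypothetical.

```python
import re

# Monoisotopic masses in Da. The electron mass is ignored (an assumption
# that appears consistent with the exact masses quoted in the text).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
CH2_EXACT = MONOISOTOPIC["C"] + 2 * MONOISOTOPIC["H"]  # 14.01565006 Da
KENDRICK_FACTOR = 14.0 / CH2_EXACT


def formula_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C7H5O'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def kmd_mda(mass):
    """Kendrick mass defect in mDa: nominal Kendrick mass minus Kendrick mass."""
    kendrick_mass = mass * KENDRICK_FACTOR
    return (round(kendrick_mass) - kendrick_mass) * 1000.0


def ppm_error(theoretical, observed):
    """Mass error in ppm, normalized to the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6


def kmd_shift_mda(mz, error_ppm):
    """Magnitude of the KMD shift (mDa) caused by a ppm error at a given m/z."""
    return mz * error_ppm * 1e-6 * KENDRICK_FACTOR * 1000.0


if __name__ == "__main__":
    # Ions differing only by CH2 units share a KMD; C7H5O+ and C8H9+ do not.
    for formula in ("C2H4N", "C6H12N", "C7H5O", "C8H9"):
        mass = formula_mass(formula)
        print(formula, round(mass, 4), round(kmd_mda(mass), 1))
    # The same 10 ppm error moves KMD further as m/z increases.
    for mz in (50, 100, 200):
        print(mz, round(kmd_shift_mda(mz, 10), 2))
```

Under these assumptions, the same 10 ppm error shifts KMD by roughly 0.5 mDa at m/z 50 but roughly 2 mDa at m/z 200, which is comparable to the width of several of the filter windows reported here.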
The majority of the fragment ion filters for the cathinone class included ions that contained carbon, hydrogen, and nitrogen. This was observed throughout the m/z range, indicating that more nitrogen-containing ions are found in the mass spectra of cathinones, in contrast to the carbons and hydrogens that comprise the majority of the phenethylamine fragment ions. Because the filters developed for the two classes show distinct differences in chemical and structural information, a preliminary compound class can be predicted using KMD filters depending on the KMD values of the majority of the ions in an unknown compound. The specificity of the KMD filters is also important, as this allows KMD filters to be used in the later stages of a classification scheme rather than at the preliminary stages, as observed with absolute mass defect filters. The large m/z ranges that the KMD filters span are also advantageous to the classification, as this overcomes the challenges associated with absolute mass defect. However, some limitations of KMD do exist, such as the need for high mass accuracy across the entire m/z range, which was found to be crucial in order to reduce false positives, especially in the higher m/z range. Thus, another type of mass defect filter that is not as affected by mass accuracy is needed in the classification scheme.

9.5 Relative Mass Defect

Relative mass defect (RMD) is useful for determining whether a compound is hydrogen- and nitrogen-rich or oxygen-rich by normalizing the absolute mass defect of the compound to its exact mass. High RMD values indicate hydrogen- and nitrogen-richness, while low RMD values are indicative of oxygen-richness. This chemical information is useful for classification.

A molecular ion filter for the phenethylamine class was developed using the phenethylamine training set (Figure 9.23). The filter is centered at 627 ± 82 ppm (82% CL, n = 3).
The filter lies in the mid-range RMD, which is expected as the phenethylamine compounds in the filter all contain a nitrogen atom and between 1 and 2 oxygen atoms. Hydrogen- and nitrogen-richness and oxygen-richness are balanced in these compounds. The filter was assessed using the phenethylamine test set; 3,4-MDPA did not exhibit a molecular ion, and therefore, only 6-APB and 2C-D were used in the test set to evaluate the molecular ion filter. Both compounds have RMD values that lie within the filter, indicating successful classification with this filter. However, because the filter is in the mid-range RMD, the amount of chemical information obtained regarding these compounds is limited.

Figure 9.23. Molecular ion filter for phenethylamines using RMD (82% CL, n = 3), with phenethylamine test set plotted.

Instead of fragment ion filters, profiles of the RMD for the fragment ions of the standards for each class were generated. The RMD values of 17 ions from each of the five phenethylamine training set standards were plotted against m/z values to generate the RMD profile (Figure 9.24). Of the total number of fragment ions plotted, only 12% of the ions exhibit high RMD values, and all are in the lower m/z range. This indicates that the majority of the fragment ions in the phenethylamine compounds are neither hydrogen- nor nitrogen-rich. The remaining fragment ions have RMD values between 400 and 600 ppm throughout the entire m/z range, indicating that the majority of fragment ions are oxygen-rich. These fragment ions are likely to be composed of carbon, hydrogen, and oxygen, with low hydrogen content. The fragment ions from the phenethylamine test set were then plotted onto the profile. The majority of the ions have RMD values within the 400–600 ppm range. The pattern of the fragment ions in the test set is similar to that of the training set standards.

Figure 9.24. Fragment ion profile for phenethylamines using RMD, with fragment ions from the phenethylamine test set plotted. Red box outlines high RMD range in the lower m/z range.

RMD values of 14 ions from each of the five cathinone training set standards were plotted against m/z values to generate the profile (Figure 9.25). The pattern of fragment ions in the cathinone standards shows some similarity to that of the fragment ions in the phenethylamine standards, with differences in the high RMD range for the cathinones. Of the 70 fragment ions included in the profile, 40% of the ions have high RMD values, compared with 12% for the phenethylamines. This may be a point of discrimination between the two classes for the compounds investigated in this research.

Figure 9.25. Fragment ion profile for cathinones using RMD, with fragment ions from cathinone test set plotted. Red box outlines high RMD range in the lower m/z range.

It is apparent that cathinones exhibit more ions that are hydrogen- and nitrogen-rich, especially in the lower m/z range, as compared to phenethylamines. These fragment ions are likely the nitrogen-containing aliphatic portions of the compounds as opposed to the aromatic component. This is in agreement with the KMD fragment ion filters developed for cathinones, as the majority of the ions in the filters contained nitrogen and likely have high RMD values. The cathinone standards do display ions in the 400–600 ppm region across the m/z range; however, the pattern of the ions in this region is not as concentrated as the one observed in the phenethylamine profile. The fragment ions in this RMD range most likely contain only carbon and hydrogen comprising the aromatic component of the compound; however, not many of these fragment ions are observed. The pattern of the cathinone test set resembles that of the cathinone standards. The differences observed in the patterns of both standards and test sets are potentially useful in discriminating between phenethylamines and cathinones.
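The RMD values discussed above follow directly from elemental composition. The sketch below is illustrative only: it assumes RMD is the absolute mass defect (exact mass minus nominal mass) divided by the exact mass, expressed in ppm, and again ignores the electron mass. The function names are hypothetical, and the assignment of C6H5+ to m/z 77 and C3H6N+ to m/z 56 is made here for illustration.

```python
import re

# Monoisotopic masses in Da (electron mass ignored, as an assumption).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def formula_mass(formula):
    """Sum monoisotopic masses for a simple formula such as 'C11H17NO2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def rmd_ppm(mass):
    """Relative mass defect in ppm: (exact mass - nominal mass) / exact mass."""
    return (mass - round(mass)) / mass * 1e6


def in_filter(value, centre, half_width):
    """True if a value lies within centre +/- half_width (inclusive)."""
    return centre - half_width <= value <= centre + half_width


if __name__ == "__main__":
    # 2C-D (C11H17NO2) against the 627 +/- 82 ppm molecular ion filter.
    rmd_2cd = rmd_ppm(formula_mass("C11H17NO2"))
    print("2C-D", round(rmd_2cd), in_filter(rmd_2cd, 627, 82))
    # A nitrogen-containing aliphatic ion, an aromatic hydrocarbon ion,
    # and an oxygen-containing ion, in order of decreasing RMD.
    for formula in ("C3H6N", "C6H5", "C7H5O"):
        print(formula, round(rmd_ppm(formula_mass(formula))))
```

With these assumptions, the nitrogen-containing aliphatic ion has a much higher RMD than the aromatic hydrocarbon ion, which in turn lies above the oxygen-containing benzoyl ion, in line with the trends described for the two profiles.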
The RMD molecular ion filter does not provide discriminatory information since it is in the mid-range of RMD; however, the RMD profiles generated from the fragment ions for both classes display some differences that are attributed to chemical differences between the two classes. These differences are potentially useful for classification to the phenethylamine and cathinone classes.

9.6 Classification Scheme

The three types of mass defects can be incorporated into a classification scheme. Figure 9.26 illustrates a proposed scheme for classifying novel synthetic drugs to the cathinone or phenethylamine class. Absolute mass defect is most suited for a preliminary classification in order to determine whether the compound of interest is a phenethylamine- or cathinone-like compound. Because the molecular ion and fragment ion filters probe the aromaticity and aliphaticity of compounds, a synthetic designer drug exhibiting both components is likely to be a cathinone or phenethylamine. The second step in the classification scheme is to utilize the RMD molecular ion filter and RMD profiles. Once absolute mass defect filters have indicated whether the unknown compound is likely to have both aromatic and aliphatic components, the RMD filter and profiles will then indicate whether the compound has more characteristics belonging to cathinones or phenethylamines. Finally, KMD filters are able to provide more specific information regarding the structure of the compound, including the subclass. The outputs of all the mass defect filters and profiles will then be combined to give an overall output class of either cathinone or phenethylamine. This research demonstrates that different types of mass defect filters probe different aspects of a compound, and that the combination of the different mass defect filters provides more of the chemical and structural information necessary for classification.

Figure 9.26. Diagram of a proposed classification scheme using the three types of mass defects for classification of novel synthetic designer drugs to the cathinone or phenethylamine class.

APPENDIX

Figure B.1. Mass spectrum of 2C-P obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.2. Mass spectrum of 2C-D obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.3. Mass spectrum of 6-APB obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.4. Mass spectrum of 5-MAPDB obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.5. Mass spectrum of 3,4-MDPA obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.6. Mass spectrum of 3-methyl PPP obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.7. Mass spectrum of methcathinone obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.8. Mass spectrum of 2-methyl MC obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.9. Mass spectrum of 3-MEC obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.10. Mass spectrum of pyrovalerone obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.11. Mass spectrum of mephedrone obtained by (a) GC-QMS and (b) GC-TOFMS with (c) proposed fragmentation pathway.

Figure B.12. Structure of cocaine.

Figure B.13. Structures of the (a) m/z 56, (b) m/z 77, and (c) m/z 91 ions common to cathinone training set standards.

Figure B.14. Fragment ion Filter 1 developed for the phenethylamine class using KMD. The filter is centered at 32.6 ± 4.2 mDa (99.999% CL, n = 13).

Figure B.15. Fragment ion Filter 3 developed for the phenethylamine class using KMD. The filter is centered at 53.2 ± 1.5 mDa (99.9% CL, n = 8).

Figure B.16. Fragment ion Filter 5 developed for the phenethylamine class using KMD. The filter is centered at 73.8 ± 1.5 mDa (99.5% CL, n = 8).

Figure B.17. Fragment ion Filter 7 developed for the phenethylamine class using KMD. The filter is centered at 91.1 ± 2.7 mDa (99% CL, n = 6).

Figure B.18. Fragment ion Filter 1 developed for the cathinone class using KMD. The filter is centered at -0.6 ± 0.2 mDa (80% CL, n = 3).

Figure B.19. Fragment ion Filter 3 developed for the cathinone class using KMD. The filter is centered at 26.0 ± 0.3 mDa (99.8% CL, n = 6). No phenethylamine ions were observed.

Figure B.20. Fragment ion Filter 4 developed for the cathinone class using KMD. The filter is centered at 33.5 ± 0.7 mDa (99.9% CL, n = 8).

Figure B.21. Fragment ion Filter 5 developed for the cathinone class using KMD. The filter is centered at 46.7 ± 1.0 mDa (99.99% CL, n = 15).

Figure B.22. Fragment ion Filter 6 developed for the cathinone class using KMD. The filter is centered at 54.0 ± 1.9 mDa (99.95% CL, n = 8).

Figure B.23. Fragment ion Filter 8 developed for the cathinone class using KMD. The filter is centered at 79.0 ± 0.5 mDa (90% CL, n = 4).

REFERENCES

1. Grabenauer M, Krol WL, Wiley JL, Thomas BF. Analysis of Synthetic Cannabinoids Using High-Resolution Mass Spectrometry and Mass Defect Filtering: Implications for Nontargeted Screening of Designer Drugs. Analytical Chemistry. 2012;84(13):5574-81.
Chapter 10 Conclusions and Future Work

10.1 Conclusions

This proof-of-concept research aimed to develop tools to assist forensic practitioners in the identification of synthetic designer drugs, particularly by reducing the time-consuming nature of structural elucidation and allowing analysts to prioritize novel synthetic drugs for identification. High-resolution mass spectrometry (HRMS) as an alternative to GC-QMS was first investigated to assess the feasibility of using mass spectra acquired via GC-TOFMS as references to which low-resolution mass spectra can be compared. It was observed that high-resolution mass spectral data provide similar, if not more, chemical information than low-resolution mass spectra, which is ideal for identification of synthetic drugs. In the event that mass spectra of reference standards acquired via GC-QMS are not available to forensic analysts, it is advantageous to compare mass spectra of submitted samples obtained by GC-QMS to those of reference standards obtained by GC-TOFMS that are available to practitioners.

The second goal was to develop mass defect filters from mass spectra obtained via GC-TOFMS to allow discrimination between the synthetic phenethylamine and synthetic cathinone classes. Three types of mass defect filters were investigated: absolute, Kendrick, and relative mass defect. Each type of mass defect filter probed a different aspect of a compound, and the types differed in their specificity of classification. Absolute mass defect is non-specific and is better suited for preliminary screening of compounds to distinguish synthetic phenethylamine- and cathinone-like compounds from other compounds. On the other hand, Kendrick mass defect is highly specific and is able to subclassify within a structural class, indicating that these filters are ideal in the later stages of a classification scheme.
Relative mass defect filters and profiles displayed higher specificity than absolute mass defect, but classification was not as specific as with Kendrick mass defect, indicating that relative mass defect should be incorporated between absolute and Kendrick mass defect filters in a classification scheme. By combining all three types of filters, the specificity of the classification is increased. Using mass defect filters, classification of synthetic designer drugs is rapid and simple, and allows forensic analysts to prioritize the analysis of novel synthetic drugs, so that other resources are directed towards identification.

10.2 Future Work

Further investigation of absolute mass defect filters is necessary to obtain useful information for classification to the phenethylamine and cathinone classes; this preliminary study focused on fragment ions that were common to all five standards in each class. However, the specificity of the classification to either class can be increased with the use of subsequent mass defect filters for ions that may only be present in certain groups of compounds within the class. Additionally, a more in-depth study on the ideal characteristics for mass defect filters would be particularly useful in order to develop filters that provide the level of specificity and accuracy needed for classification. A wider range of compounds also needs to be investigated to ensure that the mass defect filters developed are representative of the structural classes.

Furthermore, a classification scheme developed using the mass defect filters would benefit from a confidence-based classification. This type of classification would entail the assignment of a confidence level to the output class from the screening based upon how well the chemical information from an unknown compound is captured by the filters.
Even though the filter windows are developed at different confidence levels, the specificity of each filter itself is variable, and by weighting each filter based on its specificity for classification, an overall confidence value can be assigned to the resulting classification as a measure of how well the novel compound fits into a particular class.
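One way such a weighting might be expressed is sketched below. This is a hypothetical illustration of the idea only, not a scheme developed or validated in this research: the filter names, the weights, and the rule of dividing the winning class's summed weight by the total weight of all filters that returned a class are all assumptions.

```python
def weighted_class_confidence(filter_outputs, weights):
    """Combine per-filter class calls into one class and a confidence value.

    filter_outputs maps a filter name to the class it indicates, or to None
    if no ion fell within that filter. weights maps a filter name to a
    specificity weight (hypothetical values, larger means more specific).
    """
    scores = {}
    for name, predicted_class in filter_outputs.items():
        if predicted_class is None:
            continue  # a filter with no hits contributes no evidence
        scores[predicted_class] = scores.get(predicted_class, 0.0) + weights[name]
    total = sum(scores.values())
    if total == 0:
        return None, 0.0
    best = max(scores, key=scores.get)
    return best, scores[best] / total


if __name__ == "__main__":
    # Hypothetical outputs for an unknown: three filters indicate cathinone,
    # one low-specificity filter indicates phenethylamine.
    outputs = {
        "absolute": "cathinone",
        "rmd_profile": "cathinone",
        "kmd_cathinone_2": "cathinone",
        "kmd_phenethylamine_2": "phenethylamine",
    }
    weights = {
        "absolute": 1.0,
        "rmd_profile": 2.0,
        "kmd_cathinone_2": 4.0,
        "kmd_phenethylamine_2": 1.0,
    }
    print(weighted_class_confidence(outputs, weights))
```

In practice the weights would need to be derived from measured false positive and false negative rates for each filter across a wider range of reference standards.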