DEVELOPMENT OF CLASS REFERENCE STANDARDS FOR MULTIVARIATE STATISTICAL ANALYSIS OF FIRE DEBRIS

By

Jordyn Geiger

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

Forensic Science - Master of Science

2014

ABSTRACT

Research in statistical analysis of fire debris evidence has grown as a result of the 2009 National Academy of Sciences report. Multivariate statistical procedures such as principal components analysis (PCA) have previously been investigated as a means of associating simulated fire debris samples to the corresponding ignitable liquid standards. This research investigated the development of class reference standards intended to standardize a multivariate statistical approach for the analysis of fire debris. Three standards representative of ASTM chemical classes were developed in this research to investigate the utility of an alternative standard data set. Standards were developed based on major characteristic compounds of each class. Commercially available ignitable liquid standards, developed class reference standards, and simulated fire debris samples were analyzed by gas chromatography-mass spectrometry. The utility of the developed reference standards was investigated using PCA, hierarchical cluster analysis (HCA), and k-nearest neighbors (k-NN) as a means of generating a more standardized approach. Commercially available standards were also used to investigate the impact of data set selection on successful association and classification of simulated fire debris samples to the corresponding standard, and to address current limitations of commercial ignitable liquid standards. Association and classification using class reference standards was successful and showed some potential as an alternative to commercially available ignitable liquid standards.
ACKNOWLEDGEMENTS

I would first like to thank everyone for all of the support I have received over the past two years. Most importantly, I would like to thank Dr. Ruth Smith for all of her guidance and encouragement throughout my time at Michigan State and for making this experience challenging and fulfilling. Thank you to Dr. Chris Melde for dedicating time to sit on my committee and to Dr. Victoria McGuffin for dedicating her time and expertise throughout my time at Michigan State. I would like to thank everyone in the MSU Forensic Science Program for their support, assistance, and encouragement, including John McIlroy, Christy Hay, Barb Fallon, Fanny Chu, and Becca Brehe. I would particularly like to thank my fellow second-years Mac Hopkins, Ashley Mottar, and Ashley Doran for being great friends and making this experience much more enjoyable. I would also like to thank my family and friends, who have been my greatest support system throughout this entire experience, especially my parents and grandparents, who have believed in me from the start. Finally, I would like to thank Kari and Jake for supporting me and putting up with me each and every day throughout all the stress and craziness of this adventure!

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1. Introduction
  1.1 Background
  1.2 Ignitable Liquid Classification
  1.3 Current Methods in Fire Debris Analysis
  1.4 Limitations in Current Methods of Fire Debris Analysis
  1.5 Statistical Analysis of Fire Debris
  1.6 Literature Review
    1.6.1 Limitations in Current Methods of Fire Debris Analysis
    1.6.2 Statistical Analysis of Fire Debris
  1.7 Research Objectives
REFERENCES
2. Theory
  2.1. Gas Chromatography-Mass Spectrometry
  2.2. Data Pretreatment
  2.3. Data Analysis
    2.3.1. Principal Component Analysis
    2.3.2. Euclidean Distance
    2.3.3. Hierarchical Cluster Analysis
    2.3.4. k-Nearest Neighbors
REFERENCES
3. Materials and Methods
  3.1. Commercial Ignitable Liquid Standards
  3.2. Class Reference Standards
  3.3. Preparation of Simulated Fire Debris
    3.3.1. Burn Study
    3.3.2. Spike Volume Study
    3.3.3. Simulated Fire Debris Samples
  3.4. Passive-Headspace Extraction
  3.5. GC-MS Analysis
  3.6. Data Pretreatment
  3.7. Data Analysis
    3.7.1. Principal Components Analysis
    3.7.2. Euclidean Distance
    3.7.3. Hierarchical Cluster Analysis
    3.7.4. k-Nearest Neighbors
4. Investigation of Class Reference Standards for Association of Fire Debris to ASTM Class using Principal Components Analysis
  4.1. Introduction
  4.2. Commercially Available Standards and Corresponding Class Reference Standards
    4.2.1. Gasoline
    4.2.2. Medium Petroleum Distillate
    4.2.3. Heavy Petroleum Distillate
  4.3. Determination of Substrate Burn Times
  4.4. Simulated Fire Debris Samples
  4.5. Association and Discrimination of Simulated Fire Debris using PCA
    4.5.1. Commercial Ignitable Liquid Standards – Chemically Diverse Data Set
    4.5.2. Commercial Ignitable Liquid Standards – Refined Data Set
    4.5.3. Class Reference Standards
  4.6. Summary
5. Investigation of Class Reference Standards for Association and Classification of Fire Debris to ASTM Class using Hierarchical Cluster Analysis and k-Nearest Neighbors
  5.1. Introduction
  5.2. Association of Simulated Fire Debris using HCA
    5.2.1. Commercial Ignitable Liquid Standards – Chemically Diverse Data Set
    5.2.2. Commercial Ignitable Liquid Standards – Refined Data Set
    5.2.3. Class Reference Standards
  5.3. Association of Simulated Fire Debris using k-NN
    5.3.1. Commercial Ignitable Liquid Standards – Chemically Diverse Data Set
    5.3.2. Commercial Ignitable Liquid Standards – Refined Data Set
    5.3.3. Class Reference Standards
  5.4. Summary
REFERENCES
6. Conclusions
  6.1. Summary
    6.1.1. Objectives and Goals
    6.1.2. Association of Fire Debris using PCA
    6.1.3. Association of Fire Debris using HCA
    6.1.4. Association of Fire Debris using k-NN
  6.2. Future Work
REFERENCES

LIST OF TABLES

Table 1.1: ASTM Classification of Ignitable liquids
Table 3.1: Composition of the gasoline class reference standard prepared in 15 mL of dichloromethane
Table 3.2: Composition of the medium and heavy petroleum distillate class reference standards prepared in 15 mL of dichloromethane
Table 3.3: Spike volumes used for simulated fire debris samples with respect to each commercial ignitable liquid and substrate
Table 4.1: Euclidean distances between fire debris scores and ignitable liquid standard scores for the chemically diverse data set
Table 4.2: Euclidean distances between fire debris scores and ignitable liquid standard scores for the refined data set
Table 4.3: Euclidean distances between fire debris scores and ignitable liquid standard scores for the refined data set containing diesel and excluding diesel
Table 4.4: Euclidean distances between fire debris scores and class reference scores for the class reference data set
Table 5.1: Percent classification of simulated fire debris containing carpet spiked with diesel to the corresponding commercial diesel standard using 1, 3, 5, 7, and 9 nearest neighbors

LIST OF FIGURES

Figure 1.1: Representative total ion chromatograms of A) commercial paint thinner and B) commercial upholstery protector to indicate differences within a given ASTM class
Figure 2.1: Diagram of a gas chromatograph
Figure 2.2: Diagram of a quadrupole mass analyzer
Figure 2.3: Diagram depicting Euclidean distance and the single-linkage process used during HCA clustering
Figure 2.4: Diagram depicting k-NN classification based on the number of nearest neighbors selected
Figure 4.1: Representative total ion chromatograms of A) commercial gasoline standard and B) gasoline class reference standard with characteristic compounds identified
Figure 4.2: Representative total ion chromatograms of A) commercial torch fuel standard and B) medium petroleum distillate class reference standard with characteristic compounds identified
Figure 4.3: Representative total ion chromatograms of A) commercial diesel standard and B) heavy petroleum distillate class reference standard with characteristic compounds identified
Figure 4.4: Representative total ion chromatograms of A) 30-second burn time of the treated red oak flooring substrate and B) 60-second burn time of the treated red oak flooring substrate with characteristic compounds identified. Compounds from the wood treatment are indicated in red.
Figure 4.5: Representative total ion chromatogram of 120-second burn time of nylon carpet with carpet padding with characteristic compounds indicated
Figure 4.6: Representative total ion chromatograms for A) treated red oak flooring spiked with 75 µL of commercial diesel with substrate interferences indicated in red and B) commercial diesel ignitable liquid standard. *C12 originates from both wood treatment and diesel.
Figure 4.7: Representative total ion chromatograms of A) nylon carpet with carpet padding spiked with 175 µL of commercial diesel with substrate interferences indicated in red and B) commercial diesel ignitable liquid standard
Figure 4.8: PCA scores plot of PC1 (32.5%) versus PC2 (24.4%) for the chemically diverse data set with commercial standards only
Figure 4.9: Loadings plots for the chemically diverse data set with A) PC1 representing 32.5% of the variance and B) PC2 representing 24.4% of the variance
Figure 4.10: PCA scores plot of PC1 (32.5%) versus PC2 (24.4%) for the chemically diverse data set with commercial standards and simulated fire debris projected
Figure 4.11: PCA scores plot of PC1 (60.2%) versus PC2 (24.4%) for the refined data set with commercial standards and simulated fire debris projected
Figure 4.12: PCA scores plot of PC1 (63.5%) versus PC2 (25.1%) for the refined data set with commercial standards (excluding diesel) and simulated fire debris projected
Figure 4.13: PCA scores plot of PC1 (72.8%) versus PC2 (27.1%) for the class reference standards data set with the projected scores of the simulated fire debris
Figure 5.1: Dendrogram of the chemically diverse data set with similarity levels indicated where appropriate
Figure 5.2: Representative total ion chromatograms of A) commercial gasoline A standard, B) commercial gasoline B standard, and C) commercial gasoline C standard with characteristic compounds identified to highlight differences among each gasoline standard
Figure 5.3: Dendrogram of the chemically diverse data set with simulated fire debris consisting of carpet spiked with diesel
Figure 5.4: Dendrogram of the refined data set with simulated fire debris consisting of carpet spiked with diesel
Figure 5.5: Dendrogram of the class reference data set with simulated fire debris consisting of carpet spiked with diesel
Figure 5.6: Representative total ion chromatograms of A) simulated fire debris consisting of carpet spiked with commercial diesel, B) commercial diesel standard, and C) commercial fuel injector standard with compounds identified

1. Introduction

1.1 Background

Arson is a crime that involves the ignition of a fire with the intent to cause damage. The damage may be intended solely for destruction or to cover up another crime. Arson is typically identified by the presence of an ignitable liquid in debris collected from the scene of the crime. Perpetrators use an ignitable liquid to speed the spread of the fire and increase the amount of damage caused; therefore, the ignitable liquid used is commonly an easily accessible liquid such as gasoline. It is the job of the forensic scientist to determine whether an ignitable liquid is present within the fire debris and to classify any ignitable liquid present according to chemical class.

1.2 Ignitable Liquid Classification

ASTM International has classified ignitable liquids into eight different classes based on chemical composition (1). These classes are gasoline, petroleum distillates, isoparaffinic products, aromatic products, naphthenic paraffinic products, n-alkane products, oxygenated solvents, and miscellaneous, as shown in Table 1.1. All classes of ignitable liquid, with the exception of gasoline, are further classified based on the distribution of normal alkanes present (1). For example, a petroleum distillate can be a light petroleum distillate, which, by definition, contains normal alkanes ranging from four to nine carbons (denoted C4-C9).
Similarly, a medium petroleum distillate contains normal alkanes ranging from eight to 13 carbons (C8-C13) and a heavy petroleum distillate contains normal alkanes ranging from eight to more than 20 carbons (C8-C20+).

Table 1.1: ASTM Classification of Ignitable liquids (classification defined by ASTM International)

Gasoline (all brands, including gasohol)
    Composition: C3- and C4-alkylbenzenes and various aliphatic compounds
    Note: Fresh gasoline is typically in the range C4-C12

Petroleum Distillates
    Composition: Homologous series of n-alkanes; less significant isoparaffinic, cycloparaffinic, and aromatic components
    Light (C4-C9): Petroleum ether, cigarette lighter fluids, camping fuels
    Medium (C8-C13): Charcoal starters, paint thinners, dry cleaning solvents
    Heavy (C8-C20+): Kerosene, diesel fuel, jet fuels, charcoal starters

Isoparaffinic Products
    Composition: Branched chain (isoparaffinic); cyclic (naphthenic) alkanes and n-alkanes insignificant or absent
    Light (C4-C9): Aviation gas, specialty solvents
    Medium (C8-C13): Charcoal starters, paint thinners, copier toners
    Heavy (C8-C20+): Commercial specialty solvents

Aromatic Products
    Composition: Aromatic compounds; aliphatic compounds absent or insignificant
    Light (C4-C9): Paint and varnish removers, automotive parts cleaners, xylenes, toluene-based products
    Medium (C8-C13): Automotive parts cleaners, specialty cleaning solvents, insecticide vehicles, fuel additives
    Heavy (C8-C20+): Insecticide vehicles, industrial cleaning solvents

Naphthenic Paraffinic Products
    Composition: Branched chain (isoparaffinic) and cyclic (naphthenic) alkanes; n-alkanes insignificant or absent
    Light (C4-C9): Cyclohexane-based solvents/products
    Medium (C8-C13): Charcoal starters, insecticide vehicles, lamp oils
    Heavy (C8-C20+): Insecticide vehicles, lamp oils, industrial cleaning solvents

n-Alkane Products
    Composition: Only n-alkanes, typically containing 5 or less
    Light (C4-C9): Solvents, pentane, hexane, heptane
    Medium (C8-C13): Candle oils, copier toners
    Heavy (C8-C20+): Candle oils, carbonless forms, copier toners

Oxygenated Solvents
    Composition: Oxygenated products including alcohols, esters, ketones; major components include toluene or xylene
    Light (C4-C9): Alcohol, ketones, lacquer thinners, fuel additives, surface preparation solvents
    Medium (C8-C13): Lacquer thinners, industrial solvents, metal cleaners/gloss removers
    Heavy (C8-C20+): none listed

Others-Miscellaneous
    Composition: Liquids that cannot otherwise be classified
    Light (C4-C9): Single component products, blended products, enamel reducers
    Medium (C8-C13): Turpentine products, blended products, specialty products
    Heavy (C8-C20+): Blended products, specialty products

1.3 Current Methods in Fire Debris Analysis

Fire debris evidence is commonly analyzed using a passive-headspace extraction procedure. A passive-headspace extraction involves sealing the fire debris evidence in a clean, unused metal paint can or a nylon bag. An activated carbon strip (ACS) is suspended over the sample within the paint can or nylon bag and the sample is heated in an oven at a temperature ranging from 50 °C to 80 °C for 2 to 24 hours, as recommended by ASTM International (2). Over the elapsed time, volatile compounds from the sample enter the headspace of the container or bag and adsorb onto the ACS. To extract any volatile compounds adsorbed by the ACS, the ACS is eluted with an organic solvent, such as dichloromethane (CH2Cl2).

Subsequently, the extract is analyzed using gas chromatography-mass spectrometry (GC-MS). GC allows for the separation of compounds in a complex mixture and MS allows for the definitive identification of those compounds, making GC-MS a useful tool in fire debris analysis. Using GC-MS, total ion chromatograms (TICs), extracted ion chromatograms (EICs), and extracted ion profiles (EIPs) are generated and used to identify any ignitable liquid present in the fire debris extract. An EIC contains only ions of a specific mass-to-charge (m/z) ratio that may be of interest, while an EIP is a profile of multiple ions that are considered to be characteristic of the compounds of interest (3).
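The relationship between a TIC, an EIC, and an EIP can be illustrated with a minimal Python sketch. It assumes the GC-MS data are already available as a matrix with one row per scan and one column per recorded m/z value, and that an EIP is formed by summing the intensities of its constituent ions; both are assumptions made for illustration rather than details taken from this work. The function names and the toy intensities are hypothetical; the alkane-indicative ions (m/z 57, 71, 85, 99) and aromatic-indicative ions (m/z 91, 105, 119, 133) are those commonly cited for these compound classes.

```python
import numpy as np

# Ion sets for the alkane and aromatic classes; the constant names are illustrative.
ALKANE_IONS = (57, 71, 85, 99)
AROMATIC_IONS = (91, 105, 119, 133)


def total_ion_chromatogram(scans):
    """Sum every recorded ion at each scan (one value per retention time)."""
    return scans.sum(axis=1)


def extracted_ion_chromatogram(scans, mz_axis, target_mz):
    """Return the intensity trace of a single m/z value."""
    column = np.flatnonzero(mz_axis == target_mz)
    if column.size == 0:
        raise ValueError(f"m/z {target_mz} was not recorded")
    return scans[:, column[0]]


def extracted_ion_profile(scans, mz_axis, target_ions):
    """Sum the traces of several characteristic ions (assumed definition).

    Ions that were not recorded are silently ignored.
    """
    mask = np.isin(mz_axis, target_ions)
    return scans[:, mask].sum(axis=1)


# Hypothetical toy data: 3 scans x 6 recorded m/z values (not measured data).
mz_axis = np.array([57, 71, 85, 91, 99, 105])
scans = np.array(
    [
        [10, 5, 2, 0, 1, 0],   # scan dominated by alkane fragments
        [0, 0, 0, 20, 0, 8],   # scan dominated by aromatic fragments
        [4, 3, 2, 6, 1, 2],    # mixed scan
    ],
    dtype=float,
)

tic = total_ion_chromatogram(scans)
eic_91 = extracted_ion_chromatogram(scans, mz_axis, 91)
alkane_eip = extracted_ion_profile(scans, mz_axis, ALKANE_IONS)
aromatic_eip = extracted_ion_profile(scans, mz_axis, AROMATIC_IONS)
```

In this toy example the alkane profile is zero at the second scan and the aromatic profile is zero at the first, which is the sense in which an EIP is more selective than the TIC.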
For example, ions with m/z ratios of 57, 71, 85, and 99 are indicative of the alkane class, whereas ions with m/z ratios of 91, 105, 119, and 133 are indicative of the aromatic class. EICs and EIPs are more specific than the TIC and can be used to exclude background ions and, in some cases, to eliminate interference ions from the substrate.

Currently, fire debris analysts visually compare the TIC and EICs or EIPs of an extract to an in-house reference collection of ignitable liquids. Major compounds indicative of specific ignitable liquid classes are identified in order to characterize any ignitable liquid present in the fire debris. However, the visual comparison of chromatograms from fire debris with those of reference standards is challenging due to interferences from the fire debris substrate, evaporation of the liquid that occurs during burning, and thermal degradation or pyrolysis of the substrate or ignitable liquid that may occur during the fire (3).

1.4 Limitations in Current Methods of Fire Debris Analysis

Interferences from the fire debris substrate make visual interpretation of TICs more difficult, as hydrocarbons inherent to the substrate may resemble an ignitable liquid, such as a petroleum distillate. Ultimately, this can increase the risk of false positive identification of an ignitable liquid in the debris. During the fire, volatile compounds in the ignitable liquid are lost due to evaporation. This changes the chemical composition of the liquid, so visual comparison of the chromatogram of the evaporated liquid to unevaporated reference standards is more challenging; however, the in-house reference collection can be expanded to include reference standards evaporated to different levels. As a result of thermal degradation or pyrolysis, compounds are broken down and may no longer be present in fire debris. In addition, the compounds are broken down into several new compounds that are introduced to the chromatogram.
Each of these factors (i.e., substrate interferences, evaporation, and thermal degradation/pyrolysis) affects the appearance of the chromatogram of the fire debris that is compared to the chromatograms in the standard reference collection. These differences increase the risk of false positive and false negative identification, as well as misclassification of an ignitable liquid. The issue of subjectivity arises due to the complexity of visual comparisons and stresses the need for a more objective method of comparison.

To help overcome the challenges encountered in fire debris analysis, the National Center for Forensic Science (NCFS) maintains an ignitable liquid reference collection (ILRC) database (4). The ILRC database currently consists of 695 ignitable liquid chromatograms, most of which are unevaporated, although chromatograms of evaporated and biologically degraded liquids are also included. Each ignitable liquid within the database is classified according to ASTM class and has major peaks identified (4). In addition to the ILRC database, a substrate database is also maintained by NCFS (5). This database currently consists of approximately 60 different substrates with chromatograms of both unburned and burned substrates available. Similar to the ILRC database, major compounds and the dominant ion profile for each substrate are indicated. These databases are continuously growing to assist fire debris analysts with the current challenges faced during fire debris analysis.

1.5 Statistical Analysis of Fire Debris

In 2009, the National Academy of Sciences (NAS) released a report entitled Strengthening Forensic Science in the United States: A Path Forward (6). This report emphasized the need for a more objective approach for the analysis of forensic evidence. The report indicated that a statistical assessment of forensic evidence would reduce false positives and false negatives and would be more suitable for satisfying the Daubert standard (6).
As a result of the NAS report (6), there has been increasing interest in statistical procedures for the analysis of forensic evidence. For the analysis of fire debris evidence specifically, multivariate statistical procedures, such as principal component analysis (PCA), have been investigated for associating simulated fire debris to the corresponding ignitable liquid reference standard. PCA has been utilized to reduce the subjectivity of fire debris analysis and to increase confidence in visual comparisons of TICs, EICs, and EIPs. Successful association of simulated fire debris to an ignitable liquid or ignitable liquid class has been achieved using PCA (7, 8).

PCA generates two main outputs: scores and loadings plots. The scores plot is a scatter plot that represents the association and discrimination of the samples based on positioning in the plot. Samples that are chemically similar will be positioned closely while those that are chemically different will be separated from one another on the scores plot. A loadings plot is also generated for each principal component (PC). These plots show the variables contributing most to the variance described by that PC. Loadings plots are used to explain the positioning of the samples on the scores plot.

However, there are some limitations inherent to PCA for applications in fire debris analysis. First, association of simulated fire debris samples to ignitable liquid standards is based on visual interpretation of the PCA scores plot, and it is common for similar ignitable liquids to be positioned close to one another on the scores plot. As a result, it is difficult to determine visually with which ignitable liquid the fire debris sample is most closely associated. To reduce the subjectivity of visual interpretation, additional metrics or statistical procedures such as Pearson product-moment correlation (PPMC) coefficients (7-9), Euclidean distances, and hierarchical cluster analysis (HCA) can be implemented (9).
PPMC coefficients measure the linear correlation between two samples and can be used to determine the similarity between two chromatograms. Euclidean distance is the distance between two given data points in a multidimensional space and can be used to measure the distance between the ignitable liquid standards and fire debris samples in the PCA scores plot. Hierarchical cluster analysis is a complementary procedure to PCA that can be used to determine similarity between a fire debris sample and an ignitable liquid reference standard (8, 9).

The second limitation of PCA for applications in fire debris analysis is based on the data sets used for the analysis. Typically, PCA is performed on a data set containing commercially available ignitable liquid reference standards. However, there is a large number of commercially available ignitable liquids on the market (the ILRC database currently contains 695 ignitable liquids) and it is not practical to include all ignitable liquids in a given data set. Further, the chemical composition of ignitable liquids within the same ASTM class can also vary substantially, as shown in Figure 1.1. The commercially available paint thinner (Figure 1.1A) and the commercially available upholstery protector (Figure 1.1B) are both classified as isoparaffinic products containing branched alkanes and cyclic alkanes. Although both ignitable liquids contain the same classes of compounds, the two liquids still appear substantially different, as upholstery protector contains branched alkanes ranging from C5-C7 and paint thinner contains branched alkanes ranging from C7-C12. Given the number of ignitable liquids that are commercially available, as well as the chemical variation within an ASTM class, determining which commercial ignitable liquids to include within a data set for PCA is challenging.
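The projection of a fire debris sample onto a PCA scores plot built from reference standards, followed by a Euclidean distance comparison in that plot, can be illustrated with a minimal Python sketch. This is a sketch under stated assumptions, not the procedure used in this research: PCA is computed here by singular value decomposition of the mean-centered standards, no other pretreatment is applied, and the five-variable "chromatograms", the class labels, and all function names are hypothetical values invented for the example.

```python
import numpy as np


def fit_pca(standards, n_components=2):
    """Fit PCA on mean-centered standards via SVD.

    Returns the variable means, the loadings (one row per PC), and the
    fraction of variance explained by each retained PC.
    """
    mean = standards.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(standards - mean, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean, vt[:n_components], explained[:n_components]


def project(samples, mean, loadings):
    """Compute scores by projecting samples onto the fitted loadings."""
    return (samples - mean) @ loadings.T


def nearest_standard(sample_score, standard_scores, labels):
    """Return the label of the closest standard and all Euclidean distances."""
    distances = np.linalg.norm(standard_scores - sample_score, axis=1)
    return labels[int(np.argmin(distances))], distances


# Hypothetical, invented abundances for five variables (not measured data).
labels = ["gasoline", "medium petroleum distillate", "heavy petroleum distillate"]
standards = np.array(
    [
        [9.0, 7.0, 1.0, 0.0, 0.0],
        [1.0, 6.0, 8.0, 2.0, 0.0],
        [0.0, 1.0, 5.0, 8.0, 7.0],
    ]
)
# A made-up debris sample resembling the third standard with some distortion.
debris = np.array([0.5, 1.5, 5.0, 7.0, 6.0])

mean, loadings, explained = fit_pca(standards, n_components=2)
standard_scores = project(standards, mean, loadings)
debris_score = project(debris, mean, loadings)
label, distances = nearest_standard(debris_score, standard_scores, labels)
```

The fire debris sample is projected using the means and loadings of the standards rather than being included when the PCA is fitted, so the positions of the standards in the scores plot do not depend on the sample. The smallest entry of `distances` then gives a numerical counterpart to judging proximity on the scores plot by eye.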
To correctly associate an ignitable liquid in a fire debris sample to the corresponding standard, the appropriate reference standard must be present in the data set. However, as the liquid present in the fire debris sample will not be known, the corresponding ignitable liquid reference standard may not be present in the data set. This problem could be overcome with the development of reference standards that are more representative of each ASTM class, thus providing a more standardized approach for statistical analysis of fire debris evidence.

Figure 1.1: Representative total ion chromatograms (normalized abundance versus retention time in min) of A) commercial paint thinner (C7-C12 branched and cyclic alkanes) and B) commercial upholstery protector (C5-C7 branched and cyclic alkanes) to indicate differences within a given ASTM class

1.6 Literature Review

1.6.1 Limitations in Current Methods of Fire Debris Analysis

Research has been conducted in order to potentially move away from the subjective visual comparison method toward a more objective comparison method. Lentini et al. analyzed common household materials using ASTM procedures for passive-headspace extraction and GC-MS analysis (10). The goal of this research was to demonstrate how similar the chromatograms of household substrates can be to ignitable liquids. Common household items, such as colored newspaper, spandex shorts, and tennis shoes, contain petroleum-based liquids that are used during the manufacturing of the item (10). Additional household items may also inherently contain petroleum-based liquids, such as stain used on wood flooring. During the investigation of common household items, Lentini et al. also determined that ignitable liquids, such as toluene and a heavy petroleum distillate, used during manufacturing, were detectable for at least five years and up to 19 years after production in some materials (10).
Often, fire debris analysts also assess the ratios of characteristic compounds present in the chromatogram of the fire debris sample and compare the ratios to those in the chromatogram of the reference standard. Lentini et al. demonstrated the importance of also considering these ratios (10). For example, even when toluene, xylenes, and C3-alkylbenzenes, which are characteristic of gasoline, were present in a household material, the ratios of these compounds did not resemble the ratios expected in gasoline, thereby reducing the risk of false positive identifications. However, when ignitable liquids undergo extensive evaporation, the expected compound ratios begin to vary, making identification more difficult (10). Knowing the extent to which many common household items may resemble an ignitable liquid allows a trained analyst to consider this when visually comparing the fire debris evidence to a standard reference collection. Nevertheless, the subjectivity of visual comparison remains an issue.

The evaporation and thermal degradation/pyrolysis that occur due to the high temperatures encountered during a fire also contribute to the difficulty in identifying any ignitable liquid. In order to account for evaporation, Keto and Wineman created a library of neat and evaporated liquid standards. The goal of this research was to develop a method for identifying ignitable liquids regardless of evaporation levels and substrate interferences using target compound chromatograms (TCCs). Ignitable liquids were evaporated to 4%, 20%, and 80% by volume (v/v) and analyzed by GC-MS (11). A library of TCCs representative of each ignitable liquid (i.e., gasoline and petroleum distillates) was created. The TCCs were generated by reforming the chromatogram to include only target compounds of interest based on corresponding m/z ratios.
The generated TCCs were slightly different from EICs and EIPs as the m/z ratios included were more specific to the characteristic compounds in the ignitable liquids rather than m/z ratios of a general class of compounds. The selected target compounds were compounds still present (greater than 30% abundance relative to the base ion) after excessive evaporation and still identifiable after extensive burning. For example, target compounds selected for gasoline included trimethylbenzene (TMB), indane, and naphthalene, while target compounds selected for a medium petroleum distillate (MPD) included the normal alkanes, C9-C12 (11).

To account for pyrolysis, items such as nylon carpeting, vinyl floor tile, and plywood were heated to high temperatures in a metal can and volatile compounds were collected using a charcoal adsorption tube (11). The extracts of the pyrolysis products were then spiked with a 0.1% (v/v) dilute ignitable liquid, and subsequently analyzed by GC-MS. The TICs and TCCs of the pyrolysis/ignitable liquid samples were visually compared to the neat and evaporated ignitable liquid standard library. Using the TICs alone, Keto and Wineman determined it was not apparent whether an ignitable liquid was present or if the compounds were a product of the substrate. However, using the TCCs, the presence of an ignitable liquid could be determined, as well as the ASTM class (11). For example, when looking at a TIC for fire debris containing gasoline it was not readily apparent that gasoline was present, but when the target compounds were extracted, 13 of the 15 target compounds were present including TMB, indane, and naphthalene. The use of specific target compounds helped to eliminate some potential for false negative identification when the effects of pyrolysis were present; however, in sample preparation, the ignitable liquids in the samples were not directly introduced to the heat and so did not undergo extensive evaporation (11).
More importantly, the method reported involved visual comparisons of TCCs, which remain subjective, and selecting only a series of compounds has the potential for the loss of discriminatory information.

1.6.2 Statistical Analysis of Fire Debris

After the release of the NAS report, researchers have had an increasing interest in statistical procedures for the analysis of forensic evidence. Specifically, research on statistical procedures for the analysis of fire debris has increased as the subjectivity of visually comparing chromatograms suggests a need for a more objective approach.

Sigman and Williams utilized covariance mapping to compare TICs of ignitable liquids and generated fire debris samples for the purpose of automated database searching (12). Fifteen different ignitable liquid reference standards from nine different ASTM classes and simulated fire debris were analyzed using three different GC-MS configurations. All three configurations used the same instrument conditions (i.e., temperature program) but different instruments and columns of varying length. The three configurations were used to develop an automated database searching method that could be used universally among laboratories (12). Covariance mapping was performed, whereby covariance matrices were calculated for each standard and fire debris sample based on mass spectral data and the Manhattan distance between two matrices was calculated. Pairwise comparisons between each of the ignitable liquids and between the ignitable liquid standards and fire debris samples were calculated (12). A distance of 0 indicated that the two matrices were similar, while a distance of 1 indicated dissimilarity. Using covariance mapping, Sigman and Williams were able to discriminate neat/lightly evaporated gasoline from heavily evaporated gasoline and were able to discriminate among the light, medium, and heavy petroleum distillates.
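The distance calculation described above can be illustrated with a minimal sketch. It assumes that each covariance matrix is scaled so that its absolute entries sum to 1 and that the Manhattan distance is halved, which bounds the result between 0 and 1; the normalization used in the cited work (12) may differ. The function names and the toy data are hypothetical.

```python
def covariance_map(scans):
    """Covariance between m/z channels, scaled so the absolute entries sum to 1.

    scans: list of mass scans, each a list of intensities (one per m/z channel).
    The scaling is an assumption made for this sketch.
    """
    n_scans, n_mz = len(scans), len(scans[0])
    means = [sum(scan[j] for scan in scans) / n_scans for j in range(n_mz)]
    cov = [[sum((s[i] - means[i]) * (s[j] - means[j]) for s in scans) / (n_scans - 1)
            for j in range(n_mz)]
           for i in range(n_mz)]
    total = sum(abs(v) for row in cov for v in row)
    return [[v / total for v in row] for row in cov]


def manhattan_distance(map_a, map_b):
    """Half the sum of absolute differences: 0 for identical maps, at most 1."""
    return 0.5 * sum(abs(a - b)
                     for row_a, row_b in zip(map_a, map_b)
                     for a, b in zip(row_a, row_b))


# Hypothetical toy data: 4 scans x 3 m/z channels per sample.
liquid = [[10.0, 0.0, 1.0], [20.0, 1.0, 2.0], [5.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
same_liquid_2x = [[2 * v for v in scan] for scan in liquid]  # same profile, double signal
other_liquid = [[0.0, 10.0, 1.0], [1.0, 20.0, 3.0], [0.0, 5.0, 1.0], [0.0, 1.0, 0.0]]

d_same = manhattan_distance(covariance_map(liquid), covariance_map(same_liquid_2x))
d_other = manhattan_distance(covariance_map(liquid), covariance_map(other_liquid))
print(d_same, d_other)
```

In this sketch, the same liquid at a different concentration gives a distance of essentially zero, while a liquid with a different dominant m/z channel gives a distance much closer to 1.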
Additionally, association of the simulated fire debris to the corresponding ignitable liquid was achieved (12). Covariance mapping could be beneficial for screening ignitable liquids that are similar to the fire debris sample; however, with the use of pairwise comparisons, interpretation of the data would become much more time consuming.

Turner and Goodpaster utilized PCA to investigate the microbial degradation of gasoline over time and in two different types of soil (13). Four Molotov cocktails were constructed by filling both wine and beer bottles to the neck with gasoline. The Molotov cocktails were then ignited and tossed into either lawn soil or potting soil during the months of July and January. Soil samples were collected, stored at room temperature, extracted by passive headspace, and analyzed by GC-MS at 0, 2, 7, 11, 22, 45, and 60 days. PCA was performed on summed EIPs based on characteristic compounds (i.e., n-alkane and aromatic). From the PCA scores plot, both soil samples had similar levels of microbial degradation at day 0, but different levels of degradation starting at day 11. Using PCA, Turner and Goodpaster determined that levels of microbial degradation were dependent on the type of soil and the season; however, the effects of degradation on the association to the corresponding ignitable liquid were not investigated.

Baerncopf et al. used statistical procedures to associate fire debris samples back to an ignitable liquid standard (7). One liquid from six different ASTM classes was analyzed and simulated fire debris was generated using nylon carpet spiked with 750 µL of an ignitable liquid used as a standard. Samples were extracted using a passive-headspace extraction and analyzed by GC-MS. Similarities between the simulated fire debris chromatograms and corresponding ignitable liquid chromatograms were assessed using PPMC coefficients (7).
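As a minimal sketch of how a PPMC coefficient is obtained for a pair of chromatograms, the following uses the standard Pearson formula on abundances sampled at the same retention-time points. The data and names are hypothetical and are not taken from the cited study (7).

```python
import math
from itertools import combinations


def ppmc(x, y):
    """Pearson product-moment correlation between two equal-length chromatograms."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical abundances at seven retention-time points.
chromatograms = {
    "standard": [0.0, 5.0, 20.0, 5.0, 0.0, 10.0, 2.0],
    "debris": [0.5, 5.5, 18.0, 6.0, 3.0, 9.0, 2.5],   # same liquid plus a substrate peak
    "other liquid": [12.0, 1.0, 0.0, 2.0, 15.0, 1.0, 8.0],
}

# Every pair needs its own coefficient: n samples give n * (n - 1) / 2 comparisons.
coefficients = {pair: ppmc(chromatograms[pair[0]], chromatograms[pair[1]])
                for pair in combinations(chromatograms, 2)}
for pair, r in coefficients.items():
    print(pair, round(r, 4))
```

Because the number of pairs grows as n(n - 1)/2, a data set of 50 chromatograms would already require 1225 coefficients.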
PPMC coefficients indicated strong correlation between the simulated fire debris samples and the corresponding ignitable liquid standard. For example, the PPMC coefficient for the comparison of fire debris spiked with torch fuel and the torch fuel standard was 0.9609 ± 0.0102. PPMC coefficients can be useful in determining similarities between samples; however, this method only allows for pairwise comparisons. The larger the data set, the more comparisons need to be made, resulting in more PPMC coefficients that have to be compared. As a result, this method can be time consuming.

In addition, Baerncopf et al. also performed PCA on the TICs to investigate association of the fire debris back to the ignitable liquid standard (7). A majority of the simulated fire debris was successfully associated back to the ignitable liquid standard; however, due to the similarity of some ignitable liquids, only association to class rather than specific liquid was possible. For example, due to the similar chemical composition of petroleum distillates, the simulated fire debris containing diesel could only be associated to the petroleum distillate class, but further association could not be made because of the close clustering of ignitable liquid reference standards within the petroleum distillate class. When this occurred, PPMC coefficients could be used to associate the fire debris samples and ignitable liquid standards. Therefore, successful association of the simulated fire debris and ignitable liquid standard was possible using both PPMC coefficients and PCA. Moving towards using statistical procedures such as PPMC coefficients and PCA reduces the issue of subjectivity. However, due to the large spike volume (750 µL) used with the simulated fire debris in this study, few substrate interference compounds were observed and had little influence on the positioning of standards and samples on the scores plot.
Using a smaller spike volume would increase the amount of substrate interference compounds and would be more representative of actual fire debris evidence.

By generating scores and loadings plots for the liquid standards only and then projecting fire debris samples onto the scores plot, PCA can be used to account for substrate interferences within fire debris samples. In this way, the positioning of the fire debris samples on the scores plot is based solely on the compounds present in the standards used to generate the original scores plot. PCA therefore acts as a filter, essentially filtering out any contributions from the substrate. Prather et al. used PCA in this way to associate simulated fire debris samples to the corresponding ignitable liquid (8). Neat and evaporated gasoline and kerosene were used as the ignitable liquid standards. Simulated fire debris was generated by spiking 20 µL of the neat and evaporated ignitable liquid standards onto nylon carpet and burning. All extracts were obtained using a passive-headspace extraction and analyzed by GC-MS.

Performing PCA on the TICs, Prather et al. generated a scores plot and loadings plot using only the neat and evaporated ignitable liquid standards (8). The simulated fire debris was then projected onto the scores plot in order to filter out interferences from the substrate. The simulated fire debris samples associated to the ignitable liquid class, but the majority of the samples were not associated back to the corresponding evaporation levels. This could be due to the fact that the simulated fire debris was spiked with evaporated ignitable liquid and then burned which would result in further evaporation of the liquid. However, using such a small set of ignitable liquid standards limits the utility of this research. The two ignitable liquids used are substantially different from one another; therefore, it would be expected for appropriate association and differentiation to occur.
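The projection approach described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the cited study (8): the loadings are extracted with a simple NIPALS-style power iteration in place of a library eigendecomposition, the data are hypothetical five-variable profiles, and all names are invented for the example. The final step measures Euclidean distances between the projected sample and each standard in the two-PC scores space.

```python
import math


def fit_pca(standards, n_components=2, n_iter=100):
    """Mean-center the standards and extract loadings one PC at a time (NIPALS-style)."""
    means = [sum(col) / len(standards) for col in zip(*standards)]
    residual = [[v - m for v, m in zip(row, means)] for row in standards]
    loadings = []
    for _ in range(n_components):
        # Start from the variable with the largest remaining sum of squares.
        t = list(max(zip(*residual), key=lambda col: sum(v * v for v in col)))
        for _ in range(n_iter):
            p = [sum(ti * row[j] for ti, row in zip(t, residual))
                 for j in range(len(means))]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]                                     # unit-length loading
            t = [sum(x * w for x, w in zip(row, p)) for row in residual]  # scores on this PC
        # Remove the variance explained by this PC before extracting the next one.
        residual = [[x - ti * w for x, w in zip(row, p)] for row, ti in zip(residual, t)]
        loadings.append(p)
    return means, loadings


def project(sample, means, loadings):
    """Scores of any sample, using the means and loadings from the standards only."""
    centered = [v - m for v, m in zip(sample, means)]
    return [sum(x * w for x, w in zip(centered, p)) for p in loadings]


# Hypothetical abundances at five retention-time variables; the last variable is a
# substrate peak that is absent from every standard.
standards = {
    "gasoline 1": [9.0, 8.0, 1.0, 0.0, 0.0],
    "gasoline 2": [8.0, 9.0, 1.0, 0.0, 0.0],
    "kerosene 1": [1.0, 2.0, 9.0, 7.0, 0.0],
    "kerosene 2": [2.0, 1.0, 8.0, 8.0, 0.0],
}
debris = [8.5, 8.0, 1.5, 0.5, 6.0]

means, loadings = fit_pca(list(standards.values()))
standard_scores = {name: project(row, means, loadings) for name, row in standards.items()}
debris_scores = project(debris, means, loadings)

distances = {name: math.dist(debris_scores, s) for name, s in standard_scores.items()}
nearest = min(distances, key=distances.get)
print(nearest, {name: round(d, 2) for name, d in distances.items()})
```

Because the substrate variable has no variance among the standards, its loading is zero on both PCs, so the substrate peak in the debris sample does not move its projected position. This is the filtering effect described in the text, shown here on toy data only.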
As more ignitable liquids are introduced, successful association will become more challenging. Therefore, the selection of the data set for analysis affects the success of associating and discriminating samples properly. However, careful selection of a data set may be interpreted as manipulating the data, which further highlights the need for a more standardized approach.

Hierarchical cluster analysis (HCA) has been studied as an additional statistical procedure to investigate forensic evidence and associate unknown samples to samples of known origin (8). HCA accounts for all dimensions of the data and highlights patterns and similarities that may have not otherwise been obvious based on visual assessment alone. In addition, the similarity of the samples is only relative to the data set; as a result, HCA will highlight similarities, but there will always be samples with no similarity in the data set. Therefore, with a limited data set, samples that appear visually similar may show little to no similarity after HCA, depending on the other members of the data set.

Goodpaster et al. used HCA in two different studies to associate electrical tape samples (14,15). In the first study, 67 different rolls of black electrical tape from 34 different brands were analyzed using scanning electron microscopy and energy dispersive spectroscopy (14). HCA was performed based on the elemental profile of the tape adhesive using Euclidean distance and Ward’s algorithm to determine clustering. Successful association according to tape grade (i.e., general, mid-range, and premium) was possible and in some cases, association to manufacturing year was possible based on prior manufacturing knowledge of tape. In a second study, Goodpaster et al. used HCA to associate 79 different rolls of electrical tape from 36 different brands to the corresponding brand (15).
The electrical tapes were analyzed using attenuated total reflectance-Fourier transform infrared spectroscopy (ATR-FTIR) and HCA was applied to the resulting spectra using the previously specified metrics for clustering. Association of electrical tape with black adhesive according to brand was successful, but association of clear adhesive was only partially successful. However, similarity levels were not utilized during the study of electrical tape, but instead distances were reported. Distances are limiting as the distance can change based on the distance metric used.

Mat-Desa et al. utilized HCA to compare a variety of different lighter fuels in an effort to associate the lighter fuels based on brand (16). Fifteen different lighter fuel refills from a total of five different brands were analyzed by GC-MS. HCA was performed on both raw and pretreated data using 51 characteristic peaks across the full data set. Pretreated data included normalized data, normalized and square root transformed data, and normalized and fourth root transformed data. The characteristic peaks were selected based on peaks with similar retention times and peaks that had a relative standard deviation of less than 5% based on triplicate analyses. Based on visual assessment of the characteristic peaks in the TICs, three brands appeared to be similar and the remaining two brands appeared similar to one another, but different from the other brands. HCA was performed on all pretreated data sets using complete linkage; however, the distance metric used was not specified. Mat-Desa et al. determined that using the raw, normalized, and normalized with square root transformation data, the samples did not correctly associate by brand (16). However, using the normalized with fourth root transformation data, both neat and evaporated lighter fuels were correctly associated to the corresponding brands.
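The pretreatment options named above can be illustrated with a short sketch. The normalization shown (scaling peak areas to a total of 1) is an assumption, since the cited study's exact normalization is not described here, and the peak areas and function names are hypothetical.

```python
def normalize_to_total(peak_areas):
    """Scale peak areas so they sum to 1 (one common normalization; an assumption here)."""
    total = sum(peak_areas)
    return [area / total for area in peak_areas]


def root_transform(values, root):
    """Apply an n-th root power transformation (root=2 square root, root=4 fourth root)."""
    return [v ** (1.0 / root) for v in values]


# Hypothetical peak areas for four characteristic peaks in one sample.
raw = [1000.0, 250.0, 40.0, 10.0]
normalized = normalize_to_total(raw)
square_root = root_transform(normalized, 2)
fourth_root = root_transform(normalized, 4)

for label, data in [("normalized", normalized), ("square root", square_root),
                    ("fourth root", fourth_root)]:
    print(label, "largest/smallest peak ratio:", round(max(data) / min(data), 2))
```

For these toy values, the ratio of the largest to the smallest peak falls from 100 after normalization to 10 after the square root and to about 3.2 after the fourth root. Higher-order roots therefore give minor peaks more weight in any subsequent distance calculation, which is one plausible reason the choice of transformation changed the clustering.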
Two brands that were visually different from one another based on the TICs appeared to be similar based on the given dendrogram. However, little similarity was implied between two brands with visually similar TICs. Although the research conducted by Mat-Desa et al. accounted for varying levels of evaporation that may occur during the burning process, substrate interference compounds were not accounted for. Introducing simulated fire debris samples into the analysis might introduce the problems from both evaporation and interference compounds, making the analysis even more challenging. In addition, only using 51 characteristic peaks instead of a full TIC removes variables that could otherwise influence the association and could result in the loss of discriminatory information.

In a second study, Mat-Desa et al. utilized HCA to associate different brands of medium petroleum distillate (MPD) samples (17). Three different MPDs (white spirit, paint brush cleaner, and lamp oil) from varying brands were used to give a total of eight MPD samples. Samples were prepared and analyzed as in the previous study. Characteristic peaks (85 in total) were selected using the previously mentioned criteria and the data were pretreated using normalization, sixteenth root transformation, and row scaling; however, the need for this data pretreatment selection was not elucidated (17). HCA was performed using Euclidean distance and complete linkage. Successful association of six of the MPDs by brand regardless of evaporation level was possible while, for two of the samples, it was not possible to associate the 70, 90, and 95% evaporated samples. Poor association of these samples was likely due to extensive evaporation and volatility of compounds in the samples compared to the other samples used. Mat-Desa et al.
did demonstrate some successful association of MPDs at different evaporation levels; however, MPDs have a relatively low volatility and do not change substantially through evaporation, as more volatile ignitable liquids such as gasoline do (17). Interference compounds were still not accounted for and characteristic peaks were used over full TICs, which could result in the loss of discriminatory information. Once again, similarity levels were not used during the statistical analysis of lighter fuels and medium petroleum distillates, but instead distances were used. As stated before, this can be limiting as the distance will change based on the distance metric used.

Prather et al. also performed HCA to determine if evaporated ignitable liquids and simulated fire debris would associate to the corresponding neat liquid (8). Neat, 10%, and 90% evaporated gasoline and neat, 10%, and 70% evaporated kerosene were prepared. Simulated fire debris samples were prepared by spiking the neat and evaporated samples onto nylon carpet with carpet padding and burning for a predetermined burn time. All ignitable liquids and simulated fire debris samples were passive-headspace extracted and analyzed by GC-MS.

HCA was performed using Euclidean distance and complete linkage first on the neat and evaporated ignitable liquids alone and then on the simulated fire debris samples and ignitable liquids (8). In both iterations of HCA, the replicates of each ignitable liquid grouped together first at similarity levels ranging from 0.80 to 0.99. The next clustering occurred between the neat and 10% gasoline (0.71) and a second cluster between the neat and 10% kerosene (0.84). The kerosene standards clustered together at a higher similarity than the gasoline, as expected due to the higher volatility of some of the compounds in gasoline.
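A minimal sketch of HCA with Euclidean distance and complete linkage is given below. It assumes that similarity is defined as 1 - d/d_max, where d_max is the largest pairwise distance in the data set; this is a common convention, but the exact definition used in the cited study (8) is not stated in this section. The four-variable profiles and all names are hypothetical.

```python
import math
from itertools import combinations


def complete_linkage_hca(samples):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, similarity)."""
    dist = {frozenset(pair): math.dist(samples[pair[0]], samples[pair[1]])
            for pair in combinations(samples, 2)}
    d_max = max(dist.values())

    def linkage(a, b):
        # Complete linkage: distance between the two most distant members.
        return max(dist[frozenset((i, j))] for i in a for j in b)

    clusters = [frozenset([name]) for name in samples]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        # Assumed similarity scale: 1 for identical samples, 0 at the largest distance.
        merges.append((sorted(a), sorted(b), 1.0 - linkage(a, b) / d_max))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical four-variable profiles.
samples = {
    "gasoline neat": [9.0, 8.0, 1.0, 0.0],
    "gasoline 10% evaporated": [8.0, 9.0, 1.0, 0.0],
    "kerosene neat": [1.0, 2.0, 9.0, 7.0],
    "kerosene 10% evaporated": [2.0, 1.0, 8.0, 8.0],
}
merges = complete_linkage_hca(samples)
for cluster_a, cluster_b, similarity in merges:
    print(cluster_a, "+", cluster_b, "similarity", round(similarity, 2))
```

With this definition and complete linkage, the final merge always occurs at a similarity of 0, because the last linkage distance equals the largest pairwise distance. Similarity values are therefore relative to the members of the data set rather than absolute.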
The 70% evaporated kerosene and 90% evaporated gasoline clustered last to the corresponding neat standards due to the differences in chemical composition as a result of evaporation. All evaporation levels of gasoline clustered to the neat gasoline standard at lower similarity levels than the evaporated kerosene samples to the neat kerosene standard. These lower similarity levels of evaporated gasoline are once again due to the volatility of the components within gasoline and due to greater evaporation of gasoline than kerosene. Lastly, all gasoline and kerosene samples clustered to one another with no similarity, as expected due to the distinctly different chemical composition of these liquids.

HCA was performed a second time including the simulated fire debris samples to determine if association to the corresponding liquid was possible in the presence of substrate interference compounds and evaporation (8). Successful association of the simulated fire debris to the corresponding ignitable liquid was possible, but not to the correct evaporation level. Association to the proper evaporation level may have been limited by the process in which the fire debris was generated. By spiking previously evaporated ignitable liquids onto the carpet and then burning, the ignitable liquid evaporates further, making the evaporation level unknown. Unfortunately, it is difficult to generate simulated fire debris of known evaporation levels due to the inability to control the burning process completely. As a result, it is difficult to generate consistent simulated fire debris samples. In addition, using a small data set containing two very different ignitable liquids makes association simpler.

Exploratory procedures including PCA and HCA have previously been used for the successful association of fire debris evidence (7,8); however, these exploratory procedures have some limitations, which have been previously highlighted.
Another approach to statistical analysis includes classification procedures such as soft independent modeling of class analogy (SIMCA), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and k-Nearest neighbors (k-NN). Classification procedures can be used to classify samples of unknown origin to a defined class or sets of classes of known origin. SIMCA, LDA, and QDA have previously been used to classify simulated fire debris samples to ignitable liquid standards (18, 19), while k-NN has not yet been used for this application.

Tan et al. utilized SIMCA for classifying ignitable liquids in fire debris samples (18). Fifty-one different ignitable liquids belonging to five different ASTM classes were analyzed using GC-MS. Simulated fire debris samples were generated using the ignitable liquids and either wood or carpet as the substrate. TICs of the ignitable liquids and simulated fire debris were divided into 19 sections. These 19 sections were then summed, generating 19 variables that were used in subsequent data analysis. For model development in SIMCA, the training set contained the ignitable liquid standards and some of the simulated fire debris samples. Classification of the fire debris samples to the corresponding ignitable liquids was successful; however, including simulated fire debris samples within the training set requires a priori knowledge of the samples being tested. For example, if fire debris samples containing different substrates or ignitable liquids than those present in the training set were included, correct classification may not occur. In a forensic laboratory, it would not be possible to know what types of simulated fire debris samples to include in the training set and therefore including fire debris samples in the training set would be limiting. Additionally, using only 19 variables has the potential for the loss of discriminatory information.

Sigman et al.
used LDA and QDA as a multistep hard classification procedure for the purpose of identifying the presence of ignitable liquids in fire debris and classifying any ignitable liquids present based on ASTM class (19). The ultimate goal of this research was to be able to use LDA and QDA to classify fire debris data collected from various laboratories. Sharing a standard database for statistical classification among laboratories can be challenging as the retention time of compounds in chromatographic data can shift based on the instrument and conditions applied. As a result, Sigman et al. generated total ion spectra (TIS), or an average mass spectrum across an entire chromatogram, to avoid complications with retention time alignment (19). Total ion spectra were generated using data from the ILRC and substrate databases.

Statistical models were generated using 460 ignitable liquids from the ILRC database and 88 substrates from the substrate database (19). In addition, TIS were combined using software to generate 4600 TIS samples containing an ignitable liquid and two substrates and 4400 TIS samples containing three different substrates. These models were used to cross-validate the multistep classification procedure and to classify simulated fire debris samples (19). Principal component analysis was used for feature extraction in which principal components (PCs) accounting for 50, 70, 90, and 95% of the variance were used (19). Using the number of PCs that accounted for 95% of the variance resulted in the most successful classification. Using LDA and QDA as a multistep classification procedure, a positive classification rate of 70.9% was achieved for the fire debris samples, with a false positive rate of 8.9% (19).

k-Nearest Neighbors is another multivariate statistical tool used for hard classification purposes.
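As a minimal sketch of how k-NN assigns a class, the following uses Euclidean distance and a simple majority vote among the k nearest training samples. The two-variable training set, the class labels, and the deliberately placed outlier are hypothetical and serve only to show how the choice of k can change the result.

```python
import math
from collections import Counter


def knn_classify(sample, training, k):
    """Assign the majority class among the k training points nearest to the sample."""
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-variable training set (for example, PC1 and PC2 scores of standards).
training = [
    ((-7.2, 0.5), "gasoline"),
    ((-7.3, -0.4), "gasoline"),
    ((-6.9, 0.1), "gasoline"),
    ((7.1, 0.9), "petroleum distillate"),
    ((7.4, -0.8), "petroleum distillate"),
    ((6.8, 0.2), "petroleum distillate"),
    ((-6.0, 0.0), "petroleum distillate"),   # deliberate outlier
]
unknown = (-6.1, 0.1)

for k in (1, 3, 5):
    print(k, knn_classify(unknown, training, k))
```

In this toy case a single nearest neighbor follows the outlier, whereas three or five neighbors outvote it. An odd k is used here to avoid tied votes between two classes; how ties are broken is otherwise an implementation choice.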
k-NN has previously been utilized in forensic applications to classify unknown samples to defined classes; however, the procedure has not been previously utilized for the analysis of fire debris.

Said et al. utilized k-NN on a variety of handwriting samples in order to try to classify them back to the original writer (20). A total of 40 different writers were used to write 25 documents each for a total of 1000 handwriting samples. Texture analysis was performed on the handwriting using two different texture recognition algorithms. These algorithms were used to identify the text features and the variables that were used for k-NN; however, the number of nearest neighbors used was not specified. Of the 25 documents from each writer, 15 were used as the training set and the remaining 10 were used as the test set. Using one of the texture recognition algorithms, successful classification ranged from 77 - 86% and using the second algorithm, successful classification ranged only from 66 - 74%.

Kumar et al. also investigated k-NN during a nondestructive ink analysis technique to identify alterations made using 10 different blue ink ballpoint pens (21). Forty-five different combinations of two intersecting pen strokes were generated. These combinations were repeated again so that the intersections occurred in the opposite order and then all combinations were doubled, generating 180 combinations. Two different imaging models were used to identify minor differences in color of the ink and a texture recognition algorithm was used to identify differences in texture among the ink. k-NN was performed using the data generated from these algorithms using 1, 3, 5, 7, 9, and 11 nearest neighbors. The most accurate classification occurred using 5 nearest neighbors with an accuracy range of 80.00 - 97.56% and an average accuracy of 85.51%.

Jiang et al.
performed k-NN with other statistical procedures in an attempt to identify items according to type of drug or explosive that had been concealed using body packaging (22). An anthropomorphic phantom consisting of a head, chest, and abdomen was used to conceal a variety of drugs, drug precursors, and explosives (6 different samples in total). The samples were placed inside the stomach of the anthropomorphic phantom and analyzed 40 times using energy dispersive X-ray diffraction (EDXRD). k-NN was performed on the resulting spectra based on features extracted using positive matrix factorization (PMF), PCA, and robust PCA using only one nearest neighbor. When k-NN was performed following feature extraction using PMF, the highest classification rate of 99.5% was achieved. When k-NN was used following robust PCA and PCA for feature extraction, classification rates of 98.8% and 98.1%, respectively, were achieved. However, the use of one nearest neighbor could be problematic as a value of one is susceptible to outliers and could result in misclassification.

k-NN has had some success for correct classification and shows some potential in forensic applications (20-22), but has not been widely used in fire debris analysis. In fire debris analysis, ignitable liquid reference standards could be used as defined classes and simulated fire debris as the samples to be classified. However, the large number of ignitable liquids on the market and the chemical variability within each ASTM class make it challenging to determine which ignitable liquids to use in the data set.

1.7 Research Objectives

For multivariate statistics to be practical and applicable in a forensic laboratory during fire debris analysis, the methods need to be rapid, simple, and standardized for high throughput and reproducibility. In order to obtain this outcome and remain objective, a revised approach needs to be considered.
In this research, the impact of the data set composition on successful association of simulated fire debris samples to the corresponding reference standard was investigated, using a variety of statistical procedures. Further, reference standards characteristic of ASTM classes were developed and investigated as a more standardized approach for subsequent statistical analysis. PCA was used as the initial statistical method to compare the impact of data set selection based on chemical diversity and to demonstrate the utility of class reference standards. However, PCA can be limiting because visual interpretation of the scores plot remains subjective. As a result, Euclidean distances were utilized to quantitatively assess the association of the fire debris samples to the ignitable liquid standards. Although all dimensions of the data are accounted for using PCA, interpretation of the data can be limiting as only two or three dimensions can be compared simultaneously. Additional statistical procedures including HCA, which has previously been used for fire debris analysis, and k-NN, which has not yet been used for this application, were investigated due to the limitations of PCA. Commercial ignitable liquid standards and class reference standards were compared using each of the statistical procedures and the advantages and disadvantages of HCA and k-NN were investigated.

Ultimately, the comparison of a chemically diverse data set, a refined data set, and a class reference data set using several multivariate statistical procedures was performed in order to determine a more standardized approach for reliable fire debris analysis. Developing a set of chemical class reference standards useful for a statistical approach would make fire debris analysis more reliable, would help reduce the potential of false positives and negatives, would aid in convincing a jury, and would satisfy the Daubert standard.

REFERENCES

1. ASTM International, ASTM E 1618-06e1.
Annual Book of ASTM Standards 14.02.
2. ASTM International, ASTM E 1412-07. Annual Book of ASTM Standards 14.02.
3. Baerncopf JM, McGuffin VL, Smith RW. Effect of Gas Chromatography Temperature Program on the Association and Discrimination of Diesel Samples. Journal of Forensic Sciences 2010; 55: 185-192.
4. National Center for Forensic Science. Ignitable liquids reference collection. Available at: http://ilrc.ucf.edu (Accessed on 3 July 2014).
5. National Center for Forensic Science. Substrate Database. Available at: http://ilrc.ucf.edu (Accessed on 3 July 2014).
6. Committee on Identifying the Needs of the Forensic Sciences Community, National Research Council. Strengthening Forensic Science in the United States: A Path Forward. Washington, D.C.: National Academies Press, 2009.
7. Baerncopf JM, McGuffin VL, Smith RW. Association of ignitable liquid residues to neat ignitable liquids in the presence of matrix interferences using chemometric procedures. Journal of Forensic Sciences 2011; 56: 70-81.
8. Prather KR, McGuffin VL, Smith RW. Effect of evaporation and matrix interferences on the association of simulated ignitable liquid residues to the corresponding liquid standard. Forensic Science International 2012; 222: 242-251.
9. Willard MAB, McGuffin VL, Smith RW. Forensic analysis of Salvia divinorum using multivariate statistical procedures. Part I: discrimination from related Salvia species. Analytical and Bioanalytical Chemistry 2012; 402: 833-842.
10. Lentini JJ, Dolan JA, Cherry C. The Petroleum-Laced Background. Journal of Forensic Sciences 2000; 45: 968-989.
11. Keto RO, Wineman PL. Detection of Petroleum-Based Accelerants in Fire Debris by Target Compound Gas Chromatograph/Mass Spectrometry. Analytical Chemistry 1991; 63: 1964-1971.
12. Sigman ME, Williams MR. Covariance Mapping in the Analysis of Ignitable Liquids by Gas Chromatography/Mass Spectrometry. Analytical Chemistry 2006; 78: 1713-1718.
13. Turner DA, Goodpaster JV.
The effects of season and soil type on microbial degradation of gasoline residues from incendiary devices. Analytical and Bioanalytical Chemistry 2013; 405: 1593-1599.
14. Goodpaster JV, Sturdevant AB, Andrews KL, Brun-Conti L. Identification and Comparison of Electrical Tapes Using Instrumental and Statistical Techniques: I. Microscopic Surface Texture and Elemental Composition. Journal of Forensic Sciences 2007; 52: 610-629.
15. Goodpaster JV, Sturdevant AB, Andrews KL, Briley EM, Brun-Conti L. Identification and Comparison of Electrical Tapes Using Instrumental and Statistical Techniques: II. Organic Composition of the Tape Backing and Adhesive. Journal of Forensic Sciences 2009; 54: 328-338.
16. Mat-Desa WNS, NicDaeid N, Ismail D, Savage K. Application of Unsupervised Chemometric Analysis and Self-organizing Feature Map (SOFM) for the Classification of Lighter Fuels. Analytical Chemistry 2010; 82: 6395-6400.
17. Mat-Desa WNS, Ismail D, NicDaeid N. Classification and Source Determination of Medium Petroleum Distillates by Chemometric and Artificial Neural Networks: A Self Organizing Feature Approach. Analytical Chemistry 2011; 83: 7745-7754.
18. Tan B, Hardy JK, Snavely RE. Accelerant classification by gas chromatography/mass spectrometry and multivariate pattern recognition. Analytica Chimica Acta 2000; 422: 37-46.
19. Waddell EE, Song ET, Rinke CN, Williams MR, Sigman ME. Progress Toward the Determination of Correct Classification Rates in Fire Debris Analysis. Journal of Forensic Sciences 2013; 58: 887-896.
20. Said HES, Tan TN, Baker KD. Personal identification based on handwriting. Pattern Recognition 2000; 33: 149-160.
21. Kumar R, Pal NR, Chanda B, Sharma JD. Forensic Detection of Fraudulent Alteration in Ball-Point Pen Strokes. IEEE Transactions on Information Forensics and Security 2012; 7: 809-820.
22. Jiang Y, Liu P. Feature extraction for identification of drug and explosive concealed by body packaging based on positive matrix factorization.
Measurement 2014; 47: 193-199.

2. Theory

2.1. Gas Chromatography-Mass Spectrometry

Chromatography is a technique used to separate a chemical mixture by means of two individual phases known as the stationary phase and the mobile phase. Gas chromatography is frequently coupled with mass spectrometry (GC-MS) to provide additional data for a more conclusive means of identification. For this reason, GC-MS is frequently used in forensic science laboratories and, more specifically, in fire debris analysis. Samples suitable for GC-MS analysis are gases and volatile liquids, and they must be thermally stable within the operating temperature range of the instrument. For analysis of fire debris evidence, the sample must first be extracted; commonly, a passive-headspace extraction with an activated carbon strip (ACS) is used. The ACS accumulates the volatile compounds and is then eluted with a volatile organic solvent, usually dichloromethane, methanol, or carbon disulfide.

In GC-MS analysis, a sample is introduced to a heated inlet, carried through a column within an oven, and then exits to a detector. A basic diagram of a gas chromatograph (GC) is illustrated in Figure 2.1. Once the sample is prepared for analysis, it is introduced to the inlet using a syringe. The inlet is heated (typically 250-280 °C) so liquids are instantly volatilized upon introduction. A nominal flow rate of an inert gas, commonly helium, is set to carry the volatilized sample into the column. A split or splitless injection can be selected based on the type of sample. Using a splitless injection, the entire sample injected is introduced onto the column. Splitless injections are commonly used when the analytes are present in the sample at low concentration. For a split injection, only a portion of the sample is introduced onto the column and the remainder is transferred to waste.
A split ratio, commonly 50:1 or 100:1, is selected to determine what portion of the sample is introduced to the column and what portion is transferred to waste. Split injections are commonly used when a sample is highly concentrated to prevent contamination of the GC column.

Figure 2.1: Diagram of a gas chromatograph (labeled components: gas cylinder, syringe, injection port, column, oven, detector)

As previously mentioned, chromatographic separation of a sample mixture utilizes two phases. These two phases are known as the stationary phase and the mobile phase, and their nature varies based on the type of chromatography. In GC, the stationary phase is coated inside a column composed of fused silica. Modern GC utilizes capillary columns that are narrow in diameter. The stationary phase is coated on the inner walls of the capillary column, allowing the mobile phase to flow easily through the center of the column with little obstruction. The mobile phase, commonly known as the carrier gas, is the previously mentioned helium, and its purpose is to carry the sample through the column.

The chemical composition of the stationary phase varies based on the type of separation desired. For example, a 100% dimethyl polysiloxane stationary phase is strictly nonpolar and is beneficial for the separation of nonpolar compounds such as a range of hydrocarbons. Other stationary phases, such as a 5% diphenyl, 95% dimethyl polysiloxane stationary phase, contain phenyl groups. This type of stationary phase is still relatively nonpolar, but can be beneficial for separating hydrocarbon mixtures that also contain aromatic compounds. Additionally, a polar stationary phase such as a 50% cyanopropyl, 50% phenylmethyl polysiloxane can be used to separate a mixture of polar compounds.

As the sample mixture travels through the column, separation of the different analytes within the sample occurs through partitioning to the stationary phase based on affinity and through differences in the boiling points of the analytes.
As these analytes come off the column, they are detected by a detector and reported as a series of peaks in a chromatogram based on the amount of time spent in the column, known as the retention time. If an analyte has a high affinity for the stationary phase, it will interact with the stationary phase longer, resulting in a longer retention time, while analytes with little to no affinity for the stationary phase will continue through the column with the flow of helium, resulting in a shorter retention time. Similarly, an analyte with a low boiling point will elute from the column faster than an analyte with a higher boiling point and will therefore have a shorter retention time. Alternatively, if the analyte has a low boiling point but a high affinity for the stationary phase, the analyte will move through the column quickly but will also interact with the stationary phase and, as a result, a slight increase in retention time will occur.

The column used for separation is housed inside a temperature-controlled oven. The temperature of the oven can be selected based on the analytes of interest. An isothermal temperature program can be used, in which the temperature remains constant during the entire analysis. Isothermal temperature programs can be useful when the boiling point range of the analytes in a sample is known and is small. However, if the sample contains analytes with a large range of boiling points, temperature programming the oven may be necessary. Using a temperature program, an initial oven temperature can be set, followed by an increase in temperature at a specified rate. An initial oven temperature (40-50 °C) is selected based on the solvent in which the samples are prepared. An ideal initial oven temperature is typically 10-20 °C lower than the boiling point of the solvent. As a result, the solvent will condense and focus at the head of the column.
As the temperature increases, the focused solvent volatilizes and begins to move through the column. By having a focused starting point, the resolution of the separation is increased. Additionally, the initial oven temperature is typically held for several minutes (1-3 minutes) in order to achieve better resolution and avoid detecting the solvent in which the sample was prepared. A ramp rate (5-10 °C/min) can be programmed so analytes with a broader range of boiling points can be analyzed simultaneously. A slow ramp rate can be beneficial because it allows more time for analytes to be resolved, but it also results in a longer analysis time and broader peaks, which can in turn degrade resolution, because the longer the analytes spend in the column, the greater the diffusion and mass transfer effects. In order to decrease analysis time, a faster ramp rate can be used, but this will result in poorer resolution of analytes as there is less time for interactions with the stationary phase to occur. Temperature programming is beneficial for separating more compounds simultaneously; however, a compromise between good resolution and fast analysis time must be made.

Following the separation of analytes, mass analysis and detection of the individual analytes are conducted using a detector. A variety of detectors can be used based on the type of analysis being performed; however, a mass spectrometer is a common detector in the bench top instruments used in forensic laboratories. Mass spectrometers are especially useful in forensic science as they can be utilized for definitive identification of unknown samples. The column containing the separated analytes enters the mass spectrometer via the transfer line, which is held at a high temperature (280-300 °C) to ensure all analytes remain in the gaseous phase. The column is held within the transfer line and the tip of the column ends at the ion source of the mass spectrometer, delivering the separated analytes directly into the ion source.
The ion source ionizes the analytes as they enter the mass spectrometer; the ions are then separated using a mass analyzer and the separated ions are then detected. The analysis must be carried out under vacuum conditions to prevent ions from colliding with one another before detection and to pump away any additional molecules that were not ionized.

Upon introduction to the ion source, the analytes undergo ionization and, while there are numerous ionization methods available, electron ionization is the most common for GC-MS analysis. During electron ionization, a filament is heated, generating electrons. The electrons produced are then accelerated to a high energy (typically 70 eV), generating a beam of high-energy electrons. These high-energy electrons interact with the analytes, causing the loss of an electron from the analyte and resulting in a positively charged ion. Electron ionization is known as a hard ionization method because fragmentation commonly occurs during the ionization process. Typically, bond energies within organic molecules are substantially less than 70 eV and, as a result, there is sufficient excess energy for fragmentation to occur. Fragmentation can be beneficial for definitive identification of a molecule as it allows structural information to be obtained based on fragmentation patterns unique to that specific compound under those ionization conditions. Commonly, compounds are identified based on molecular mass and unique fragmentation patterns; however, excessive fragmentation could result in the loss of the molecular ion, making identification more difficult.

Once the ions are produced, they are directed into a mass analyzer. In GC-MS, the most common type of mass analyzer is a single quadrupole mass analyzer. The ions are directed into the quadrupole by the presence of a repeller plate (positively charged) and an ion focusing plate (negatively charged) in the ion source.
The positively charged ions are focused towards the negatively charged plate and into the mass analyzer to be separated based on individual mass-to-charge (m/z) ratios.

A quadrupole mass analyzer consists of two sets of parallel rods positioned to form a square orientation, as shown in Figure 2.2. A direct current (DC) potential is applied to two rods opposite each other and a radio frequency (RF) potential is applied to the remaining adjacent rods. The DC and RF potentials are applied so that they are 180° out of phase with each other. Therefore, the two rods opposite each other always have the same charge and the two adjacent rods always have an equal but opposite charge. These charges alternate as a function of time, causing ions that enter the quadrupole to have a wave-like trajectory. At a given DC-RF ratio, only ions within a narrow range of m/z ratios have a stable trajectory and ions of all other m/z ratios are unstable. The unstable ions hit the quadrupole rods, are neutralized, and are pumped away by the vacuum system. Those ions with a stable trajectory pass through the quadrupole and reach the detector. In order to analyze more than one m/z ratio, the DC and RF potentials can be scanned over time, but the DC-RF ratio always remains constant.

As the ions with stable trajectories exit the quadrupole mass analyzer, they are detected using an electron multiplier, a common detector used in GC-MS analysis. Ions enter the horn-shaped detector and hit the walls of the multiplier tube, which consist of glass doped with lead, allowing the tube to be slightly conductive. A voltage (1.8-2 kV) is applied across the multiplier tube, generating a voltage gradient (1). When the ions strike the surface, secondary electrons are generated and move towards the higher potentials further into the tube. As the electrons move further into the detector, they continue to strike the walls of the multiplier tube, generating a cascade of electrons.
As a result, a signal amplification of approximately 10^5 to 10^8 occurs. The amplified signal is then digitized using an analog-to-digital converter and the data are processed.

Figure 2.2: Diagram of a quadrupole mass analyzer (labels: ions, quadrupole rods, stable ion (detected), unstable ion (not detected), to detector)

The data generated from GC-MS analysis are a total ion chromatogram (TIC) of the sample mixture and a mass spectrum for each separated analyte in the sample mixture. The TIC contains peaks at a range of retention times. As previously mentioned, the analytes that elute from the column first (e.g., analytes with low boiling points and low affinity for the stationary phase) are detected first, and therefore have a shorter retention time. Analytes with high boiling points and high affinity for the stationary phase will interact with the column longer and a longer retention time will result. In forensic laboratories, retention time is used to help identify unknowns by comparing the retention time of the unknown to the retention times of reference standards. As retention time will vary from instrument to instrument due to differences such as column integrity, column length, and oven temperature programs, it is important that reference standards are analyzed on the same instrument and under the same conditions. However, retention times are not unique to specific analytes and this information by itself cannot be used for definitive identification of an unknown.

With GC-MS analysis, each peak in the chromatogram has its own mass spectrum that contains peaks based on the m/z ratios present. In addition to retention times, molecular mass information and the fragmentation patterns of an analyte can be used to aid in identification. If the molecular ion is present, it is used to determine the molecular mass of the analyte, and the fragmentation pattern is used to determine the structure of the analyte.
The mass spectrum of an analyte is also compared to mass spectra of reference standards and, as the fragmentation pattern of the analyte is unique under specific instrument conditions, definitive identification is possible. Using retention time alone, definitive identification is not possible; however, using both retention time from the chromatographic data and m/z ratios from the mass spectral data, definitive identification in forensic analysis is possible. Additionally, these data can be used to apply multivariate statistical procedures for further analysis.

2.2. Data Pretreatment

Data pretreatment procedures are applied to chromatographic data in an attempt to reduce instrumental variation between samples while preserving chemical differences. Background subtraction is used to eliminate compounds that are not characteristic of a given sample but are instead introduced during the analysis process. For example, caprolactam, a compound originating from the nylon bags used during the passive-headspace extraction process, is eliminated as this compound does not originate from the simulated fire debris samples. Smoothing is used to increase the signal-to-noise (S/N) ratio by reducing instrumental background noise while preserving the desired signal. As a result, small features that do not contribute to the characteristics of the sample are eliminated and the remaining peaks are smoother, with less noise.

Retention time alignment is applied to TICs as retention times of the same analyte in different chromatograms may drift when analyzed over a long period. Over time, the retention time at which a peak elutes may vary due to minor differences in GC temperatures, changes in the chemical composition of the stationary phase because of aging, and differences in the flow rate of the carrier gas. As a result, differences in retention time could lead to improper identification of an unknown.
Although many alignment algorithms are available, the correlation optimized warping (COW) algorithm was used in this research. The COW algorithm was used to align the simulated fire debris TICs to the target TICs of each set of standards (commercial and class). Ideal target TICs contain a majority of the peaks within the samples being aligned. For this research, an average TIC was used for alignment. The average TIC was generated using one sample TIC replicate (randomly selected) from each ignitable liquid standard. The abundance of all selected TICs was averaged at each retention time.

To perform a COW alignment, two parameters are selected: the segment size and the slack, or warp. The segment size determines the number of segments into which the TIC will be divided. The warp is the number of data points that can be added to or subtracted from each segment of the TIC. The algorithm starts at the end of the chromatogram, and interpolation is used to stretch or compress the segment so that the peaks within the segment of the sample chromatogram align with the peaks in the same segment of the target chromatogram. Pearson product-moment correlation (PPMC) coefficients are then used to calculate the correlation between the sample and target segments. For example, if a warp of two is selected, two data points can be added to or subtracted from the segment; however, it is also possible that one or zero data points can be added or subtracted. PPMC coefficients are then calculated between the sample and target segments using each of the possibilities of a given warp setting. The optimal warp and segment size are then determined based on which combination gives the highest PPMC coefficient, indicating the highest similarity. The algorithm then moves on to the next segment and the process is repeated.
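The warp search for a single segment can be illustrated with a short NumPy sketch. This is a simplified illustration only, not the COW implementation used in this research: it handles one segment in isolation, whereas the full algorithm optimizes all segment boundaries jointly. The function names `warp_segment` and `best_warp_for_segment` and their parameters are hypothetical.

```python
import numpy as np

def warp_segment(sample, start, end, target_len):
    """Stretch or compress sample[start..end] onto target_len points
    by linear interpolation."""
    old_x = np.arange(start, end + 1)
    new_x = np.linspace(start, end, target_len)
    return np.interp(new_x, old_x, sample[start:end + 1])

def best_warp_for_segment(sample, target, start, seg_len, slack):
    """Try every end-point shift in [-slack, +slack] for one segment and
    return (shift, r) for the shift with the highest Pearson coefficient.
    Assumption: the segment start is fixed; full COW also optimizes it."""
    target_seg = target[start:start + seg_len]
    best_shift, best_r = 0, -np.inf
    for shift in range(-slack, slack + 1):
        end = start + seg_len - 1 + shift
        if end <= start or end >= len(sample):
            continue  # warped segment would fall outside the chromatogram
        warped = warp_segment(sample, start, end, seg_len)
        r = np.corrcoef(warped, target_seg)[0, 1]
        if r > best_r:
            best_shift, best_r = shift, r
    return best_shift, best_r
```

A positive shift means the sample segment was longer than the target segment and had to be compressed; a negative shift means it had to be stretched.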
Once the TICs are aligned, they are typically normalized to reduce variation in peak abundance caused by minor differences in the volume of sample injected during analysis. There are a variety of methods for normalization of chromatographic data; however, in this research constant-sum normalization was utilized. To perform constant-sum normalization, the abundance of each variable (retention time point) within a TIC is summed to obtain the total abundance, or total area, of the TIC. Each individual variable is then divided by the total area of the corresponding TIC. As a result, replicates of the same sample have peak abundances more similar to one another, as expected from replicate samples.

2.3. Data Analysis

2.3.1. Principal Component Analysis

Principal component analysis (PCA) is an exploratory multivariate statistical procedure commonly used to associate and discriminate samples within a data set. In this research, PCA is used to associate simulated fire debris samples back to the corresponding ignitable liquid. PCA reduces the dimensionality of the data set, which is especially useful in data sets that contain a large number of variables, as is the case with chromatographic data. Reducing the dimensionality of the data is beneficial as it allows for the identification of patterns within the data that may not have originally been apparent due to the large number of variables.

To perform PCA using the pretreated TICs, the covariance matrix of the data is first calculated, where the size of the matrix is based on the number of dimensions in the data set. During the process of calculating the covariance, the data are also mean centered. The covariance is a measure of how two dimensions vary together; when multiple dimensions are present, a pairwise comparison of each variable is calculated in the form of a covariance matrix. This allows for the measure of variance between multiple dimensions.
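The constant-sum normalization and covariance steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under the assumption that each row of the input array is one TIC and each column is one retention time point; the function names are hypothetical and this is not the software used in this research.

```python
import numpy as np

def constant_sum_normalize(tics):
    """Divide every TIC (row) by its total abundance so each row sums to 1."""
    tics = np.asarray(tics, dtype=float)
    return tics / tics.sum(axis=1, keepdims=True)

def covariance_matrix(data):
    """Mean-center each variable (column), then return the square
    variables-by-variables covariance matrix."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    return centered.T @ centered / (data.shape[0] - 1)
```

Two replicates that differ only by injection volume, for example abundances (1, 2, 1) and (2, 4, 2), become identical after normalization, which is the intended effect.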
Eigenanalysis of the covariance matrix is performed to calculate eigenvectors and eigenvalues for the data set. Eigenvectors are unit vectors that, when multiplied by the covariance matrix, yield a scalar multiple of themselves. The maximum number of eigenvectors that can be calculated is equivalent to the maximum number of dimensions in a given data set. Eigenvalues represent the variance that a particular eigenvector describes and are the scalar values by which the eigenvector is multiplied. In order to calculate the eigenvectors and eigenvalues, the covariance matrix must be square. Based on the eigenvectors and eigenvalues, principal components (PCs) are derived, where the eigenvector with the largest eigenvalue describes the most variance and is known as the first principal component (PC1). The second principal component (PC2) describes the next greatest variance and is positioned orthogonally to PC1, and so on.

PCA generates two main outputs: loadings plots and scores plots. For chromatographic data, the loadings plot can be generated by plotting the eigenvector for a given PC versus retention time. The loadings plot describes the variables (compounds) contributing to the variance described by the PC, and the retention times can be used to identify the variables. The scores plot is a scatter plot that represents the association and discrimination of the samples. The score for a sample on a given PC is the sum of the products of the mean-centered data for the sample and the corresponding elements of the relevant eigenvector. Samples that are chemically similar will be positioned closely on the scores plot and those that are chemically different will be separated from one another on the scores plot. Further, the loadings plots can be used to explain the positioning of the samples on the scores plot.

2.3.2. Euclidean Distance

Euclidean distance is the distance between two given data points in a multidimensional space.
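As a hedged illustration, the PCA steps of Section 2.3.1 and a Euclidean distance taken between the mean scores of two groups of samples might be sketched as follows. The function names, the use of `numpy.linalg.eigh`, and the helper that counts the PCs needed to reach a chosen fraction of the variance are assumptions made for this example, not a description of the software used in this research.

```python
import numpy as np

def pca(data):
    """Covariance-based PCA. Rows are samples, columns are variables.
    Returns eigenvalues (descending), eigenvectors (columns), and scores."""
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # returned in ascending order
    order = np.argsort(eigvals)[::-1]           # re-sort so PC1 comes first
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    scores = centered @ eigvecs                 # sum of products per PC
    return eigvals, eigvecs, scores

def n_components_for(eigvals, fraction=0.95):
    """Smallest number of PCs whose cumulative variance reaches `fraction`."""
    explained = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(explained, fraction) + 1)

def mean_score_distance(scores_x, scores_y, n_pcs):
    """Euclidean distance between the mean scores of two groups
    over the first n_pcs principal components."""
    diff = scores_x[:, :n_pcs].mean(axis=0) - scores_y[:, :n_pcs].mean(axis=0)
    return float(np.sqrt(np.sum(diff ** 2)))
```

Because the eigenvectors form an orthogonal basis, a distance computed over all PCs equals the distance between the group means in the original variable space; truncating to fewer PCs gives a distance in the reduced space only.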
In this research, Euclidean distance was utilized to measure the distance between the scores of sample pairs based on multiple PCs. Euclidean distance (d) is calculated using Equation 2.1:

$$d = \sqrt{\sum_{i=1}^{n} \left(\bar{x}_i - \bar{y}_i\right)^2} \qquad (2.1)$$

where $\bar{x}_i$ represents the average score of sample x on a given dimension i, $\bar{y}_i$ represents the average score of sample y on a given dimension i, and n represents the total number of dimensions. In this research, x represents the average score of the simulated fire debris and y represents the average score of an ignitable liquid standard. The subscript i indicates the PC being used; additional PCs can be accounted for as desired. In this research, the number of PCs used was based on the number of PCs that accounted for at least 95% of the variance in the data set.

2.3.3. Hierarchical Cluster Analysis

Hierarchical cluster analysis (HCA) is an exploratory multivariate statistical procedure that generates a hierarchy of clusters based on the similarity of samples. HCA can be performed on a complex data set in order to observe patterns of similarity within the data. In this research, agglomerative HCA was used to cluster samples. Using this type of clustering, each sample starts as its own individual cluster. Each individual cluster is then grouped with the sample to which it is most similar, resulting in a cluster of two samples. Clustering of the samples continues until all samples are grouped into one cluster.

To cluster samples, a distance metric is used to measure the distance between each individual sample. In this research, Euclidean distance was used. In the first iteration of HCA, Euclidean distances are calculated between all individual samples in multidimensional space, resulting in a distance matrix. The two samples with the shortest distance, which indicates the greatest similarity, are clustered together. In the next iteration, Euclidean distances are again calculated, but now the distance between groups containing more than one sample must be calculated.
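The agglomerative procedure can be sketched with NumPy as shown below. This is a minimal illustration that assumes single linkage, in which the distance between two groups is taken as the distance between their two nearest members, and a similarity level computed as one minus the merge distance divided by the largest pairwise distance in the data set. The function name and return format are hypothetical, and the sketch assumes that not all samples are identical.

```python
import numpy as np

def single_linkage(points):
    """Agglomerative clustering with Euclidean distance and single linkage.
    Returns the merge history as (members_a, members_b, distance, similarity)."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))     # full pairwise distance matrix
    d_max = dist.max()                          # assumed to be non-zero
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: nearest pair of members across the two groups
                d = dist[np.ix_(clusters[a], clusters[b])].min()
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]),
                       float(d), float(1.0 - d / d_max)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```

The merge history contains the information a dendrogram displays: which groups were joined at each step and at what similarity level.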
The method by which this distance is calculated varies depending on the linkage method used. In this research, the single linkage method was used, in which the Euclidean distance is calculated between the two nearest neighbors in the two groups being considered. This is illustrated in Figure 2.3, which depicts a sample (red square) that can be clustered to class A (blue circles) or class B (green circles). Euclidean distances (dA and dB) are calculated between the sample and each class in multidimensional space. The sample is clustered with the class containing its nearest neighbor, that is, the class at the shortest distance; therefore, the sample is clustered to class A. The process repeats until all samples are members of a single cluster. The resulting output is a dendrogram, which includes a similarity level, or percent similarity. The similarity level is calculated by dividing the Euclidean distance of a given sample by the maximum Euclidean distance in the data set and subtracting the result from one. The greater the similarity level, the more similar the samples are.

2.3.4. k-Nearest Neighbors

k-Nearest neighbors (k-NN) is an example of a hard classification method. Using this method, samples are placed into a defined class based on the measured similarity to that class. Hard classification methods place a sample into one class and one class only. As a result, classification is forced, so even if there are no classes representative of the sample, the sample will still be assigned to a class.

In k-NN, defined classes consist of samples of known origin that form the training set. In this research, a set of standards is used for the training set and the standards are placed into defined classes.
A sample is then projected into multidimensional space and Euclidean distance is used to determine the distance between the sample and each of the individual defined classes in multidimensional space.

Figure 2.3: Diagram depicting Euclidean distance and the single-linkage process used during HCA clustering (labels: class A, class B, dA, dB, samples to be clustered)

A number of nearest neighbors (k) is selected to determine the class in which the sample will be placed. In k-NN, k is a user-defined value. Typically, an odd number of nearest neighbors is selected to avoid classification ties. k-NN can be performed using a range of k values, and the total number of misclassifications that occurs at each k value can be determined to assist with selecting an optimal value for k. The class with the majority of standards closest to the sample (based on the number of nearest neighbors selected) will be the class in which the sample is placed. Classification may differ based on the number of nearest neighbors selected. For example, Figure 2.4 depicts a sample to be classified (red square) to class A (blue circles) or class B (green circles). If three nearest neighbors (k=3) are selected, illustrated by the inner circle, the sample would be classified into class A. If five nearest neighbors (k=5) are selected, illustrated by the outer circle, the sample would be classified into class B.

Although classification of a sample is forced, the classification fit can be analyzed to determine how similar the sample is to that class. Class fit can be calculated for each sample (class or projected) by subtracting the smallest distance in each class from an individual distance and dividing by the standard deviation. Each sample has its own calculated class fit that can be compared to a given threshold. If the sample falls within the given threshold, it is considered a good fit.
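The dependence of the classification on k can be shown with a small sketch. The coordinates below are invented purely to reproduce the situation described for Figure 2.4 and are not data from this research; the function name `knn_classify` is hypothetical, and the class fit calculation is not included.

```python
import numpy as np
from collections import Counter

def knn_classify(train, labels, sample, k):
    """Assign `sample` to the class holding the majority among its k nearest
    training points, using Euclidean distance. Classification is forced:
    a class is always returned, however distant the neighbors are."""
    train = np.asarray(train, dtype=float)
    sample = np.asarray(sample, dtype=float)
    dists = np.sqrt(((train - sample) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Invented training set: distances from the origin are
# 1.0, 1.2, 5.0 (class A) and 1.5, 2.0, 2.1 (class B).
train = [[1.0, 0.0], [0.0, 1.2], [5.0, 0.0],
         [1.5, 0.0], [0.0, -2.0], [-2.1, 0.0]]
labels = ["A", "A", "A", "B", "B", "B"]

class_k3 = knn_classify(train, labels, [0.0, 0.0], k=3)  # neighbors: A, A, B
class_k5 = knn_classify(train, labels, [0.0, 0.0], k=5)  # neighbors: A, A, B, B, B
```

With these invented points the same sample is assigned to class A at k=3 and to class B at k=5, which is why the number of misclassifications over a range of k values is worth examining before fixing k.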
This can be used to determine if the standards in a given class contain any outliers or if the projected samples fit well under the given classification.

Figure 2.4: Diagram depicting k-NN classification based on the number of nearest neighbors selected (labels: class A, class B, k=3, k=5, sample to be classified)

REFERENCES

1. Skoog DA, Holler FJ, Crouch SR. Principles of Instrumental Analysis. 6th edition. Belmont, CA: Thomson, 2007.

3. Materials and Methods

3.1. Commercial Ignitable Liquid Standards

The commercial ignitable liquids used were available in the laboratory and were obtained from gas stations and local stores. These standards comprised ignitable liquids from the gasoline, isoparaffinic, and petroleum distillate ASTM classes. Ignitable liquids from the gasoline class included three different gasoline samples collected from the East Lansing area (Meijer, British Petroleum, and Marathon) during late 2009 and early 2010. The isoparaffinic products comprised odorless paint thinner (Sunnyside Corp., Wheeling, IL) and upholstery protector (Scotch Gard™, 3M Protective Materials and Consumer Health Care Division, St. Paul, MN), and the petroleum distillate class included diesel (Mobil), kerosene (Meijer), fuel injector (STP® Products Co., Oakland, CA), charcoal lighter (ACE® Hardware Corp., Oak Brook, IL), and torch fuel (Tiki®, Menomonee Falls, WI). The neat commercial ignitable liquids were diluted 1:10 (v/v) in dichloromethane (Honeywell International Inc., Morristown, NJ) for a passive-headspace extraction followed by GC-MS analysis.

3.2. Class Reference Standards

The class reference standards included a gasoline standard, a medium petroleum distillate standard, and a heavy petroleum distillate standard. Each standard comprised compounds characteristic of its respective chemical class, and all compounds for the standards were available in the laboratory.
The gasoline standard included toluene (MCB Manufacturing Chemists, Inc., Cincinnati, OH), ethylbenzene, m-xylene, o-xylene, propylbenzene, and 1,2,4-trimethylbenzene (all from Aldrich Chemical Company, Inc., Milwaukee, WI). The medium petroleum distillate standard included the normal alkanes octane, dodecane, tridecane (all from Aldrich Chemical Company, Inc., Milwaukee, WI), nonane, decane (Alfa Aesar, Ward Hill, MA), and undecane (ACROS, NJ). The heavy petroleum distillate standard included the normal alkanes octane, dodecane, tridecane, tetradecane (all from Aldrich Chemical Company, Inc., Milwaukee, WI), nonane, decane, pentadecane, hexadecane, heptadecane, nonadecane, eicosane (all from Alfa Aesar, Ward Hill, MA), undecane (ACROS, NJ), and octadecane (Sigma Chemical Co., St. Louis, MO).

The class reference standards were prepared in dichloromethane with a final volume of 15 mL. Compound volumes for each standard were aliquoted so that the ratios and abundances of the compounds were similar to those present in the respective commercial ignitable liquid standard (Table 3.1 and Table 3.2). Specifically, for the gasoline standard, a 1:3:2 ratio for the C2-alkylbenzenes and a 1:3 ratio for the C3-alkylbenzenes, propylbenzene and 1,2,4-trimethylbenzene, were desired. A distribution of normal alkanes similar to the distribution in petroleum distillates was desired for the petroleum distillate class reference standards. Once prepared, the class reference standards were passive-headspace extracted in triplicate and analyzed by GC-MS in triplicate using the same procedures as used for the commercial ignitable liquid standards.

3.3. Preparation of Simulated Fire Debris

Simulated fire debris samples were prepared for two substrates: red oak flooring treated with a golden oak finish (WATCO™ Danish oil, Rust-oleum® Corporation, Vernon Hills, IL) and nylon carpet with carpet padding (source unknown).
Prior to preparing simulated fire debris samples, it was necessary to determine the appropriate burn time for each substrate, as well as the volume of ignitable liquid to spike onto each sample.

Table 3.1: Composition of the gasoline class reference standard prepared in 15 mL of dichloromethane

Compound                   Volume (µL)
Toluene                    25
Ethylbenzene               10
m-Xylene                   40
o-Xylene                   35
Propylbenzene              15
1,2,4-Trimethylbenzene     60

Table 3.2: Composition of the medium and heavy petroleum distillate class reference standards prepared in 15 mL of dichloromethane

                 Volume/Mass
Compound         Medium Petroleum Distillate    Heavy Petroleum Distillate
Octane           10 µL                          3.3 µL
Nonane           15 µL                          5 µL
Decane           20 µL                          6.6 µL
Undecane         20 µL                          6.6 µL
Dodecane         15 µL                          8.3 µL
Tridecane        10 µL                          10 µL
Tetradecane      N/A                            11.6 µL
Pentadecane      N/A                            10 µL
Hexadecane       N/A                            8.3 µL
Heptadecane      N/A                            6.6 µL
Octadecane       N/A                            0.0040 g
Nonadecane       N/A                            0.0025 g
Eicosane         N/A                            0.0018 g

3.3.1. Burn Study

The burn study was carried out by burning treated wood samples using a propane torch for 30 and 60 seconds, with only direct flame, to char the surface. Nylon carpet with carpet padding samples were burned for 30 seconds with direct flame and allowed to burn for an additional 90 seconds before being extinguished using an overturned beaker. Each sample was then passive-headspace extracted in triplicate and analyzed by GC-MS in triplicate. An appropriate burn time for the wood samples was selected based on the presence and abundance of interference compounds from both the wood and the treatment. A high abundance of interference compounds was desired so that the simulated fire debris would closely resemble typical fire debris evidence; as a result, a 30-second burn time for the wood samples was selected. The burn time selected for the carpet samples was based on a previous burn study, and the resulting samples were analyzed to confirm that the burn time was sufficient. Characteristic compounds from the carpet were present, and the burn time was used for the remainder of the project.
3.3.2. Spike Volume Study

The spike volume study was carried out by spiking a range of volumes of each of the four commercial ignitable liquids (gasoline, paint thinner, diesel, and torch fuel) onto the two substrates. All ignitable liquids were diluted 1:10 (v/v) in dichloromethane prior to spiking onto each substrate. For gasoline (Meijer), 100 and 125 µL were spiked onto the carpet samples and 100, 125, 150, 200, 225, and 250 µL were spiked onto the treated wood samples. For paint thinner, 125 µL was spiked onto the carpet samples and 75, 125, and 150 µL were spiked onto the treated wood samples. For diesel, 125 and 175 µL were spiked onto the carpet samples and 50 and 75 µL were spiked onto the treated wood samples. For torch fuel, 50, 75, 100, and 115 µL were spiked onto the carpet samples and 50, 75, 100, and 125 µL onto the treated wood samples. Each sample was then passive-headspace extracted in triplicate and analyzed by GC-MS in triplicate. Appropriate spike volumes were selected based on the abundance of the interference compounds relative to the ignitable liquid compounds: at the selected volumes, compounds of the commercial ignitable liquid were present but at a lower abundance than the interference compounds.

3.3.3. Simulated Fire Debris Samples

The final ignitable liquid spike volumes used for each substrate are listed in Table 3.3. Fire debris samples were prepared by spiking the substrate (4 cm × 4 cm) with the appropriate spike volume, burning for the appropriate time, and placing each sample into a separate nylon bag for passive-headspace extraction followed by GC-MS analysis.

Table 3.3: Spike volumes used for simulated fire debris samples with respect to each commercial ignitable liquid and substrate

Ignitable Liquid    Volume Spiked onto Carpet Substrate (µL)    Volume Spiked onto Treated Wood Substrate (µL)
Gasoline            100                                         N/A
Paint thinner       125                                         125
Diesel              175                                         75
Torch fuel          100                                         50

N/A: An appropriate spike volume was not determined and further analysis was not completed due to contamination of the instrument from the wood treatment.

3.4. Passive-Headspace Extraction

For passive-headspace extraction of all sets of standards, a 20 µL aliquot was spiked onto individual 4 cm × 4 cm Kimwipes™ (Kimberly-Clark Global Sales, LLC, Roswell, GA), which were subsequently placed into a nylon bag (Grand River Products, LLC, Grosse Pointe Farms, MI) with a suspended activated carbon strip (Albrayco Technologies, Inc., Cromwell, CT) and sealed using masking tape. For passive-headspace extraction of the simulated fire debris samples, the samples were placed directly into a nylon bag with a suspended activated carbon strip and sealed. The passive-headspace extraction was performed at 80 °C for 4 h. Following extraction, the carbon strips were eluted with 200 µL of dichloromethane and analyzed by GC-MS. All ignitable liquid standards and simulated fire debris samples were extracted in triplicate.

3.5. GC-MS Analysis

An Agilent 6890N gas chromatograph coupled to an Agilent 5975 mass spectrometer (Agilent Technologies, Santa Clara, CA) with an autosampler containing a 10 µL Hamilton syringe was used. The column was a 30.0 m × 0.25 mm × 0.25 µm Agilent capillary column with a 5% phenyl methyl siloxane stationary phase (HP-5). The inlet temperature was set to 250 °C with a pulsed splitless injection of 15.0 psi for 0.25 minutes. Helium carrier gas at a nominal flow rate of 1.0 mL/min was used with a 1 µL injection. The oven temperature program was 40 °C for 3 minutes, followed by a ramp of 10 °C/min to 280 °C with a 4-minute hold.
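As a quick check on the oven program above, the total run time follows from the two hold times plus the duration of the ramp. The helper name `oven_run_time_min` is hypothetical, and the sketch assumes a single linear ramp with no additional post-run or equilibration time.

```python
def oven_run_time_min(start_c, end_c, ramp_c_per_min, initial_hold_min, final_hold_min):
    """Total run time (minutes) for a single-ramp oven temperature program."""
    # Time spent ramping from the starting to the final temperature
    ramp_min = (end_c - start_c) / ramp_c_per_min
    return initial_hold_min + ramp_min + final_hold_min

# 40 °C for 3 min, 10 °C/min to 280 °C, 4 min hold
total = oven_run_time_min(40, 280, 10, 3, 4)
print(total)  # 31.0
```

The ramp takes 24 minutes, so each run lasts 31 minutes, which is consistent with the 0 to 30 minute retention time axes of the chromatograms shown in this work.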
The transfer line was set to 280 °C and electron ionization (70 eV) was used. The mass spectrometer was set to 2.91 scans/s over a mass range of 50-500 u.

3.6. Data Pretreatment

All total ion chromatograms (TICs) were caprolactam background-subtracted and smoothed using functions available in Agilent ChemStation software (version E01.02.16). Simulated fire debris samples were retention-time aligned to the appropriate reference standards (commercial or class) using a correlation optimized warping algorithm available in the data analysis software (The Unscrambler® X, version 10.2, Camo Software Inc., Woodbridge, NJ). The target used for alignment to the commercial standards was an average TIC of all commercial ignitable liquid standards, and the target used for alignment to the class reference standards was an average TIC of all class standards. Alignment to both the commercial and class standards was performed using a segment size of 125 data points and a warp of 11 data points. All chromatograms were then constant-sum normalized in Microsoft Excel (Microsoft Office Professional Plus 2010 version 14.0.7116.5000, Microsoft Corp., Redmond, WA) before data analysis.

3.7. Data Analysis

3.7.1. Principal Components Analysis

Principal components analysis was performed on the TICs of three different sets of standards using The Unscrambler® X. The first data set was referred to as the chemically diverse data set and consisted of the three commercial gasoline standards, six commercial petroleum distillates, and two isoparaffinic products. The second data set was referred to as the refined data set and consisted of the three gasoline standards and four petroleum distillate standards. The final data set was referred to as the class reference data set and consisted of the class reference standards. Scores for each standard were generated and plotted as a scatter plot using Microsoft Excel.
Loadings plots of each significant principal component were also plotted using Microsoft Excel. Scores for the simulated fire debris samples were generated by multiplying the mean-centered data for the debris sample by the eigenvectors for the first principal component (PC1) and summing the products. Scores for additional principal components were calculated similarly using the respective eigenvectors. The scores of the simulated fire debris samples were then projected onto the scores plots generated for the commercial liquid standards and the class reference standards.

3.7.2. Euclidean Distance

Euclidean distances were calculated between the scores for the simulated fire debris samples and the scores for the commercial standards and class reference standards to evaluate association. Distances were calculated in Microsoft Excel using Equation 2.1. The number of principal components used to calculate the Euclidean distance was the number of principal components that accounted for at least 95% of the variance in the data set.

3.7.3. Hierarchical Cluster Analysis

Agglomerative hierarchical cluster analysis (HCA) was performed on each of the three data sets and the simulated fire debris samples in Pirouette® (version 4.0, Infometrix Software, Inc., Bothell, WA), using the Euclidean distance and single-linkage methods. Resulting dendrograms were assessed for the degree of similarity between fire debris samples and commercial standards and between fire debris samples and class reference standards.

3.7.4. k-Nearest Neighbors

k-Nearest Neighbors (k-NN) was performed on each of the three standard data sets and the simulated fire debris in Pirouette®. HCA was performed using the method above in order to assign known classes of ignitable liquids. Once the classes were defined, the k-NN algorithm was performed on the ignitable liquid standards. A prediction was then performed on the simulated fire debris in order to classify the samples to an ignitable liquid class.
Predictions were made using 1, 3, 5, 7, and 9 nearest neighbors.

4. Investigation of Class Reference Standards for Association of Fire Debris to ASTM Class using Principal Components Analysis

4.1. Introduction

Principal components analysis (PCA) has previously been utilized to associate simulated fire debris samples back to the commercial ignitable liquid used as an accelerant. While PCA has shown potential for this association, it can be limited by the variety of commercial ignitable liquids on the market. It would not be feasible for a forensic laboratory to analyze every ignitable liquid on the market for inclusion in a data set. PCA may also be considered limiting because the interpretation of the scores plot remains subjective. Furthermore, the diversity within a data set can influence the success of association. This chapter demonstrates the limitations of associating simulated fire debris back to the corresponding commercial ignitable liquid using PCA and highlights the need for a more standardized approach.

As an alternative, the utility of class reference standards for association has been investigated. By generating class reference standards that are representative of an ASTM class as a whole, the number of standards required for an in-house data set in a forensic laboratory may be substantially reduced. In addition, having standards more representative of each class could potentially account for the high variability in chemical composition observed within an ASTM class.

In this chapter, the subjectivity of visual interpretation of the scores plots as well as limitations of diversity within a commercial ignitable liquid data set were also investigated. To address the limitations of subjectivity, Euclidean distances were used to quantitatively assess the extent of association between the ignitable liquid standards and the simulated fire debris in the PCA scores plot.
To examine the limitations of diversity and the impact of data set selection, two commercial ignitable liquid data sets were compared. The first data set was more chemically diverse and consisted of gasoline, petroleum distillate, and isoparaffinic standards; the second was a refined data set and consisted of only gasoline and petroleum distillate standards.

To investigate association to an ASTM class, rather than to a specific commercial ignitable liquid, using multivariate statistical procedures, class reference standards for two different ASTM classes and simulated fire debris samples were generated and used. Common household items such as treated red oak flooring and nylon carpet with carpet padding were used as substrates. Appropriate burn times for each substrate and appropriate spike volumes for each commercial ignitable liquid were determined. Burn times and spike volumes were selected so that the fire debris was not readily identifiable as the ignitable liquid used and so that sufficient interference compounds from each substrate were observed. Common commercial ignitable liquids such as gasoline, diesel, paint thinner, and torch fuel were used to generate the simulated fire debris samples.

4.2. Commercially Available Standards and Corresponding Class Reference Standards

4.2.1. Gasoline

Total ion chromatograms (TICs) of a representative commercial gasoline and the gasoline class reference standard are shown in Figure 4.1.
The commercial gasoline (Figure 4.1A) contains toluene, C2-alkylbenzenes (e.g., ethylbenzene, m-xylene, o-xylene), C3-alkylbenzenes (e.g., propylbenzene and 1,2,4-trimethylbenzene), C4-alkylbenzenes (e.g., 1,2,4,5-tetramethylbenzene), and some naphthalene compounds. Compounds for the class reference standard were chosen based on the characteristic compounds of gasoline and were prepared in similar abundance and ratios to the commercial gasoline. The gasoline class reference standard contains toluene, ethylbenzene, m-xylene, o-xylene, propylbenzene, and 1,2,4-trimethylbenzene and a representative TIC is shown in Figure 4.1B.

Figure 4.1: Representative total ion chromatograms of A) commercial gasoline standard and B) gasoline class reference standard with characteristic compounds identified [axes: Retention Time (min), Normalized Abundance; labeled peaks: A) toluene, C2-, C3-, and C4-alkylbenzenes, naphthalenes; B) toluene, C2- and C3-alkylbenzenes]

4.2.2. Medium Petroleum Distillate

Total ion chromatograms of a representative commercial torch fuel and the medium petroleum distillate (MPD) class reference standard are shown in Figure 4.2. The commercial torch fuel (Figure 4.2A) is classed as a medium petroleum distillate due to the presence of normal alkanes (C11-C14) with some isoparaffinic, cycloparaffinic, and aromatic compounds. Compounds for the class reference standard were determined based on characteristic compounds that dominated the TIC and were prepared in a similar distribution to most petroleum distillates. The MPD class reference standard contains C8-C13 and a representative TIC is shown in Figure 4.2B.

4.2.3. Heavy Petroleum Distillate

Total ion chromatograms of a representative commercial diesel and the heavy petroleum distillate (HPD) class reference standard are shown in Figure 4.3.
The commercial diesel (Figure 4.3A) is classed as a heavy petroleum distillate and contains normal alkanes (C12-C20) with some isoparaffinic, cycloparaffinic, and aromatic compounds. Compounds for the class reference standard were determined based on characteristic compounds that dominated the TIC and were prepared in a similar distribution. The HPD class reference standard contains C8-C20 and a representative TIC is shown in Figure 4.3B.

Figure 4.2: Representative total ion chromatograms of A) commercial torch fuel standard and B) medium petroleum distillate class reference standard with characteristic compounds identified [axes: Retention Time (min), Normalized Abundance; labeled peaks: A) C11-C14; B) C8-C13]

Figure 4.3: Representative total ion chromatograms of A) commercial diesel standard and B) heavy petroleum distillate class reference standard with characteristic compounds identified [axes: Retention Time (min), Normalized Abundance; labeled peaks: A) C12-C20; B) C8-C19]

4.3. Determination of Substrate Burn Times

Burn times were determined for each of the two substrates in order to maximize the abundance of interference compounds. Representative TICs of the treated red oak flooring substrate are shown in Figure 4.4. The 30-second burn time (Figure 4.4A) resulted in a high abundance of interference compounds from the wood treatment, specifically C9-C12, corresponding to a medium petroleum distillate. The 60-second burn time (Figure 4.4B) resulted in a lower abundance of interference compounds from the wood treatment, but included compounds from the wood such as benzaldehyde and benzophenone. As the treatment resembled a petroleum distillate, a burn time of 30 seconds was used in order to make association more challenging, similar to typical fire debris evidence an analyst may receive.
A representative total ion chromatogram of the 120-second burn time of the nylon carpet is shown in Figure 4.5. The 120-second burn time was optimized in a previous study in the laboratory. Compounds characteristic of the burned carpet included styrene (retention time (tR): 7.167 minutes), 1,2,3-trichloropropane (tR: 7.705 minutes), and biphenyl (tR: 15.097 minutes). These compounds originate from the carpet backing, padding, and adhesive used in the production of carpet. Interference compounds from the carpet and carpet padding substrate have retention times similar to those of some compounds in the ignitable liquids used, which could result in coelution and make interpretation more challenging. For example, o-xylene from gasoline has a retention time of 7.241 minutes and has a tendency to coelute with styrene from the substrate.

Figure 4.4: Representative total ion chromatograms of A) 30-second burn time of the treated red oak flooring substrate and B) 60-second burn time of the treated red oak flooring substrate with characteristic compounds identified. Compounds from the wood treatment are indicated in red. [axes: Retention Time (min), Abundance; labeled peaks: A) C9-C12; B) C9-C12, benzaldehyde, benzophenone]

Figure 4.5: Representative total ion chromatogram of 120-second burn time of nylon carpet with carpet padding with characteristic compounds indicated [axes: Retention Time (min), Abundance; labeled peaks: styrene, 1,2,3-trichloropropane, biphenyl]

4.4. Simulated Fire Debris Samples

The simulated fire debris containing treated red oak flooring spiked with commercial diesel (Figure 4.6A) has characteristics similar to a petroleum distillate, but due to the interference compounds from the surface treatment, the sample is not readily identified as diesel (Figure 4.6B).
For example, the diesel used in this research has a unimodal distribution of normal alkanes with C15 as the maximum peak; however, the simulated fire debris containing diesel has a bimodal distribution with C11 and C13 as the maximum peaks. In the unimodal distribution, a rise in the baseline is observed between approximately tR 12.00 and 21.00 minutes, while two rises in the baseline of the bimodal distribution are observed from approximately tR 8.00 to 12.00 minutes and tR 12.00 to 21.00 minutes. This change in abundances and distributions results from the interference compounds from the wood treatment and makes identification based on visual assessment of the chromatograms more challenging.

An example of a simulated fire debris TIC for nylon carpet containing commercial diesel is shown in Figure 4.7. The simulated fire debris containing nylon carpet with carpet padding spiked with commercial diesel (Figure 4.7A) contains the typical distribution of normal alkanes in a petroleum distillate such as diesel (Figure 4.7B). However, the rise in the baseline between approximately tR 12.00 and 21.00 minutes is significantly reduced due to the low abundance of ignitable liquid and the significantly higher abundance of interference compounds from the carpet, indicating that the substrate dominates the sample.

Figure 4.6: Representative total ion chromatograms for A) treated red oak flooring spiked with 75 µL of commercial diesel with substrate interferences indicated in red and B) commercial diesel ignitable liquid standard. *C12 originates from both wood treatment and diesel [axes: Retention Time (min), Normalized Abundance; labeled peaks: A) C9-C17, with C12 marked *; B) C12-C20]

Figure 4.7: Representative total ion chromatogram of A) nylon carpet with carpet padding spiked with 175 µL of commercial diesel with substrate interferences indicated in red and B) commercial diesel ignitable liquid standard [axes: Retention Time (min), Normalized Abundance; labeled peaks: A) styrene, biphenyl, C11-C16; B) C12-C20]

4.5. Association and Discrimination of Simulated Fire Debris using PCA

Principal components analysis was performed using three data sets. The first data set was a chemically diverse data set and contained all eleven commercial ignitable liquid standards (three commercial gasoline standards, six petroleum distillates, and two isoparaffinic products) and the simulated fire debris samples. The second data set was a refined data set and contained seven commercial ignitable liquid standards (the three commercial gasoline standards and four petroleum distillate standards) and the simulated fire debris. The chemically diverse and refined data sets were used to demonstrate the limitations of PCA with commercial ignitable liquids, investigate the effects of chemical diversity within a data set, and demonstrate the impact of data set selection for successful association and discrimination. The third data set was the class reference data set and contained the three class reference standards and the simulated fire debris. The class reference data set was used to investigate the utility of class reference standards as an alternative to the commercial ignitable liquids for PCA association and discrimination.

For each data set, PCA was performed initially on the standards alone, and scores for the simulated fire debris samples were calculated from the resulting eigenvectors and then projected onto the scores plot.
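The projection step described above can be sketched as follows: each debris score is the sum, over all retention-time points, of the mean-centered abundance multiplied by the corresponding eigenvector element. This is a minimal illustration only. The function name `project_scores`, the four-point "chromatograms", and the eigenvectors are all hypothetical (the eigenvectors are simply orthonormal vectors, not derived from the toy data), and centering with the mean of the standards is an assumption about how the mean-centering was applied.

```python
def project_scores(sample, standards_mean, eigenvectors):
    """Return one score per principal component for a single chromatogram.

    sample         -- abundances at each retention-time point
    standards_mean -- mean abundance of the standards at each point (assumed
                      to be the centering used when the PCA model was built)
    eigenvectors   -- one loading vector per principal component
    """
    centered = [x - m for x, m in zip(sample, standards_mean)]
    # Score on each PC = sum of (mean-centered abundance x loading)
    return [sum(c * v for c, v in zip(centered, vec)) for vec in eigenvectors]

# Hypothetical normalized standards, four data points each (assumed values)
standards = [
    [0.40, 0.30, 0.20, 0.10],
    [0.10, 0.20, 0.30, 0.40],
]
standards_mean = [sum(col) / len(standards) for col in zip(*standards)]

# Hypothetical orthonormal eigenvectors for PC1 and PC2 (assumed values)
eigenvectors = [
    [0.5, 0.5, -0.5, -0.5],
    [0.5, -0.5, -0.5, 0.5],
]

debris = [0.35, 0.25, 0.25, 0.15]
scores = project_scores(debris, standards_mean, eigenvectors)
print([round(s, 4) for s in scores])
```

Because the debris sample is never included when the eigenvectors are computed, its position on the scores plot is determined only by how it maps onto the variance already described by the standards.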
The simulated fire debris samples for each substrate were projected separately to realistically represent how PCA may be applied in the analysis of fire debris in a forensic laboratory. Projecting scores for the fire debris reduced the influence of interference compounds on the positioning of the samples on the scores plot, as only the variables accounting for the characteristic compounds of the ignitable liquid were considered.

4.5.1. Commercial Ignitable Liquid Standards – Chemically Diverse Data Set

Principal components analysis was first performed on the chemically diverse data set with only the gasoline, petroleum distillate, and isoparaffinic commercial standards. The scores plot is shown in Figure 4.8. The scores plot illustrates principal component one (PC1) versus principal component two (PC2), where PC1 accounts for 32.5% of the variance within the data set and PC2 accounts for 24.4% of the variance. The gasoline standards, positioned positively, and upholstery protector, positioned negatively, are distinguished along PC1, and all other standards are positioned approximately at zero. Along PC2, the gasoline standards and the upholstery protector are positioned negatively and distinguished from all other standards, which are positioned slightly positively.

The positioning of the standards on the scores plot can be explained using the loadings plots for PC1 and PC2 (Figure 4.9). Compounds characteristic of gasoline are weighted positively on the PC1 loadings plot (Figure 4.9A) and compounds characteristic of the upholstery protector are weighted negatively. As a result, all gasoline standards are positioned positively on PC1 in the scores plot and are distinguished from the upholstery protector standard, which is positioned negatively on the scores plot.
All petroleum distillate standards are positioned close to zero on PC1, as the characteristic compounds of those standards, such as the heavier alkanes (C15-C20), are less volatile and less variable and, as a result, are not described by PC1. The majority of the variance along PC1 originates from the chemical differences between the compounds present in upholstery protector (tR: 3.00 to 8.00 minutes) and those in all other standards, which contain compounds that do not begin eluting until much later (tR: 8.00 to 20.00 minutes).

Figure 4.8: PCA scores plot of PC1 (32.5%) versus PC2 (24.4%) for the chemically diverse data set with commercial standards only [legend: Gas A, Gas B, Gas C, Paint Thinner, Upholstery Protector, Diesel, Kerosene, Fuel Injector, Torch Fuel, Charcoal Lighter]

Figure 4.9: Loadings plots for the chemically diverse data set with A) PC1 representing 32.5% of the variance and B) PC2 representing 24.4% of the variance [axes: Retention Time (min), Normalized Abundance; labeled regions: A) toluene, C2-, C3-, and C4-alkylbenzenes, branched C5-C7; B) branched C7-C12, C11-C16, toluene, C2-, C3-, and C4-alkylbenzenes, branched C5-C7]

Compounds characteristic of paint thinner (branched C7-C12) and the petroleum distillate standards (C11-C16) are weighted positively on the PC2 loadings plot (Figure 4.9B), while compounds characteristic of upholstery protector (branched C5-C7) and the gasoline standards (C2-, C3-, and C4-alkylbenzenes) are weighted negatively. Therefore, the paint thinner and petroleum distillate standards are positioned positively on PC2 in the scores plot and are distinguished from the upholstery protector and gasoline standards, which are positioned negatively on PC2 in the scores plot.
In addition, paint thinner and upholstery protector are both classified as isoparaffinic products due to the presence of branched alkanes and cyclic alkanes. However, the two samples are significantly different: upholstery protector contains branched alkanes ranging from C5-C12, whereas paint thinner contains branched alkanes ranging from C7-C12. Chemical differences within the isoparaffinic product class result in paint thinner associating closely to the petroleum distillates while upholstery protector is distinguished from all other standards. In order to differentiate the standards positioned close to zero, additional principal components would need to be investigated; however, distinguishing standards within a class was not the purpose of this research and additional principal components were not investigated. The gasoline standards are not distinguished from one another because there is little chemical variation among the three gasoline standards; however, there is more variability within the gasoline standard replicates due to the volatility of compounds present in gasoline such as toluene, ethylbenzene, and m-xylene. The remainder of the scores plots in this chapter can be explained similarly using the corresponding loadings plots.

Next, one set of simulated fire debris samples containing carpet spiked with diesel was projected onto the chemically diverse data set scores plot, as shown in Figure 4.10. From visual assessment of the scores plot, the fire debris samples are positioned approximately at zero and close to the petroleum distillate standards, specifically the diesel, kerosene, fuel injector, and charcoal lighter standards, all of which are classified as heavy petroleum distillates with the exception of charcoal lighter. Euclidean distances were calculated between the simulated fire debris scores and each of the ignitable liquid standard scores to quantitatively assess association on the scores plot.
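The distance calculation can be sketched as below, together with the rule from Section 3.7.2 of retaining the number of principal components that account for at least 95% of the variance. All names (`n_components_for`, `nearest_standard`, "standard X", "standard Y") and all numerical values are hypothetical and are not results from this work.

```python
import math

def n_components_for(variance_fractions, threshold=0.95):
    """Smallest number of leading PCs whose cumulative variance reaches threshold."""
    total = 0.0
    for n, fraction in enumerate(variance_fractions, start=1):
        total += fraction
        if total >= threshold:
            return n
    return len(variance_fractions)

def nearest_standard(debris_scores, standard_scores, n_pcs):
    """Return (name, distance) of the standard closest to the debris scores."""
    distances = {
        name: math.dist(debris_scores[:n_pcs], scores[:n_pcs])
        for name, scores in standard_scores.items()
    }
    name = min(distances, key=distances.get)
    return name, distances[name]

# Hypothetical fraction of variance explained by each PC (assumed values)
variance = [0.60, 0.25, 0.08, 0.04, 0.03]
n_pcs = n_components_for(variance)  # cumulative 0.60, 0.85, 0.93, 0.97

# Hypothetical scores on five PCs (assumed values)
standard_scores = {
    "standard X": [0.03, 0.00, 0.00, 0.04, 0.50],
    "standard Y": [0.00, 0.06, 0.08, 0.00, 0.00],
}
debris_scores = [0.0, 0.0, 0.0, 0.0, 0.0]

print(n_pcs, nearest_standard(debris_scores, standard_scores, n_pcs))
print(5, nearest_standard(debris_scores, standard_scores, 5))
```

In this toy case the debris is closest to standard X when four PCs are retained but closest to standard Y when all five are used, which illustrates why the choice of the number of PCs needs to be stated alongside any reported distance.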
A short Euclidean distance indicates that the two samples are more similar, while a longer Euclidean distance indicates that the two samples are dissimilar. The Euclidean distances between the simulated fire debris scores and the ignitable liquid scores for the chemically diverse data set, based on seven PCs, are shown in Table 4.1. The shortest Euclidean distance was calculated between the fire debris and the kerosene standard, rather than the corresponding diesel standard. The calculated Euclidean distance between the kerosene standard and the fire debris was 0.01572, and the next closest was the diesel standard with a distance of 0.02554. Visually, the simulated fire debris associates to all three heavy petroleum distillate standards, but quantitatively the fire debris is most closely associated to the kerosene standard. After the burning process and the introduction of interference compounds, the fire debris samples contain a normal alkane range more similar to kerosene than to diesel. For example, the commercial diesel standard contains normal alkanes in the range C12-C20, while the commercial kerosene standard and simulated fire debris contain alkanes in the range C11-C16.
Overall, a more chemically diverse data set can make visual association challenging and can influence which standard a sample associates to.

Figure 4.10: PCA scores plot of PC1 (32.5%) versus PC2 (24.4%) for the chemically diverse data set with commercial standards and simulated fire debris projected [legend: Gas A, Gas B, Gas C, Paint Thinner, Upholstery Protector, Diesel, Kerosene, Fuel Injector, Torch Fuel, Charcoal Lighter, Lamp Oil, Fire Debris]

Table 4.1: Euclidean distances between fire debris scores and ignitable liquid standard scores for the chemically diverse data set

Ignitable Liquid Standard    Chemically Diverse Data Set
Gasoline A                   0.08718
Gasoline B                   0.07551
Gasoline C                   0.08032
Diesel                       0.02554
Fuel Injector                0.02871
Torch Fuel                   0.05890
Charcoal Lighter             0.06414
Kerosene                     0.01572
Lamp Oil                     0.08662
Paint Thinner                0.1194
Upholstery Protector         0.1455

Calculated Euclidean distances can help determine which standard the fire debris is most closely associated to when visual association becomes challenging. Euclidean distances are also beneficial because multiple dimensions can be considered at once in the calculation, while only two or three dimensions are visualized at a time in a scores plot. For example, the simulated fire debris appears to be positioned at a similar distance from lamp oil, a medium petroleum distillate, and from paint thinner, an isoparaffinic product. However, the calculated Euclidean distances (based on seven PCs) indicate that the fire debris samples are positioned more closely to the lamp oil (0.08662) than to the paint thinner (0.1194), which may not be initially apparent. Euclidean distances are nevertheless still limiting, as selecting how many PCs to use could be considered subjective. Including additional PCs could result in association to a different standard, but it also increases the ability to discriminate from other standards, as additional discriminatory information is included.
Distances do not typically differ by orders of magnitude unless the samples are substantially different. As there is no defined distinction between a ‘short’ and a ‘long’ distance, an indiscriminate interpretation of the calculated distances could occur. In this iteration of PCA, the fire debris samples do not correctly associate to the specific ignitable liquid, but association to the corresponding chemical class is possible.

All other simulated fire debris samples containing petroleum distillates spiked onto both nylon carpet with carpet padding and treated wood flooring were properly associated to the petroleum distillate class, but not necessarily to the specific ignitable liquid standard. Fire debris samples containing paint thinner on nylon carpet were improperly associated to the diesel standard; however, these samples were positioned close to zero, indicating that they are not largely influenced by the variance in the data set. Although commercial paint thinner was included in the data set, loss of characteristic compounds through the burning process resulted in improper association. Fire debris samples containing paint thinner spiked onto treated wood flooring were improperly associated to charcoal lighter. Improper association occurred due to the addition of normal alkanes (C9-C12) from the wood treatment that are similar to those in charcoal lighter. Fire debris samples consisting of nylon carpet spiked with gasoline associated to the diesel standard. However, due to extensive evaporation, few compounds characteristic of gasoline were present.

4.5.2. Commercial Ignitable Liquid Standards – Refined Data Set

Principal components analysis was performed on the refined data set with the simulated fire debris containing carpet spiked with diesel projected, and the resulting scores plot is shown in Figure 4.11. The refined data set was used to investigate the impact of data set selection on the success of association and discrimination.
Figure 4.11: PCA scores plot of PC1 (60.2%) versus PC2 (24.4%) for the refined data set with commercial standards and simulated fire debris projected (axes as labeled in the plot: PC 1 (60.2%), PC 2 (22.4%), each spanning -0.15 to 0.15; legend: Gas A, Gas B, Gas C, Fuel Injector, Diesel, Torch Fuel, Charcoal Lighter, Fire Debris)

Standards for this data set were selected based on ignitable liquids commonly found in fire debris and to correlate with the class reference data set. In this refined data set, the gasoline standards, positioned positively, and the petroleum distillate standards, positioned negatively, are distinguished along PC1. Along PC2, the ignitable liquids within the petroleum distillate class are distinguished: charcoal lighter is positioned positively, torch fuel is positioned negatively, and all other petroleum distillate standards are positioned close to zero. These remaining standards are close to zero on PC2 because their characteristic compounds, such as the heavier alkanes (C15-C20), are less volatile and less variable. These less variable compounds do not contribute significantly to PC2, but the more volatile and more variable alkanes (C10-C14) do contribute to the variance on PC2. The positioning of the gasoline and petroleum distillate standards differs from the chemically diverse data set as there are fewer standards and less chemical diversity within the refined data set. Differentiating gasoline from the petroleum distillates along PC1 is more straightforward once the isoparaffinic products (i.e., paint thinner and upholstery protector) are removed. Removing these standards from the data set also allows the petroleum distillate standards to be distinguished from one another along PC1 and PC2. From visual assessment of the scores plot, the fire debris samples are positioned negatively along PC1, approximately zero on PC2, and are positioned close to the diesel and fuel injector standards, both of which are classified as heavy petroleum distillates.
Euclidean distances were calculated between the simulated fire debris scores and each of the ignitable liquid standard scores based on four PCs (Table 4.2). The shortest Euclidean distance was calculated between the fire debris scores and the diesel standard, indicating the greatest similarity to the corresponding standard. These calculated Euclidean distances confirm the visual association of the scores plots. However, Euclidean distances can be limiting, as the number of principal components to include must be designated. Selecting a different number of principal components will result in different calculated distances and could change the association as more discriminatory information (additional PCs) is added to the calculation. Using a refined data set, association to the corresponding ignitable liquid and ASTM class was possible, but the simulated fire debris can only be associated to the specific ignitable liquid if it is present in the data set.

Table 4.2: Euclidean distances between fire debris scores and ignitable liquid standard scores for the refined data set

Ignitable Liquid Standard    Refined Data Set
Gasoline A                   0.08695
Gasoline B                   0.07509
Gasoline C                   0.07992
Diesel                       0.02653
Fuel Injector                0.02867
Torch Fuel                   0.05852
Charcoal Lighter             0.06370

PCA was performed again, this time excluding the commercial diesel standard from the refined data set. This iteration of PCA was performed to demonstrate that the specific ignitable liquid must be present for association to that liquid and to establish the potential utility of class reference standards. The resulting scores plot with the simulated fire debris projected is shown in Figure 4.12. From visual assessment of the scores plot excluding the commercial diesel standard, the fire debris samples are most closely positioned to the fuel injector standard.
Additionally, the shortest Euclidean distance was calculated between the fire debris scores and the fuel injector standard. As the corresponding diesel standard is not present in the data set, the greatest similarity is instead to a standard of similar chemical composition within the same ASTM class. The Euclidean distances between the simulated fire debris scores and the ignitable liquid scores for the refined data set, excluding diesel, based on three PCs are shown in Table 4.3. Visual association and Euclidean distances indicate that the fire debris can be associated to a specific ignitable liquid when it is present in the given data set. If the specific ignitable liquid is not present in the data set, the fire debris samples can be associated to a standard that is similar in chemical composition. However, if diesel and fuel injector were both removed, the fire debris may not associate well with other petroleum distillates present in the data set due to the chemical variations within this ASTM class. This demonstrates the limitations of using commercial ignitable liquids in PCA, as chemical variability within an ASTM class can be limiting and it is not practical to include every commercial ignitable liquid in the data set. Ultimately, association to ASTM class is still possible if chemical composition is similar, indicating that the use of class reference standards has potential for successful association.
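The effect of removing the diesel standard can be checked directly from the tabulated distances. The short Python sketch below ranks the values reported in Tables 4.2 and 4.3; the helper names and the ratio of the second-shortest to the shortest distance are illustrative assumptions and are not metrics used in this work.

```python
# Distances copied from Table 4.2 (four PCs) and Table 4.3 (three PCs, diesel excluded).
refined = {
    "Gasoline A": 0.08695, "Gasoline B": 0.07509, "Gasoline C": 0.07992,
    "Diesel": 0.02653, "Fuel Injector": 0.02867,
    "Torch Fuel": 0.05852, "Charcoal Lighter": 0.06370,
}
refined_excluding_diesel = {
    "Gasoline A": 0.08549, "Gasoline B": 0.07318, "Gasoline C": 0.07736,
    "Fuel Injector": 0.02281,
    "Torch Fuel": 0.05581, "Charcoal Lighter": 0.06126,
}

def rank(distances):
    """Standards sorted from shortest to longest distance."""
    return sorted(distances.items(), key=lambda item: item[1])

def margin(distances):
    """Hypothetical helper: second-shortest distance divided by the shortest."""
    ranked = rank(distances)
    return ranked[1][1] / ranked[0][1]

for label, table in (("refined", refined), ("excluding diesel", refined_excluding_diesel)):
    closest, distance = rank(table)[0]
    print(label, closest, distance, round(margin(table), 2))
```

With diesel present, the diesel and fuel injector distances differ by less than 10%, so the two heavy petroleum distillate standards are nearly equidistant from the fire debris. With diesel excluded, fuel injector becomes the closest standard and the next closest (torch fuel) is more than twice as far away.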
Figure 4.12: PCA scores plot of PC1 (63.5%) versus PC2 (25.1%) for the refined data set with commercial standards (excluding diesel) and simulated fire debris projected (legend: Gas A, Gas B, Gas C, Fuel Injector, Torch Fuel, Charcoal Lighter, Fire Debris; both axes span -0.15 to 0.15)

Table 4.3: Euclidean distances between fire debris scores and ignitable liquid standard scores for the refined data set containing diesel and excluding diesel

Ignitable Liquid Standard    Refined Data Set Excluding Diesel
Gasoline A                   0.08549
Gasoline B                   0.07318
Gasoline C                   0.07736
Diesel                       N/A
Fuel Injector                0.02281
Torch Fuel                   0.05581
Charcoal Lighter             0.06126

N/A indicates the standard was not present in the scores plot

Association of other simulated fire debris samples was also successful using the refined data set. All simulated fire debris samples consisting of nylon carpet with carpet padding and treated wood flooring spiked with a petroleum distillate (diesel or torch fuel) associated to the petroleum distillate class. In the case of the torch fuel spiked onto the treated wood flooring, improper association to the fuel injector standard occurred due to the additional normal alkanes from the wood treatment. Association of fire debris samples containing paint thinner was not performed, as no isoparaffinic products were included in the data set. Similar to the chemically diverse data set, fire debris samples consisting of nylon carpet spiked with gasoline associated to the diesel standard due to extensive evaporation.

Comparing the chemically diverse data set to the refined data set demonstrated that the success of association was highly dependent on the composition of the data set used. Considering there is a range of commercially available ignitable liquids within each ASTM class and chemical variability within a class, selecting an ideal data set for analysis is problematic.
This is specifically problematic in forensic science, as it could be interpreted as manipulating results. Overall, this emphasizes the need for a more standardized approach that may be provided by generating reference standards that are representative of a given ASTM class. This would eliminate problems associated with changing the size and/or chemical diversity of the data set.

4.5.3. Class Reference Standards

Principal components analysis was performed on the class reference data set with the simulated fire debris containing carpet spiked with diesel projected, and the resulting scores plot is shown in Figure 4.13. The class reference data set was used to investigate association and discrimination using standards based on chemical class composition rather than commercial standards. In this data set, PC1 accounts for 72.9% of the variance and distinguishes the petroleum distillate standards, positioned positively, from the gasoline standard, positioned negatively. PC2 accounts for 27.1% of the variance and further distinguishes the heavy petroleum distillate standard, positioned positively, from the medium petroleum distillate standard, positioned negatively. The medium and heavy petroleum distillates are well distinguished from one another, particularly on PC2, despite the chemical overlap of normal alkanes, where the MPD standard contains C8-C13 and the HPD standard contains C8-C20. From visual assessment of the scores plot, the fire debris samples are most closely positioned to the heavy petroleum distillate standard. Euclidean distances were calculated between the simulated fire debris scores and each of the class reference standard scores based on two PCs and are shown in Table 4.4. The shortest calculated Euclidean distance was between the fire debris and the heavy petroleum distillate standard with a distance of 0.09245, indicating the greatest similarity.
The next shortest distance was between the fire debris and the gasoline standard with a calculated distance of 0.1426. These calculated Euclidean distances confirmed the visual association of the scores plot. Despite chemical differences between the ignitable liquid present in the debris and the heavy petroleum distillate reference standard, successful association was possible. Association of the fire debris in this scenario is relatively easy due to the presence of fewer, more representative standards, which results in the standards being well distinguished from one another along both PCs. Additionally, the precision of the class reference standards is higher than the precision of the commercial standards. Using class reference standards does show potential in PCA; however, further exploration of additional standards is necessary.

Figure 4.13: PCA scores plot of PC1 (72.8%) versus PC2 (27.1%) for the class reference standards with the projected scores of the simulated fire debris (legend: Medium Pet. Dist., Heavy Pet. Dist., Gasoline, Fire Debris; both axes span -0.16 to 0.16)

Table 4.4: Euclidean distances between fire debris scores and class reference scores for the class reference data set

Ignitable Liquid Standard      Class Reference Data Set
Gasoline                       0.1426
Medium Petroleum Distillate    0.1445
Heavy Petroleum Distillate     0.09245

All other simulated fire debris samples containing a petroleum distillate spiked onto nylon carpet and treated wood flooring associated to the heavy petroleum distillate class. Association of the fire debris containing paint thinner was not performed as no isoparaffinic standards were present in the data set. Once again, the fire debris samples containing gasoline associated to the heavy petroleum distillate class due to extensive evaporation and loss of characteristic compounds.

4.6.
Summary

The use of PCA as an additional tool in fire debris analysis is currently limited by the arbitrary selection of the data set, as no designated number or type of standards is specified. Given the variety of ignitable liquids that are commercially available, including each one in the data set is not practical. Successful association and discrimination can be affected by the composition of the data set, and association may change depending on the size and chemical diversity of the data set. This could be problematic in forensic science as it may be interpreted as manipulating the data set to get results that are more desirable. In addition to selecting the appropriate data set, visual interpretation of association can be challenging and subjective. Utilizing Euclidean distance as an additional metric to quantitatively assess the association reduces the subjectivity when interpreting scores plots. However, there are some limitations to using Euclidean distances, such as selecting the number of PCs to use and deciding what constitutes a short versus a long distance, as distances typically do not differ substantially.

Class reference standards have demonstrated some potential to associate fire debris samples to the corresponding ASTM class using PCA despite the chemical differences between the class reference standard and the commercial ignitable liquid. However, other class reference standards should be generated to investigate the utility of these standards further. Overall, using these standards for PCA may help to standardize the current approach and overcome problems associated with altering the size or nature of the data set.

5. Investigation of Class Reference Standards for Association and Classification of Fire Debris to ASTM Class using Hierarchical Cluster Analysis and k-Nearest Neighbors

5.1.
Introduction

Hierarchical cluster analysis (HCA) was used to examine the similarity between simulated fire debris samples and corresponding commercial ignitable liquids. Previously, HCA has been investigated as a tool to assist with determining which ignitable liquids the simulated fire debris are most closely associated to on a principal components analysis (PCA) scores plot by observing which standard the fire debris associates to first on the corresponding HCA dendrogram. HCA has both advantages and disadvantages relative to PCA and shares some of the limitations associated with using commercial ignitable liquids as standards. This chapter highlights the advantages and disadvantages of HCA when compared to PCA and continues to demonstrate the need for a more standardized approach for associating simulated fire debris to ignitable liquids. To continue to establish the utility of class reference standards as an alternative to commercial standards, HCA was performed on both sets of commercial ignitable liquid standards, as well as the generated class reference standards. Furthermore, HCA was performed using the chemically diverse and refined commercial ignitable liquid data sets to investigate how HCA is impacted by data sets with different chemical diversity.

Additionally, k-Nearest neighbors (k-NN) was investigated as a classification procedure for classifying simulated fire debris samples based on ASTM class. Previously, k-NN has not been used to classify simulated fire debris, but has been used in other forensic applications (1-3). In this chapter, k-NN was used to examine how well simulated fire debris is classified using class reference standards. k-NN was performed using commercial ignitable liquid standards and class reference standards, and the percentages of successful classifications and misclassifications were observed.

5.2. Association of Simulated Fire Debris using HCA

5.2.1.
Commercial Ignitable Liquid Standards – Chemically Diverse Data Set

Hierarchical cluster analysis was first performed on the chemically diverse data set containing the commercial gasoline, petroleum distillate, isoparaffinic standards, and simulated fire debris. The resulting dendrogram is shown in Figure 5.1.

Figure 5.1: Dendrogram of the chemically diverse data set with similarity levels indicated where appropriate (labels as extracted, top to bottom: Carpet, Diesel Fire Debris; Fuel Injector; Kerosene; Diesel; Charcoal Lighter; Torch Fuel; Gasoline C; Gasoline B; Gasoline A; Lamp Oil; Paint Thinner; Upholstery Protector; similarity level axis from 1.0 to 0.0; labeled similarity levels: 0.543, 0.865, 0.186, 0.000)

The dendrogram illustrates the clusters generated along the left axis, where replicates of each standard are first grouped to one another. Similarity level is indicated along the top axis and by the branching within the dendrogram, where a similarity level of 1.0 indicates the greatest similarity and a similarity level of 0.0 indicates no similarity. Based on the dendrogram containing only the commercial ignitable liquid standards, replicates of each standard were clustered at similarity levels ranging from 0.791 to 0.997. This large range of similarity levels between replicates was observed due to differences in abundance between some replicates. For example, one replicate of charcoal lighter clustered to all other charcoal lighter replicates at a much lower similarity level (0.621) than the others. This one replicate has a much higher abundance of C10 and C11, the two dominant normal alkanes in the TIC, than all other replicates; as a result, this replicate is still clustered correctly, but at a lower similarity level. Additionally, the replicates of diesel are considered the most similar compared to replicates of all other ignitable liquids because of the low volatility of the heavier alkanes in this liquid (C12-C20).
These compounds are less volatile than compounds found in other ignitable liquids, making the replicates of diesel more reproducible. All gasoline standards cluster to one another at a similarity level of 0.865. Despite being similar in chemical composition, there are minor differences in abundance of the characteristic compounds among the three gasoline standards, as shown in Figure 5.2. Gasoline A (Figure 5.2A) has a higher abundance of C2-alkylbenzenes than the other gasoline standards, while gasoline B (Figure 5.2B) has a low abundance of toluene and a higher abundance of C3-alkylbenzenes. Toluene is a highly volatile compound while the C3-alkylbenzenes are less volatile. Therefore, gasoline B would be expected to be less variable than the other gasoline standards that contain high levels of toluene. Gasoline C (Figure 5.2C), in turn, has a higher abundance of toluene and a moderate to high abundance of C2-alkylbenzenes. These compounds are the most volatile; therefore, gasoline C would be expected to be the most variable gasoline standard. This was confirmed in Figure 5.1, where the gasoline C replicates cluster at a lower similarity level (0.837) than the other gasoline standards (0.930 and 0.933 for gasolines A and B, respectively). A majority of the petroleum distillate standards cluster to one another at a similarity level of 0.543, with the exception of lamp oil. This liquid has a narrower range of normal alkanes (C11-C12) than any other petroleum distillate standard in the data set; consequently, the lamp oil standard does not cluster exclusively to the other petroleum distillates. An isoparaffinic liquid, paint thinner, clustered to the gasoline and petroleum distillate standards at a similarity level of 0.186, while the other isoparaffinic liquid, upholstery protector, clustered at a similarity level of 0.0.
Paint thinner contains branched and cyclic alkanes, gasoline contains mostly alkylbenzenes, and the petroleum distillates consist predominantly of normal alkanes. As paint thinner is chemically different from the gasoline and petroleum distillate classes, the paint thinner standard was the penultimate standard to cluster, at a low similarity level. As previously mentioned, upholstery protector does not have any compounds in common with the other standards or any compounds that elute at similar retention times to those compounds in the other standards. As a result, upholstery protector was not considered similar to any standards and was clustered last, at a similarity level of 0.0. Although the chemical differences are known and observed in the large range of similarity levels, it is not possible to determine from the dendrogram which variables are contributing to the clustering. In PCA, a loadings plot is used to determine the variables contributing the most to the variance; however, in HCA no such output is provided. This can be disadvantageous for understanding the clustering of samples; however, the raw data can be interpreted to hypothesize why specific clustering is occurring, as demonstrated here.

Figure 5.2: Representative total ion chromatograms of A) commercial gasoline A standard, B) commercial gasoline B standard, and C) commercial gasoline C standard with characteristic compounds identified to highlight differences among each gasoline standard (axes: normalized abundance versus retention time, 0 to 30 min; labeled compounds in each panel: toluene, C2-alkylbenzenes, C3-alkylbenzenes, C4-alkylbenzenes, naphthalenes)
Next, the simulated fire debris consisting of carpet spiked with commercial diesel was included in the data set and the resulting dendrogram is shown in Figure 5.3. The simulated fire debris samples clustered to each other at similarity levels ranging from 0.772 to 0.983. This range of similarity levels was a result of the variability of the burning process. The fire debris samples clustered first to the heavy petroleum distillate cluster containing fuel injector, kerosene, and diesel at a similarity level of 0.705. However, it is important to note that this data set is biased towards the petroleum distillate class, as over half of the standards are petroleum distillates.

Figure 5.3: Dendrogram of the chemically diverse data set with simulated fire debris consisting of carpet spiked with diesel (labels top to bottom: Carpet, Diesel Fire Debris; Fuel Injector; Kerosene; Diesel; Charcoal Lighter; Torch Fuel; Gasoline C; Gasoline B; Gasoline A; Lamp Oil; Paint Thinner; Upholstery Protector; similarity level axis from 1.0 to 0.0; labeled similarity level: 0.705)

In this iteration of HCA, the specific ignitable liquid standard to which the simulated fire debris was most similar remains unknown; however, clustering to the appropriate ASTM class was possible. All other simulated fire debris samples containing petroleum distillates spiked onto both nylon carpet with carpet padding and treated wood flooring were properly associated to the petroleum distillate class, but not necessarily to the specific ignitable liquid standard. Fire debris samples containing paint thinner spiked onto nylon carpet were incorrectly associated first to a cluster containing all commercial petroleum distillates and gasoline standards at a low similarity level (0.399). Although commercial paint thinner was included in the data set, loss of characteristic compounds through the burning process resulted in improper association.
Fire debris samples containing paint thinner spiked onto treated wood flooring were incorrectly associated to charcoal lighter. Incorrect association occurred due to the presence of the same normal alkanes (C9-C12) in both the wood treatment and charcoal lighter, and due to compounds with overlapping retention times that were present in both paint thinner and charcoal lighter. Fire debris samples consisting of nylon carpet spiked with gasoline clustered to the standards with zero similarity because, due to extensive evaporation, few compounds corresponding to gasoline were present in the debris samples and substrate interference compounds (i.e., styrene, 1,2,3-trichloropropane, and biphenyl) dominated the TIC. As a result, the simulated fire debris containing gasoline did not resemble gasoline. Association of treated wood flooring spiked with gasoline could not be investigated, as these simulated fire debris samples were not generated due to continual problems with instrument contamination resulting from the surface treatment.

5.2.2. Commercial Ignitable Liquid Standards – Refined Data Set

Hierarchical cluster analysis was performed on the refined data set with the simulated fire debris consisting of carpet spiked with diesel and the resulting dendrogram is shown in Figure 5.4. In this data set, replicates of each standard cluster at similarity levels ranging from 0.621 to 0.991. A large range of similarity levels would not typically be expected of replicates, but occurred due to the previously mentioned charcoal lighter replicate (see section 5.2.1) and because of the chemical nature of the ignitable liquids, as some ignitable liquids contain more volatile compounds, making those liquids more variable than others. The gasoline standards cluster to the petroleum distillates at a similarity level of 0.0 due to chemical differences present between classes.
Each class has distinct chemical differences: the gasoline class contains alkylbenzenes and aliphatic compounds, and the petroleum distillate class contains a homologous distribution of normal alkanes and some aromatic compounds. All gasoline standards cluster to one another at a similarity level of 0.756, while all petroleum distillate standards cluster to one another at a similarity level of 0.110, indicating little to no similarity. This similarity level is relatively low due to the diversity of the data set and due to chemical differences within the petroleum distillate class. For example, fuel injector and diesel cluster together at a relatively high similarity level (0.486), as these standards are classified as heavy petroleum distillates and contain normal alkanes C10-C16 and C12-C20, respectively. Alternatively, charcoal lighter clusters to fuel injector and diesel at a relatively low similarity level (0.213) and torch fuel clusters to all of the petroleum distillates at an even lower similarity level (0.110) due to differences in the range of normal alkanes. Both charcoal lighter and torch fuel are classified as medium petroleum distillates with normal alkane ranges of C9-C12 and C11-C14, respectively.

Figure 5.4: Dendrogram of the refined data set with simulated fire debris consisting of carpet spiked with diesel (labels top to bottom: Carpet, Diesel Fire Debris; Fuel Injector; Diesel; Charcoal Lighter; Torch Fuel; Gasoline C; Gasoline B; Gasoline A; similarity level axis from 1.0 to 0.0; labeled similarity level: 0.403)

The three simulated fire debris samples cluster at similarity levels ranging from 0.586 to 0.970, indicating some variation within the generated fire debris samples. This variation between samples was expected due to the inability to completely control the burning process. After clustering to one another, the simulated fire debris samples were first clustered to both of the heavy petroleum distillate standards (commercial fuel injector and diesel) at a similarity level of 0.403.
From this dendrogram, it was unclear whether the simulated fire debris was most similar to the diesel or fuel injector standard, as the two standards cluster to one another first. This outcome is similar to the results from the visual assessment of the PCA scores plot (see Figure 4.11), where it was unclear if the fire debris samples were more closely associated to the diesel or to the fuel injector standard. Although the fire debris cannot be associated to a specific ignitable liquid in this case, the corresponding ASTM class of ignitable liquid was properly associated (i.e., petroleum distillate). As the purpose of this research is to associate fire debris to chemical class rather than a specific ignitable liquid, this would not be considered a limitation of HCA. Additionally, if the commercial diesel standard were not present in the data set, association of the simulated fire debris to the heavy petroleum distillate cluster would still occur (similarity level: 0.403), assuming the removal of diesel was the only change to the data set.

Association of other simulated fire debris samples was somewhat successful. All simulated fire debris samples consisting of nylon carpet with carpet padding and treated wood flooring spiked with a petroleum distillate (diesel or torch fuel) associated to the petroleum distillate class, but not necessarily to the specific ignitable liquid. All fire debris samples containing paint thinner spiked onto nylon carpet with carpet padding and treated wood flooring incorrectly associated to charcoal lighter, as no isoparaffinic standards were included in the data set. Although paint thinner and charcoal lighter belong to two different ASTM classes, these ignitable liquids have similar retention time ranges (8.52-11.96 min. and 7.42-12.50 min., respectively) with overlapping compounds, which explains why such clustering occurs.
Similar to the chemically diverse data set, the gasoline fire debris samples clustered to the standards with no similarity due to extensive evaporation of the ignitable liquid, such that the presence of an ignitable liquid in the fire debris was not detectable.

As the data set becomes more refined, the similarity levels of standards within the same class decrease. For example, the three commercial gasoline standards clustered at a similarity level of 0.865 in the chemically diverse data set, while the same three standards clustered at a similarity level of 0.756 in the refined data set. Similarity levels can be beneficial as a numerical representation of the similarities observed in the data set, but these levels can also be limiting, as the similarity levels provided are only relative to the given data set. Attempting to associate the same simulated fire debris samples using two different data sets will yield different similarity levels. While HCA is used to highlight similarities in the data set, there will always be a sample denoted as different due to the way in which clustering occurs. As a result, it is difficult to determine if the given association of the fire debris is a good fit based on similarity level alone.

Overall, selecting a data set with a different chemical diversity did not alter the association of the fire debris, but did change the similarity level at which clustering occurred. As a result, the same general trends of association were observed. This is an advantage of HCA for fire debris analysis, as association to the same ASTM class occurred regardless of the diversity of the data set.

5.2.3. Class Reference Standards

Hierarchical cluster analysis was performed on the class reference data set with the simulated fire debris containing carpet spiked with diesel and the resulting dendrogram is shown in Figure 5.5. The class reference data set was used to determine if class reference standards have utility for association using HCA.
In this data set, replicates of each class reference standard clustered at similarity levels ranging from 0.948 to 0.985. Replicates of each simulated fire debris extract clustered at similarity levels ranging from 0.732 to 0.983. Replicates of the class reference standards clustered together at higher similarity levels than the commercial standards, indicating more precision and less variability among the class reference standards. The gasoline reference standards clustered to one another at a similarity level of 0.974 and clustered to the petroleum distillates at a similarity level of 0.0. The medium petroleum distillate reference standard replicates clustered to one another at a similarity level of 0.948 and clustered to the heavy petroleum distillate standard at a similarity level of 0.063, while all HPD reference standard replicates clustered to one another at a similarity level of 0.966. A low similarity level between the MPD and HPD reference standards was not expected, but occurred because of the limited chemical diversity and because the given similarity levels are only relative to the data set. As a result, differences were highlighted even when the samples were somewhat similar. This demonstrates the need to have a standardized data set containing standards representative of all ASTM classes.

Figure 5.5: Dendrogram of the class reference data set with simulated fire debris consisting of carpet spiked with diesel (labels top to bottom: Carpet, Diesel Fire Debris; Heavy Petroleum Distillate; Medium Petroleum Distillate; Gasoline; similarity level axis from 1.0 to 0.0; labeled similarity level: 0.225)

The simulated fire debris samples containing diesel first clustered to the HPD reference standards at a similarity level of 0.225. Although the similarity level was relatively low, the samples associated to the corresponding class standard first. In addition, the similarity level is low due to the limited chemical diversity of the data set.
If additional class reference standards containing classes of different chemical composition were introduced to the data set, the similarity level between the simulated fire debris and HPD reference standard would be expected to increase. As previously mentioned for the chemically diverse data set, this data set was also somewhat biased towards petroleum distillates. All other simulated fire debris samples containing a petroleum distillate spiked on nylon carpet and treated wood flooring associated to the heavy petroleum distillate class. Association of the fire debris containing paint thinner was not attempted, as no isoparaffinic standards were included in the data set. Once again, the fire debris samples containing gasoline clustered to the standards with no similarity due to extensive evaporation of the liquid in these samples.

Class reference standards showed potential for association using HCA. Although ASTM class association was achieved using both commercial standards and class reference standards, the class reference standards would be beneficial to generate a more standardized approach. If a full set of class reference standards were generated based on ASTM class and used during HCA, the similarity levels for association would be much more representative of how similar the simulated fire debris is to the standard to which it associates. For example, if the same simulated fire debris samples were introduced to two different data sets and association occurred to the same ignitable liquid standard, the resulting similarity levels would differ. Therefore, it is difficult to determine the significance of a given similarity level. However, if all simulated fire debris samples were consistently introduced to the same data set, one spanning varying levels of chemical diversity, a high versus a low similarity level would be more indicative of the actual level of association.
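The dependence of a similarity level on the data set can be illustrated with a short sketch. It assumes a common definition, similarity = 1 - d/d_max, where d is the Euclidean distance between two samples and d_max is the largest pairwise distance in the data set; the definition used by the software in this research is not stated in this section, so this is an assumption. The two-variable points are hypothetical values chosen for easy arithmetic, not measured data.

```python
from itertools import combinations
from math import dist


def similarity(a, b, data_set):
    """Similarity of a and b, scaled by the largest pairwise distance in data_set.

    Assumed definition (not confirmed by the thesis): 1 - d(a, b) / d_max.
    """
    d_max = max(dist(p, q) for p, q in combinations(data_set, 2))
    return 1.0 - dist(a, b) / d_max


# Hypothetical two-variable samples (illustrative numbers only).
debris = (4.0, 1.0)    # simulated fire debris
hpd = (5.0, 1.0)       # heavy petroleum distillate standard
mpd = (2.0, 1.0)       # medium petroleum distillate standard
gasoline = (5.0, 5.0)  # chemically different class

# Data set limited to petroleum distillates: d_max is the HPD-MPD distance.
narrow = similarity(debris, hpd, [debris, hpd, mpd])

# Adding a chemically different class enlarges d_max, so the same
# debris-HPD distance maps to a higher similarity level.
diverse = similarity(debris, hpd, [debris, hpd, mpd, gasoline])

print(f"narrow data set:  {narrow:.3f}")
print(f"diverse data set: {diverse:.3f}")
```

In this toy case the distance between the debris and HPD points never changes, yet the similarity level rises when a chemically different class widens the data set, in line with the expectation described above.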
Additionally, HCA could be considered advantageous over PCA because association to a specific standard or specific cluster of standards does not change as the diversity of the data changes. As a result, the concern about data manipulation previously mentioned for PCA is eliminated. Furthermore, HCA takes into account all dimensions of the data set and displays them in the form of a single dendrogram with similarity levels as a means of numeric representation. During PCA, all dimensions of the data are accounted for, but only the dimensions that contain useful information are used and only two or three dimensions can be compared simultaneously. As a result, looking at all dimensions or PCs would be very time consuming and would require numerous scores plots. In addition, if numeric representation is desired in PCA, an additional step of calculating the Euclidean distance or an alternative metric is required. However, PCA does have an advantage in identifying the variables responsible for the association and discrimination whereas, in HCA, the original data must be interpreted to hypothesize why clustering is occurring.

5.3. Association of Simulated Fire Debris using k-NN

5.3.1. Commercial Ignitable Liquid Standards- Chemically Diverse Data Set

k-Nearest neighbors was first performed on the chemically diverse data set containing the commercial gasoline, petroleum distillate, and isoparaffinic standards. Each commercial ignitable liquid was assigned its own class; however, all gasoline standards were placed into one class due to the similarities of the ignitable liquids. Based on the given threshold resulting from the t distribution at a 95% confidence level for each class, the commercial class standards have a good class fit with the exception of approximately one to two replicates per ignitable liquid that do not fit within the threshold.
Standards that fall outside the given threshold range are similar to standard replicates that cluster at a low similarity level to other replicates using HCA. For example, one gasoline replicate and one charcoal lighter replicate did not fall within the threshold and clustered to the corresponding replicates at lower similarity levels using HCA (see Figure 5.1 and Figure 5.3). However, as a distributional statistic is used to determine the threshold, it is anticipated that approximately 5% of the standards will exceed the given threshold. Given this expectation and the fact that the replicates clustered to the appropriate standard using HCA, the replicates were not removed and were used to represent the variance found within ignitable liquids.

Classification of the fire debris samples was carried out using 1, 3, 5, 7, and 9 nearest neighbors. For all values of nearest neighbors, the fire debris samples containing diesel were all misclassified as kerosene. This misclassification is similar to the results observed during PCA (see section 4.5.1) in which the fire debris samples were incorrectly associated to the kerosene standard. All other simulated fire debris samples containing petroleum distillates spiked on nylon carpet and treated wood flooring were classified within the petroleum distillate class, although correct classification to the specific ignitable liquid was not always achieved. Fire debris samples containing paint thinner spiked on nylon carpet were incorrectly classified as diesel. Incorrect classification may be due to the loss of characteristic compounds of paint thinner during the burning process and to the presence of substrate interference compounds that have overlapping retention times with compounds present in diesel. Fire debris samples containing paint thinner spiked onto treated wood flooring were incorrectly classified as fuel injector (33% of samples) and charcoal lighter (67% of samples) using all values of nearest neighbors.
Incorrect association most likely occurred due to the presence of the same normal alkanes in the wood treatment and charcoal lighter (C9-C12) and overlapping retention times of compounds present in paint thinner and charcoal lighter. Fire debris containing gasoline spiked on nylon carpet was dominated by substrate interferences (i.e., styrene, 1,2,3-trichloropropane, and biphenyl), with few of the characteristic gasoline compounds present. In addition, biphenyl (tR: 15.24 min.), a substrate interference compound from the carpet, has a similar retention time to C14 (tR: 15.25 min.), which is present in diesel, and this may be why classification to the commercial diesel occurred.

5.3.2. Commercial Ignitable Liquid Standards- Refined Data Set

k-Nearest neighbors was performed on the refined data set and classification of the simulated fire debris samples consisting of carpet spiked with diesel was investigated. Based on the given threshold for each class, the commercial class standards have a good class fit with the exception of approximately one replicate per class that did not fit within the given threshold. Similar to the replicates in the chemically diverse data set, some replicates that did not fit within the threshold range were clustered to the other replicates at a lower similarity level using HCA. However, other replicates that fell outside the threshold appeared to be clustered well using HCA; this is consistent with k-NN, in which 5% of the population is expected to fall outside of the threshold range when a 95% confidence level is selected. Additionally, fewer replicates fell outside the threshold; as the number of samples within the data set decreases, it is expected that fewer samples will fall outside the threshold range. Similar to the chemically diverse data set, the replicates were not removed but were used to represent the variance found within ignitable liquids.
The simulated fire debris samples containing carpet spiked with commercial diesel were then classified based on the defined classes. Classification was performed using 1, 3, 5, 7, and 9 nearest neighbors, and the percent correctly classified for each number of nearest neighbors is indicated in Table 5.1. Using only one nearest neighbor, only 11% of the simulated fire debris samples were correctly classified to the corresponding ignitable liquid and 89% were misclassified as fuel injector, a heavy petroleum distillate. Classification with a single nearest neighbor is susceptible to misclassification due to the presence of outliers and is therefore not recommended. The percent of correctly classified samples increases as the number of nearest neighbors increases, reaching a maximum of 67% using seven nearest neighbors. In this research, selecting a higher number of nearest neighbors, and therefore considering a larger number of standards, improved the classification success because each class is well defined. However, 33% of samples were still misclassified and, in each case, were misclassified as fuel injector, which is a heavy petroleum distillate.

Unfortunately, it was not possible to determine whether the classification of the simulated fire debris was a good fit. The inability to determine if a good class fit has been made is disadvantageous, as k-NN is a hard classification procedure and classification will always be forced. However, it was possible to interpret the raw data and hypothesize why the observed misclassification occurred. A representative TIC of the simulated fire debris containing diesel, the commercial diesel standard, and the commercial fuel injector standard is shown in Figure 5.6.
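The effect of the number of nearest neighbors, and the forced nature of the classification, can be sketched with a minimal majority-vote k-NN. This is an illustrative sketch only: the function name `knn_classify`, the two-variable points, and the class layout are hypothetical, and the software and distance metric used in this research may differ (Euclidean distance is assumed here).

```python
from collections import Counter
from math import dist


def knn_classify(sample, standards, k):
    """Return the majority class among the k standards nearest to sample.

    standards is a list of (point, class_label) pairs; distance is Euclidean.
    """
    nearest = sorted(standards, key=lambda s: dist(sample, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical standards: four diesel replicates, four fuel injector
# replicates, and one outlying fuel injector replicate near the diesel group.
standards = [
    ((0.0, 0.0), "diesel"),
    ((0.0, 1.0), "diesel"),
    ((1.0, 0.0), "diesel"),
    ((1.0, 1.0), "diesel"),
    ((5.0, 5.0), "fuel injector"),
    ((5.0, 6.0), "fuel injector"),
    ((6.0, 5.0), "fuel injector"),
    ((6.0, 6.0), "fuel injector"),
    ((2.0, 2.0), "fuel injector"),  # outlying replicate
]

# A debris sample that sits beside the diesel group but is closest
# to the outlying fuel injector replicate.
debris = (1.8, 1.8)
for k in (1, 3, 5):
    print(k, knn_classify(debris, standards, k))

# A sample far from every standard is still assigned a class,
# because k-NN provides no measure of class fit for unknowns.
print(knn_classify((100.0, 100.0), standards, 3))
```

With one neighbor the single outlying replicate decides the outcome, whereas larger odd values of k let the well-defined class outvote it; the last call shows that even a sample dissimilar from all classes receives a label.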
Table 5.1: Percent classification of simulated fire debris containing carpet spiked with diesel to the corresponding commercial diesel standard using 1, 3, 5, 7, and 9 nearest neighbors

Nearest Neighbors    Percent Classification to Diesel Standard (%)
1                    11
3                    33
5                    44
7                    67
9                    67

[Figure 5.6: Representative total ion chromatogram of A) simulated fire debris consisting of carpet spiked with commercial diesel, B) commercial diesel standard, and C) commercial fuel injector standard with compounds identified. Axes: normalized abundance versus retention time (0 to 30 min). Labeled normal alkanes: A) C11-C16, B) C12-C20, C) C10-C16.]

The simulated fire debris containing diesel (Figure 5.6A) contains normal alkanes in the range C11-C16, while the commercial diesel standard (Figure 5.6B) contains normal alkanes in the range C12-C20. The commercial fuel injector standard (Figure 5.6C) contains normal alkanes in the range C10-C16. After the burning process and introduction of interference compounds, some of the fire debris samples contain a normal alkane range more similar to fuel injector. Interference compounds dominate, and some of the heavier normal alkanes are masked by the larger abundance of these compounds. As a result, some fire debris samples correctly classify to diesel, while others misclassify as fuel injector due to the change in normal alkanes observed because of the dominating substrate interferences. As was observed with PCA, when diesel was removed from the data set, 100% of the fire debris samples associated to the commercial fuel injector standard. This once again becomes a limitation, as not every commercial ignitable liquid on the market can be included in a given data set.
Although classification to the specific ignitable liquid was not always achieved, classification to the proper ASTM class was still possible, as diesel and fuel injector are both heavy petroleum distillates. All other simulated fire debris samples containing petroleum distillates spiked on nylon carpet and treated wood flooring were classified within the petroleum distillate class, although not always to the corresponding specific ignitable liquid. Classification of fire debris samples containing paint thinner was not performed with this data set, as no isoparaffinic standards were present, while 100% of the fire debris samples containing gasoline were classified to the commercial diesel standard, as was also observed with the chemically diverse data set.

Using a chemically diverse data set, all fire debris samples were misclassified (k=7) based on specific ignitable liquid, but were all classified to the corresponding ASTM class, as kerosene is a heavy petroleum distillate. As a more refined data set was introduced, 67% of fire debris samples were properly classified (k=7) to the specific ignitable liquid used, while the others (33%) were only properly classified by ASTM class. These results indicate the need for a more standardized approach and demonstrate that the use of class reference standards representative of different ASTM classes has potential for association and classification purposes.

5.3.3. Class Reference Standards

k-Nearest neighbors was performed on the class reference data set and classification of the simulated fire debris samples consisting of carpet spiked with diesel was investigated. Based on the given threshold for each class, the commercial class standards each have a good class fit with the exception of a few replicates that fall outside of the threshold range.
As the data set is small, only a few replicates are expected to fall outside of the threshold range; however, no replicates appear to be substantially different when compared to the corresponding HCA dendrogram.

Classification of the fire debris samples was carried out again using 1, 3, 5, 7, and 9 nearest neighbors. For all values of nearest neighbors, the fire debris samples containing diesel were all properly classified to the heavy petroleum distillate class reference standard. However, as previously mentioned, the class fit of the samples could not be analyzed. Using class reference standards for k-NN analysis allows fire debris samples to be classified based on ASTM class. A larger set of class reference standards could potentially help standardize a useful data set for fire debris classification.

Using class reference standards and k-NN, all simulated fire debris samples containing petroleum distillates (both carpet and wood substrates) were classified as heavy petroleum distillates. Classification of the fire debris containing paint thinner was not performed, as no isoparaffinic class standard was present. Fire debris containing gasoline was misclassified as a heavy petroleum distillate; however, using 3-nearest neighbors, 22% of the simulated fire debris correctly classified to the gasoline standard, while the remaining 78% misclassified as a heavy petroleum distillate. Unsuccessful classification of gasoline may be considered disadvantageous, as it is one of the most common ignitable liquids used. However, after extensive burning, the gasoline fire debris samples did not contain many of the characteristic compounds of the commercial gasoline. In addition, a compound from the nylon carpet, biphenyl, had a retention time overlapping that of C14, a characteristic compound in diesel (see section 5.3.1).

5.4. Summary

The use of HCA and k-NN as multivariate statistical procedures to associate and classify simulated fire debris samples to the corresponding ASTM class has some advantages and disadvantages. HCA is beneficial compared to PCA because similarity levels are calculated within the analysis and all dimensions are accounted for and displayed simultaneously. In contrast, in PCA, all dimensions are accounted for, but only two or three dimensions can be observed simultaneously, making interpretation of many dimensions time consuming. Further, if numeric representation of association is desired in PCA, an additional metric (e.g., Euclidean distance) must be calculated. However, there are some disadvantages to HCA, such as the inability to determine which variables are contributing to the clustering and the fact that the similarity levels calculated are only relative to a given data set. That is, the similarity between the same two samples will change depending on the content of the data set. This makes interpreting a high versus a low similarity level difficult. However, if a standard data set were used every time, similarity levels would be more representative.

Using HCA, association of the fire debris to the specific commercial ignitable liquid was not possible, but association to the corresponding ASTM class was possible using both commercial and class reference standards. Association using HCA was not largely affected by introducing a more diverse data set; however, the similarity level at which the simulated fire debris associated was affected. Using k-NN, classification of the fire debris to the specific commercial ignitable liquid was possible with the refined data set, but as the data set became more diverse, classification success was minimal. However, classification to the corresponding ASTM class was possible using k-NN, which is all that is required in fire debris analysis.
In addition, one major disadvantage of k-NN is that classification is always forced and there is no way to assess the class fit of unknown samples. A sample will always be classified to one class even if it is dissimilar from all of the given classes and, without a measure of class fit, it is difficult to determine whether the classification is a good fit or the samples are simply not similar.

Class reference standards have demonstrated potential to associate and classify fire debris samples to the corresponding ASTM class using HCA and k-NN. Although both commercial and class reference standards were successful in class association, the class reference standards could be beneficial for generating a more standardized data set. If a more standardized data set is used in HCA, the given similarity levels will be more representative of the similarity between the simulated fire debris and ignitable liquid and therefore would be more beneficial when associating samples to different ASTM classes. As a result, the problem that a similarity level is only relative to a given data set would be eliminated. Additionally, a more standardized data set for k-NN classification would be beneficial, as successful classification can be affected by the composition of the data set. Using class reference standards in both statistical procedures would reduce the potential for manipulating the data set in order to obtain results that are more desirable.

REFERENCES

1. Said HES, Tan TN, Baker KD. Personal identification based on handwriting. Pattern Recognition 2000; 33: 149-160.
2. Kumar R, Pal NR, Chanda B, Sharma JD. Forensic Detection of Fraudulent Alteration in Ball-Point Pen Strokes. IEEE Transactions on Information Forensics and Security 2012; 7: 809-820.
3. Jiang Y, Liu P. Feature extraction for identification of drug and explosive concealed by body packaging based on positive matrix factorization. Measurement 2014; 47: 193-199.

6. Conclusions

6.1. Summary

6.1.1. Objectives and Goals

Previous work using statistical analysis has been carried out to determine a more objective approach to fire debris analysis (1-3); however, the methods studied consistently used commercially available ignitable liquids as reference standards. While these statistical procedures have shown some success, the use of commercially available ignitable liquids as standards can be a limitation. Association and classification of simulated fire debris samples can be affected by the number and chemical diversity of reference standards in the data set. Additionally, the large number of commercially available ignitable liquids means that including each one in a data set is not practical. Further, selecting appropriate liquids to include is difficult because of the chemical variation within an ASTM class.

This research focused on developing class reference standards that are characteristic of ASTM chemical classes. The intention was to use these standards in subsequent multivariate statistical analysis to provide a more standardized approach for the analysis of fire debris evidence and to overcome problems associated with selecting suitable reference standards to include in the data set. In addition, the impact of the data set composition on successful association of simulated fire debris samples to the corresponding standard was investigated using a variety of multivariate statistical procedures.

6.1.2. Association of Fire Debris using PCA

Principal components analysis (PCA) has previously been used to associate simulated fire debris samples to ignitable liquid standards (1,2). In this study, PCA was used as the initial statistical procedure to investigate the utility of class reference standards and evaluate the impact of data set selection for statistical analysis.
Commercially available ignitable liquid standards from three different ASTM classes were used, and class reference standards from the gasoline and petroleum distillate classes were developed, all of which were spiked onto Kimwipes® for analysis. Class reference standards representative of the gasoline and petroleum distillate classes were developed as these ignitable liquids are commonly found in fire debris evidence. In addition, simulated fire debris samples containing ignitable liquids spiked onto either nylon carpet with carpet padding or treated wood flooring were generated. All standards and simulated fire debris samples were passive-headspace extracted following procedures recommended by ASTM International and analyzed by gas chromatography-mass spectrometry (GC-MS).

First, PCA was performed on two data sets containing commercially available ignitable liquids. One data set, referred to as the chemically diverse data set, contained three gasoline, six petroleum distillate, and two isoparaffinic standards, while the second data set, referred to as the refined data set, contained three gasoline and four petroleum distillate standards. The simulated fire debris samples were then projected onto the resulting scores plots. In addition, Euclidean distances were calculated between the scores of the fire debris samples and the ignitable liquid standards. Using the chemically diverse data set, association of the simulated fire debris to the corresponding ignitable liquid (diesel) was not possible; however, successful association to the corresponding ASTM class was possible. When PCA was performed using the refined data set, association to the corresponding ignitable liquid (diesel) was successful.

A third iteration of PCA was performed using the class reference standards to generate scores and loadings plots and then projecting the simulated fire debris samples onto the scores plot.
Association to the corresponding ASTM class was possible despite chemical differences of the class standards when compared to commercially available ignitable liquids. The class reference standards were well distinguished using PCA because fewer, more representative standards were used, which made association relatively easy.

Association to the corresponding ignitable liquid or ASTM class was possible using both commercially available reference standards and the generated class reference standards. However, the association depended on the data set selected, with different data sets giving different results. Due to the large number of commercially available ignitable liquids, including each one in a data set is neither practical nor feasible. While association was possible using class reference standards, PCA does have some limitations for this application. Visual interpretation of the PCA scores plot may be subjective and an additional metric (in this case, the Euclidean distance) was required to quantitatively confirm association. The Euclidean distance was calculated using the number of PCs that accounted for 95% of the variance and was used to determine the standard with which each fire debris sample was most closely associated. Although PCA takes all of the dimensions of the data set into account to calculate scores, only two or three dimensions can be assessed simultaneously. However, PCA is beneficial as the loadings plots that are generated for each PC can be used to identify the variables contributing the most to the variance.

6.1.3. Association of Fire Debris using HCA

Hierarchical cluster analysis (HCA) is another statistical procedure that has previously been used to associate simulated fire debris samples to the corresponding ignitable liquid (2).
In this research, HCA was also used to investigate the impact of data set selection on association of fire debris samples to corresponding reference standards and to investigate the utility of class reference standards for this purpose. First, HCA was performed on the two data sets containing commercially available ignitable liquids with varying levels of chemical diversity and the simulated fire debris samples. Using both of these data sets, association of the simulated fire debris to the corresponding ASTM class was successful; however, association to the specific ignitable liquid could not be determined due to the order of clustering. For example, the simulated fire debris samples first grouped to a cluster containing the heavy petroleum distillate standards (diesel and fuel injector). As diesel and fuel injector are chemically similar, these standards clustered to one another before the simulated fire debris was clustered. As a result, the specific ignitable liquid to which the simulated fire debris was most similar was not determined. Using class reference standards, successful association to the corresponding ASTM class was possible.

Using HCA, association was not largely affected by chemical diversity of the data set, although it was not possible to determine if association to the specific ignitable liquid was affected. In addition, HCA provided a similarity level as a means of quantitative association and no additional metrics had to be calculated. Furthermore, HCA is beneficial as all dimensions of the data are displayed in a single dendrogram and the order in which clustering occurs is more straightforward than visually interpreting PCA scores plots. However, the similarity levels are only relative to a given data set and no output is provided in HCA to identify the variables contributing to the clustering observed. As a result, raw data must be interpreted to hypothesize why clustering occurs as it does.

6.1.4. Association of Fire Debris using k-NN

k-Nearest neighbors (k-NN) was also investigated as an additional statistical procedure that has not previously been used for applications in fire debris analysis. k-NN is a classification procedure that can be used to classify unknown samples to a defined class of samples of known origin. In this research, k-NN was used to investigate the impact of data set selection for classification and to demonstrate the utility of class reference standards.

Using the chemically diverse data set, the fire debris samples were incorrectly classified to the specific ignitable liquid, but correctly classified to ASTM class using all values of nearest neighbors. Using the refined data set, classification of simulated fire debris samples to the corresponding ignitable liquid was possible to an extent. Using only one nearest neighbor, the correct classification rate was 11%; however, as the number of nearest neighbors increased, the classification rate increased to a maximum of 67% using 7 nearest neighbors. The remaining simulated fire debris samples that were incorrectly classified were still classified to the corresponding ASTM class (i.e., petroleum distillate). However, when k-NN was performed using class reference standards, 100% of fire debris samples were correctly classified using all values of nearest neighbors.

Similar to PCA, k-NN classification was affected by variations in chemical diversity of the data set. While classification was successful using class reference standards, no output was provided to determine the variables contributing to classification. Also, the goodness of the class fit could be assessed for the standards but could not be evaluated for the simulated fire debris samples.

6.2. Future Work

The multivariate statistical procedures investigated in this research provided similar association and classification results, but each had advantages and disadvantages relative to the others.
Although performing the proper statistical procedure is important, standardizing the data set used as the reference standards is more important for an objective approach. To further standardize the statistical analysis of fire debris evidence, the current class reference standards need to be further developed to include additional characteristic compounds in each of the standards. Moreover, additional class reference standards representative of other ASTM classes should be generated to be more representative of the commercially available liquids on the market. Multivariate statistical procedures then need to be applied using the newly developed class reference standards to investigate the potential of a more standardized approach.

Overall, this research highlighted the potential utility of class reference standards for a more objective and standardized approach. Developing a set of standards useful for a statistical approach could make fire debris analysis more reliable, could help reduce the potential for false positives and negatives, could aid in convincing a jury, and would satisfy the Daubert standard. While multivariate statistical procedures are not currently used in forensic laboratories, developing a reliable standardized approach would help if these methods were ever implemented.

REFERENCES

1. Baerncopf JM, McGuffin VL, Smith RW. Association of ignitable liquid residues to neat ignitable liquids in the presence of matrix interferences using chemometric procedures. Journal of Forensic Sciences 2011; 56: 70-81.
2. Prather KR, McGuffin VL, Smith RW. Effect of evaporation and matrix interferences on the association of simulated ignitable liquid residues to the corresponding liquid standard. Forensic Science International 2012; 222: 242-251.
3. Tan B, Hardy JK, Snavely RE. Accelerant classification by gas chromatography/mass spectrometry and multivariate pattern recognition. Analytica Chimica Acta 2000; 422: 37-46.