AN INVESTIGATION OF UNSUPERVISED AND SUPERVISED MULTIVARIATE STATISTICAL PROCEDURES FOR THE ANALYSIS OF FIRE DEBRIS

By
Suzanne Towner

A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE
Forensic Science
2012

ABSTRACT

Gas chromatography-mass spectrometry (GC-MS) is the method of choice for analyzing fire debris. Analysts perform a visual comparison between chromatograms of fire debris and ignitable liquid standards. The analysis is both complex and subjective due to evaporation of the liquid and interference compounds from the matrix, as well as thermal degradation of both the matrix and the liquid. This research investigates the use of unsupervised and supervised multivariate statistical procedures for simplifying the analysis and creating a more objective approach.

Principal components analysis, an unsupervised technique, was used in conjunction with Pearson product moment correlation coefficients to successfully associate simulated fire debris to corresponding ignitable liquid standards. To do this, liquid standards of gasoline and kerosene were evaporated to different evaporation levels. The liquids were spiked onto unburned and burned wood that had been previously treated with Danish Oil. Additionally, simulated debris samples were generated by spiking the liquids onto the matrix prior to burning. The samples were extracted, analyzed by GC-MS, and subjected to the unsupervised data analysis procedures.

Soft independent modeling of class analogy, a supervised classification technique, was applied to replicate chromatograms from a set of six ignitable liquid standards, different from those used above. The standards’ chromatograms were split into training and test sets.
The training set was used to generate a model of each liquid, against which the test set was classified. Classification of the liquids was successfully performed using the total ion chromatograms and extracted ion chromatograms.

Table of Contents

List of Tables
List of Figures

Chapter 1: Introduction
  1.1 Background
  1.2 Ignitable Liquid Classification
  1.3 Extraction of Volatile Compounds from Fire Debris
  1.4 Current Analysis of Fire Debris
  1.5 Difficulties in Analysis of Fire Debris
  1.6 Literature Review
    1.6.1 Effects of Matrix Interferences and Thermal Degradation
    1.6.2 The Application of Multivariate Statistical Procedures
  1.7 Considerations for Statistical Analyses
  1.8 Research Objectives and Goals
  REFERENCES

Chapter 2: Theory
  2.1 Passive Headspace Extraction
  2.2 Gas Chromatography-Mass Spectrometry
  2.3 Data Pretreatment
    2.3.1 Smoothing
    2.3.2 Retention Time Alignment
    2.3.3 Normalization
  2.4 Data Analysis
    2.4.1 Principal Components Analysis
    2.4.2 Pearson Product Moment Correlation Coefficients
    2.4.3 Soft Independent Modeling of Class Analogy
  REFERENCES

Chapter 3: Association of Simulated Fire Debris Samples to Corresponding Standards Using Unsupervised Statistical Procedures
  3.1 Introduction
  3.2 Materials and Methods
    3.2.1 Ignitable Liquid Standards
    3.2.2 Surface-Treated Wood Samples
    3.2.3 Inherent Matrix Interference Samples
    3.2.4 Determination of Optimal Burn Time
    3.2.5 Matrix Interference/Thermal Degradation Samples
    3.2.6 Simulated Fire Debris Samples
    3.2.7 Analysis of Samples by GC-MS
    3.2.8 Data Pretreatment
    3.2.9 Principal Components Analysis
    3.2.10 Pearson Product Moment Correlation Coefficients
  3.3 Results and Discussion
    3.3.1 Characterization of Compounds Present in Ignitable Liquid Standards
      3.3.1.1 Gasoline
      3.3.1.2 Kerosene
    3.3.2 Association and Discrimination of Ignitable Liquid Standards
    3.3.3 PPMC Coefficients for Ignitable Liquid Standards
    3.3.4 Characterization of Compounds Present in Surface-Treated Wood Flooring
    3.3.5 Optimization of Burn Times
    3.3.6 Association of Samples to Corresponding Standards in the Presence of Inherent Matrix Interferences and Thermal Degradation
    3.3.7 PPMC Coefficients for Inherent Matrix Interference Samples
    3.3.8 PPMC Coefficients for Matrix Interference/Thermal Degradation Samples
    3.3.9 Association of Simulated Fire Debris Samples to Corresponding Standards
    3.3.10 PPMC Coefficients for Simulated Fire Debris Samples
  3.4 Summary
  REFERENCES

Chapter 4: Classification of Ignitable Liquid Standards using Soft Independent Modeling of Class Analogy
  4.1 Introduction
  4.2 Materials and Methods
    4.2.1 Liquid Standards
    4.2.2 Analysis of Standards by GC-MS
    4.2.3 Data Pretreatment
    4.2.4 Principal Components Analysis
    4.2.5 Soft Independent Modeling of Class Analogy
  4.3 Results and Discussion
    4.3.1 Characterization of Ignitable Liquid Standards
    4.3.2 Principal Components Analysis of the Entire TIC Data set
    4.3.3 Classification of Ignitable Liquid Standard TICs Using SIMCA
      4.3.3.1 Coomans’ plots
      4.3.3.2 Sample-to-Model Distance Versus Leverage Plots
      4.3.3.3 The Unclassified Gasoline Sample
    4.3.4 Classification of Ignitable Liquid Standard EICs Using SIMCA
      4.3.4.1 Alkane EIC, m/z 99
      4.3.4.2 EICs: m/z 91, 83, and 128
  4.4 Summary
  REFERENCES

Chapter 5: Conclusions
  5.1 Summary of Research
    5.1.1 Research Objectives and Goals
    5.1.2 Unsupervised Multivariate Statistics Study Summary
    5.1.3 Supervised Multivariate Statistics Study Summary
  5.2 Future Work

List of Tables

Table 1.1: ASTM International classification of ignitable liquids.
Table 3.1: Mean Pearson product moment correlation coefficients ± standard deviations calculated for replicates of standards at each evaporation level (n=105).
Table 3.2: Mean Pearson product moment correlation coefficients ± standard deviations for replicates of the inherent matrix interference samples (n=105) and for samples to 0% evaporated gasoline and kerosene (n=225).
Table 3.3: Mean Pearson product moment correlation coefficients ± standard deviations for replicates of the matrix interference/thermal degradation samples (n=105) and for samples to 0% evaporated gasoline and kerosene (n=225).
Table 3.4: Mean Pearson product moment correlation coefficients ± standard deviations for replicates of the simulated fire debris samples (n=105) and for samples to 0% evaporated gasoline and kerosene (n=225).
Table 4.1: The suggested number of principal components for soft independent modeling of class analogy on total ion chromatograms.
Table 4.2: Classification Table of Ignitable Liquid TICs at 10% Significance Level.
Table 4.3: The suggested number of principal components for soft independent modeling of class analogy on extracted ion chromatograms (m/z 99).

List of Figures

Figure 2.1: Schematic of a gas chromatograph.
Figure 2.2: Schematic of a mass spectrometer.
Figure 2.3: Diagram of a quadrupole mass analyzer.
Figure 3.1: Total ion chromatograms of A) 0%, B) 50%, and C) 90% evaporated gasoline. The internal standard used was nitrobenzene.
Figure 3.2: Total ion chromatograms of A) 0%, B) 50%, and C) 90% evaporated kerosene. The internal standard used was nitrobenzene.
Figure 3.3: Scores plot of PC1 versus PC2 based on the total ion chromatograms for gasoline and kerosene at the three different evaporation levels. In terms of color, blue, green, and purple represent 0%, 50%, and 90% evaporated kerosene while red, orange, and yellow represent 0%, 50%, and 90% evaporated gasoline. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis.
Figure 3.4: Loadings plot of PC1 based on the total ion chromatograms of the unevaporated and evaporated ignitable liquid standards.
Figure 3.5: Loadings plot of PC2 based on the total ion chromatograms of the unevaporated and evaporated ignitable liquid standards.
Figure 3.6: Mean-centered total ion chromatogram of the 50% evaporated gasoline standard demonstrating the introduction of n-alkanes from the kerosene standards.
Figure 3.7: Total ion chromatograms of extracts of surface-treated wood burned for A) 0 seconds, B) 30 seconds, and C) 150 seconds.
Figure 3.8: Scores plot of PC1 versus PC2 based on the total ion chromatograms for the ignitable liquid standards, represented by the squares, and the projected scores of the inherent matrix interference samples, represented by the circles. In terms of color, blue, green, and purple represent 0%, 50%, and 90% evaporated kerosene while red, orange, and yellow represent 0%, 50%, and 90% evaporated gasoline.
Figure 3.9: Total ion chromatograms of a 50% evaporated gasoline standard (green) and two 50% evaporated gasoline inherent matrix interference samples (red and black), demonstrating the differences in abundance between the standards and samples.
Figure 3.10: Scores plot of PC1 versus PC2 based on the total ion chromatograms for the ignitable liquid standards, represented by the squares, and the projected scores of the matrix interference/thermal degradation samples, represented by the circles. In terms of color, blue, green, and purple represent 0%, 50%, and 90% evaporated kerosene while red, orange, and yellow represent 0%, 50%, and 90% evaporated gasoline.
Figure 3.11: Total ion chromatograms of a kerosene standard (red) and a matrix interference/thermal degradation sample (black), demonstrating the difference in peak width between the standards and samples.
Figure 3.12: Scores plot of PC1 versus PC2 based on the total ion chromatograms for the ignitable liquid standards, represented by the squares, and the projected scores of the simulated fire debris samples, represented by the circles. In terms of color, blue, green, and purple represent 0%, 50%, and 90% evaporated kerosene while red, orange, and yellow represent 0%, 50%, and 90% evaporated gasoline.
Figure 3.13: Total ion chromatograms of the C2-alkylbenzenes from the five simulated fire debris samples generated using gasoline, demonstrating the variation in abundances across samples.
Figure 4.1: Total ion chromatograms of A) insect repellent, B) gasoline, C) paint thinner, D) fuel stabilizer, E) fuel injector cleaner, and F) diesel with selected peaks labeled.
Figure 4.2: Scores plot of PC1 versus PC2 based on the total ion chromatograms of the ignitable liquid standards training and test sets: insect repellent (green), gasoline (orange), paint thinner (yellow), diesel (blue), fuel injector cleaner (black), and fuel stabilizer (red).
Figure 4.3: Loadings plot of PC1 based on the total ion chromatograms of the ignitable liquid standards (training and test sets).
Figure 4.4: Loadings plot of PC2 based on the total ion chromatograms of the ignitable liquid standards (training and test sets).
Figure 4.5: Coomans’ plot for the gasoline and insect repellent models (at a 10% significance level) based on the total ion chromatograms of the training sets. The sample-to-model distances are plotted for each of the ignitable liquids in the test set: insect repellent (green), gasoline (orange), paint thinner (yellow), diesel (blue), fuel injector cleaner (black), and fuel stabilizer (red). The class membership limit for the gasoline model is overlaid on the plot in orange while the limit for the insect repellent model is in green.
Figure 4.6: Coomans’ plot (at 10% significance level) for the gasoline and insect repellent models based on the total ion chromatograms of the training sets. The sample-to-model distances are plotted for the gasoline test samples (orange). The class membership limit for the gasoline model is overlaid on the plot in orange while the limit for the insect repellent model is in green.
Figure 4.7: Sample-to-model distance versus leverage plot for the gasoline model (at a 10% significance level) based on the total ion chromatograms of the gasoline training set. The sample-to-model distances and leverage are plotted for each of the ignitable liquids in the test set: insect repellent (green), gasoline (orange), paint thinner (yellow), diesel (blue), fuel injector cleaner (black), and fuel stabilizer (red). The class membership limit of both sample-to-model distance and leverage for the gasoline model is overlaid on the plot in orange.
Figure 4.8: Sample-to-model distance versus leverage plot for the gasoline model (at a 10% significance level) based on the total ion chromatograms of the gasoline training set. The sample-to-model distances and leverage are plotted for gasoline test samples (orange). The class membership limit of both sample-to-model distance and leverage for the gasoline model is overlaid on the plot in orange.
Figure 4.9: Coomans’ plot (at 25% significance level) for the gasoline and insect repellent models based on the total ion chromatograms of the training sets. The sample-to-model distances are plotted for the gasoline test samples (orange). The class membership limit for the gasoline model is overlaid on the plot in orange while the limit for the insect repellent model is in green.
Figure 4.10: Sample-to-model distance versus leverage plot for the gasoline model (at a 25% significance level) based on the total ion chromatograms of the gasoline training set. The sample-to-model distances and leverage are plotted for gasoline test samples (orange). The class membership limit of both sample-to-model distance and leverage for the gasoline model is overlaid on the plot in orange.
Figure 4.11: Loadings plot of PC1 of the gasoline model based on the total ion chromatograms of the gasoline standards.
Figure 4.12: Modeling power for the gasoline model based on the total ion chromatograms of the gasoline training samples. The red line represents modeling power of 0.3. Peaks that extend above this line significantly impact the model.
Figure 4.13: Modeling power for the insect repellent model based on the total ion chromatograms of the insect repellent training samples. The red line represents modeling power of 0.3. Peaks that extend above this line significantly impact the model.
Figure 4.14: A total ion chromatogram of insect repellent demonstrating the rise in baseline that occurs at the end of the chromatogram.
Figure 4.15: Scores plot of PC1 versus PC2 based on the extracted ion chromatograms (m/z 99) of the ignitable liquid standards training and test sets: insect repellent (green), gasoline (orange), paint thinner (yellow), diesel (blue), fuel injector cleaner (black), and fuel stabilizer (red).
Figure 4.16: Modeling power for the gasoline model based on the extracted ion chromatograms (m/z 99) of the gasoline training samples. The red line represents modeling power of 0.3. Peaks that extend above this line significantly impact the model.
Figure 4.17: Scores plot of PC1 versus PC6 based on the extracted ion chromatograms (m/z 99) of the ignitable liquid standards training and test sets: insect repellent (green), gasoline (orange), paint thinner (yellow), diesel (blue), fuel injector cleaner (black), and fuel stabilizer (red).

Chapter 1: Introduction

1.1 Background

Every year approximately 267,000 fires are the result of arson, which may be defined as the setting of a fire with intent to cause damage or harm [1]. Arson is a prevalent and destructive crime that costs the United States an estimated $684,000,000 in damages annually [1]. As a result, numerous fire investigations are conducted each year. It is the job of fire investigators to determine whether a fire was the result of an accident or arson. In intentional fires, an accelerant is often used to maximize the spread and damage of the fire. Debris is therefore collected from the fire scene and taken to a forensic laboratory, where it is extracted and analyzed for the presence of accelerants, such as ignitable liquids.

1.2 Ignitable Liquid Classification

Ignitable liquids are volatile and easy to ignite. They have a broad range of uses and chemical compositions. ASTM International developed a classification scheme for ignitable liquids based on their chemical composition (Table 1.1) [2]. The eight classes are gasoline, petroleum distillates, isoparaffinic products, aromatic products, naphthenic paraffinic products, normal alkane products, oxygenated products, and miscellaneous. Each class, except gasoline, is further characterized by the length of the carbon chains present in the liquid: light (C4-C9), medium (C8-C13), or heavy (C8-C20+).

Table 1.1: ASTM International classification of ignitable liquids.

Class: Gasoline (all brands, including gasohol)
  Composition: C3- and C4-alkylbenzenes and various aliphatic compounds
  Light (C4-C9), Medium (C8-C13), Heavy (C8-C20+): Fresh gasoline is typically in the range of C4-C12

Class: Petroleum Distillates
  Composition: Homologous series of n-alkanes; less significant isoparaffinic, cycloparaffinic, and aromatic compounds
  Light (C4-C9): Petroleum ether, cigarette lighter fluids, camping fluids
  Medium (C8-C13): Charcoal starters, paint thinners, dry cleaning solvents
  Heavy (C8-C20+): Kerosene, diesel fuel, jet fuels, charcoal starters

Class: Isoparaffinic Products
  Composition: Branched chain (isoparaffinic); cyclic (naphthenic) alkanes and n-alkanes insignificant or absent
  Light (C4-C9): Aviation gas, specialty solvents
  Medium (C8-C13): Charcoal starters, paint thinners, copier toners
  Heavy (C8-C20+): Commercial specialty solvents

Class: Aromatic Products
  Composition: Aromatic compounds; aliphatic compounds absent or insignificant
  Light (C4-C9): Paint and varnish removers, automotive parts cleaners, xylenes, toluene-based products
  Medium (C8-C13): Automotive parts cleaners, specialty cleaning solvents, insecticide vehicles, fuel additives
  Heavy (C8-C20+): Insecticide vehicles, industrial cleaning solvents

Class: Naphthenic Paraffinic Products
  Composition: Branched chain (isoparaffinic) and cyclic (naphthenic) alkanes; n-alkanes insignificant or absent
  Light (C4-C9): Cyclohexane-based solvents/products
  Medium (C8-C13): Charcoal starters, insecticide vehicles, lamp oils
  Heavy (C8-C20+): Insecticide vehicles, lamp oils, industrial cleaning solvents

Class: n-Alkane Products
  Composition: Only n-alkanes, typically containing 5 or less
  Light (C4-C9): Solvents, pentane, hexane, heptane
  Medium (C8-C13): Candle oils, copier toners
  Heavy (C8-C20+): Candle oils, carbonless forms, copier toners

Class: Oxygenated Solvents
  Composition: Oxygenated products including alcohols, esters, ketones; major components include toluene or xylene
  Light (C4-C9): Alcohol, ketones, lacquer thinners, fuel additives, surface preparation solvents
  Medium (C8-C13): Lacquer thinners, industrial solvents, metal cleaners/gloss removers
  Heavy (C8-C20+): (none listed)

Class: Others-Miscellaneous
  Composition: Liquids that cannot otherwise be classified
  Light (C4-C9): Single component products, blended products, enamel reducers
  Medium (C8-C13): Turpentine products, blended products, specialty products
  Heavy (C8-C20+): Blended products, specialty products

1.3 Extraction of Volatile Compounds from Fire Debris

ASTM International has approved a number of different procedures for the extraction of ignitable liquids from fire debris. The extraction method is tailored to the sample matrix that is to be analyzed. For example, the ASTM standard recommends against liquid extraction of porous debris because the matrix may trap the solvent, resulting in an inefficient extraction. The passive headspace extraction, on the other hand, is extremely sensitive and efficient. Consequently, the passive headspace extraction is more commonly used in many forensic laboratories.

When debris is collected from the fire scene, it is sealed in an airtight container to prevent any volatile compounds from being lost to the atmosphere. Analysts perform a passive headspace extraction by suspending an activated charcoal strip (ACS) in the headspace of the container.
The sample is then placed in an oven for 2 to 24 hours at a temperature ranging from 50 °C to 80 °C [3]. The volatile compounds from the debris adsorb onto the ACS. An organic solvent such as carbon disulfide, n-pentane, diethyl ether, or methylene chloride is used to elute the volatiles from the ACS. The resulting extract is then analyzed, most commonly by gas chromatography-mass spectrometry (GC-MS).

1.4 Current Analysis of Fire Debris

Gas chromatography-mass spectrometry is the gold standard by which fire debris samples are analyzed for evidence of residues from ignitable liquids. A total ion chromatogram (TIC) is the product of analysis by GC-MS. A TIC is a graph in which retention time is on the abscissa and abundance is on the ordinate. This graph shows the ion current from all ions for each peak present in the chromatogram and, consequently, represents every extractable compound in the debris. A mass spectrum is also generated for each compound and can be used to determine its identity.

Fire debris analysts perform a visual comparison between the TIC of an ignitable liquid reference standard and that of the fire debris. Analysts look for similar compounds or peaks in the two chromatograms, aiming to identify the presence of an ignitable liquid. The standards used for comparison are typically generated in-house. Standard operating procedures are established so that certain compounds characteristic of the ignitable liquid must be present in the chromatogram in order for the analyst to determine that the liquid is indeed present in the debris sample. Additionally, analysts use the relative ratios of peak abundances or peak patterns from the standards to identify the presence of an ignitable liquid in the debris.

One difficulty with this type of analysis is that peaks from the debris itself, or matrix interference compounds, may mask the presence of compounds from the ignitable liquid.
As a result, the TIC can be very complex and difficult to interpret. In order to overcome this problem and simplify the interpretation, different types of chromatograms can be generated using computer software. An extracted ion chromatogram (EIC) shows only the contribution of a specific selected ion to peak abundances. An EIC can be more sensitive than a TIC because the selected ion may be present only in compounds from the ignitable liquid and not in the matrix. The EIC could therefore reveal the presence of compounds indicative of an ignitable liquid despite the additional matrix compounds. Similarly, an extracted ion profile (EIP), which consists of multiple extracted ions, can also be used to reveal concealed peaks.

Another alternative approach is to use selected peaks based on their retention times in the chromatogram. Selected peaks may be especially helpful when analyzing samples according to pattern recognition or similarities in peak ratios. In this method, compounds characteristic of the ignitable liquid are selected while all others are removed from the chromatogram. Using this procedure, all compounds at the other retention times, which are likely to originate from the matrix, are removed to decrease the complexity of the data analysis. However, ions from the matrix may contribute to the height of the selected peaks because the abundances are typically based on the total ion current as opposed to extracted ions.

1.5 Difficulties in Analysis of Fire Debris

While the visual comparison between the chromatogram from fire debris and that of a standard is, conceptually, quite simple, it is greatly complicated by many factors such as the evaporation of the ignitable liquid, interference compounds from the debris matrix, and thermal degradation of both the liquid and the matrix.
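
Before these complicating factors are considered in turn, the relationship between the chromatogram types described in Section 1.4 (TIC, EIC, and EIP) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions and is not part of the work described in this thesis: the scan data are invented, the function names are hypothetical, and the pairing of m/z 57 with m/z 99 as alkane-related ions for the EIP is chosen only for the example.

```python
# Hypothetical toy data: each scan is one retention-time point and maps
# m/z -> ion abundance. Real GC-MS data would be read from an instrument file.
scans = [
    {57: 120.0, 91: 10.0, 99: 30.0},
    {57: 40.0, 91: 300.0, 105: 80.0},
    {43: 200.0, 57: 150.0, 99: 60.0},
]


def total_ion_chromatogram(scans):
    """TIC: sum of the abundances of all ions at each retention-time point."""
    return [sum(scan.values()) for scan in scans]


def extracted_ion_chromatogram(scans, mz):
    """EIC: abundance of one selected ion at each retention-time point."""
    return [scan.get(mz, 0.0) for scan in scans]


def extracted_ion_profile(scans, mzs):
    """EIP: summed abundance of several selected ions at each point."""
    return [sum(scan.get(mz, 0.0) for mz in mzs) for scan in scans]


tic = total_ion_chromatogram(scans)
eic_99 = extracted_ion_chromatogram(scans, 99)
alkane_eip = extracted_ion_profile(scans, [57, 99])  # assumed ion choice

print("TIC:       ", tic)
print("EIC m/z 99:", eic_99)
print("EIP 57+99: ", alkane_eip)
```

In this toy data the second scan dominates the TIC but contributes nothing to the m/z 99 trace, which is the sense in which an extracted ion can set aside abundance that comes from unrelated compounds.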
The evaporation of the ignitable liquid is quite problematic because it can lead to the loss of volatile compounds that are characteristic of a specific liquid and that could aid in its identification. Many times the ignitable liquids used to commit arson are not purchased specially for the deed; instead, they are taken from garages and storage sheds where they may have been sitting for a period of time. Evaporation of the volatile components can occur during this time. Evaporation can also occur during the fire itself, since the heat of the flame can cause the more volatile compounds in the ignitable liquid to evaporate. Certain volatile compounds may be partially or totally removed from the debris sample. Evaporation of volatiles is reflected in the debris chromatogram, and the partial or total loss could result in peak ratios differing from those observed in the corresponding reference standard.

While compounds can be removed from the chromatogram through evaporation, they can also be added to the chromatogram from the debris matrix. Common items such as clothing and building materials contain volatile compounds that may be incorporated into the chromatogram of the debris sample. As a result, it is extremely important to know the type of debris matrix being analyzed and to identify the compounds that the matrix is likely contributing to the chromatogram. To combat this problem, analysts are often given additional debris from the scene that is unlikely to be contaminated with an ignitable liquid. The additional debris sample is analyzed and a chromatogram is generated, which is used as an exclusionary tool to determine the compounds that come from the matrix itself.

Thermal degradation, which occurs at temperatures between 100 °C and 300 °C, affects both the debris and the ignitable liquid [4]. Thermal degradation is the breakdown of compounds that occurs due to the heat of the fire.
This can further complicate the chromatogram of a debris sample because the degradation of the compounds can lead to the generation of new, and sometimes unexpected, compounds. Additionally, thermal degradation can lead to a change in peak ratios in the chromatogram of the debris as compared to the chromatogram of the reference standard.

All of the aforementioned factors work together to complicate the already subjective visual comparison between the chromatograms of the debris and standards. These complications can lead to analysts testifying in court that an ignitable liquid was used to set a fire when, in fact, it was not, and vice versa. There is an obvious need for improved data analysis and interpretation procedures, as well as safeguards to reduce the number of incorrect conclusions by analysts. In a 2009 report entitled Strengthening Forensic Science in the United States: A Path Forward, the National Academy of Sciences criticized the entire forensic community for the lack of peer-reviewed research able to withstand Daubert hearings and provide statistical evaluations of the evidence [5].

1.6 Literature Review

While arson investigation is extremely complex, research has been performed to improve current methods in an attempt to simplify fire debris analysis. Some studies have identified interference compounds that come from different matrices, as well as their thermal degradation products. Other studies have investigated the usefulness of statistical procedures to increase the certainty of analysts’ findings and avoid the subjectivity involved in a simple visual comparison of chromatograms.

1.6.1 Effects of Matrix Interferences and Thermal Degradation

Lentini et al. addressed the issue of inherent matrix interference compounds from many common items [6]. The matrices examined included materials such as clothing, shoes, and building materials.
The aim of this study was to demonstrate that compounds indicative of petroleum products can routinely be detected in common items even though no ignitable liquids have been added to them. These materials, examined without being burned, were extracted using the passive headspace method and analyzed by GC-MS. The results show that some items contained compounds indicative of an ignitable liquid; however, the peak ratios were such that an experienced analyst would not likely mistake them as coming from an ignitable liquid. Other items, such as spandex, gave a “strong pattern” indicative of kerosene. These results were not unexpected since petroleum products are used to manufacture many common household items. In terms of building materials, the authors concluded that the presence of floor coatings may be a larger problem than previously recognized because petroleum distillate solvents are used in many coatings, including stains, and may be detected months after application.

Most wood used in homes is treated with a finish such as paint, stain, or other surface protectants. In many cases, the treatments contain compounds also found in ignitable liquids and so can be particularly problematic in fire debris analysis. Hetzel and Moss attempted to determine the point after the last application at which the petroleum distillates from a wood waterproofing coating could no longer be detected on an outdoor patio [7]. The wood purchased for the study was pretreated with a preservative and fungicide, as outdoor wood commonly is. The pretreated pine contained some aldehydes that could further mimic the presence of a medium petroleum distillate. The authors performed two identical experiments in which waterproofer was applied to a treated lumber deck and small samples of decking were collected over several days and analyzed using GC-MS.
The combined results indicated that medium petroleum distillates could still be isolated from the decking 16 days after the last application, but not more than 20 days. The weather during the studies was warm and predominantly dry, with average temperatures of 76 °F and 67 °F and average rainfalls of 1 and 9 cm for the two studies.

In a similar study pertaining to indoor treated lumber, Lentini treated pine and oak flooring with either stain and a polyurethane sealer or with an oil finish [8]. The treated wood was sampled over a 24-month period and analyzed by GC-MS. Solvents from the surface treatments were characterized on both types of wood up to two years after the application. In addition, the solvents were all present in essentially the same amounts as when they were first applied to the wood boards, regardless of the time point at which they were sampled.

In a study on inherent matrix interferences and thermal degradation products, Almirall and Furton characterized compounds found in common residential and commercial objects (both old and new) by burning the materials, extracting volatiles via the passive headspace procedure, and analyzing the extracts by GC-MS [9]. The burning was performed at different temperatures and with varying amounts of oxygen present. Some volatiles inherent to the matrix and created by thermal degradation were found to produce certain target compounds indicative of an ignitable liquid residue, but did not generate the same peak ratios that would be found in a chromatogram of a neat ignitable liquid.

In a similar series of experiments, Fernandes et al. partially burned many common household items (newspaper, carpet flooring, painted wood, etc.) in an attempt to characterize the matrix interferences in both new and one-month-old items and to determine whether more compounds were extracted from the new or the one-month-old items [10].
All volatiles were extracted using the passive headspace procedure and analyzed by GC-MS. For most tested items, it was concluded that new items contained more volatiles and created more interferences in chromatograms than the older items. The authors also concluded that the majority of the matrix interferences were inherent to the substrate and did not occur as result of thermal degradation. While volatiles were a source of interferences in the chromatogram, the authors stated that they could not be misidentified as an ignitable liquid because they lacked the characteristic peak profile of a neat ignitable liquid and that any potential for misidentification could be overcome by using control samples. While the authors did examine painted wood, they did not investigate the effects of matrix interference and thermal degradation compounds on the misidentification of an ignitable liquid from other surface treatments. ! %.! While the studies by Almirall and Furton and the Fernandes et al. concluded that matrix interferences do not mimic the correct pattern of ignitable liquids and should not be mistaken for such a liquid, neither study attempted to burn samples in the presence of an ignitable liquid 9, 10 . It is unclear in either study if the authors are taking into account the change that occurs in the pattern of an ignitable liquid that is burned due to the loss of volatiles (evaporation) and the thermal degradation of the liquid itself. Also, when a small volume of an ignitable liquid is applied to a substrate, the residue may be at levels just above the detection limit of the GC-MS and matrix interferences as well as thermal degradation products could mask the visible pattern of peaks from the ignitable liquid residue. Additionally, the studies by Almirall and Furton and the Fernandes et al. 
did not address the concerns raised by the three previous studies showing that wood, especially surface-treated wood, could contribute compounds to the chromatogram that are indicative of an ignitable liquid. Almirall and Furton did investigate the thermal degradation products of pine wood; however, the thermal degradation products of a surface treatment, which can be detected up to two years after application, were not investigated.

In a study by DeHaan and Bonarius, gasoline, paint thinner, and camp fuel were used as accelerants in a real-life experimental burning of floor coverings such as carpet, padding, and synthetic turf [11]. Debris samples were immediately removed from the fire scene, extracted using the passive headspace procedure, and analyzed using GC-MS. The authors found that, while floor coverings do produce volatiles when burned, the liquids were still identifiable on the debris. It was noted, however, that the volatiles from the flooring could lead to a misidentification of the liquid as a special blend liquid due to deviations from the characteristic patterns of certain ignitable liquids. The results of this study were very encouraging; however, the experimenters did not look specifically at surface-treated wood flooring.

In an ambitious attempt to determine the element of a fire that most affects the identification of an accelerant, Borusiewicz et al. performed a study on the effect of the type of accelerant, type of burned matrix, the length of burn time, and the availability of air on the detection and identification of ignitable liquid residues [12]. Five different ignitable liquids, including gasoline and kerosene, were investigated. The liquids were spiked onto the matrices (carpet, wood logs, chipboard) and the samples were burned until they self-extinguished. The samples were immediately collected, extracted using the passive headspace technique, and analyzed by GC-MS.
The authors found that the type of burned matrix has the greatest effect on the identification of accelerants. No ignitable liquid was identified in the wood logs; however, in reality, wood does not always burn until it self-extinguishes. Additionally, it would be useful to investigate treated wood as that is the most likely form of wood in a structure fire. Lastly, the spike volume used for each of the liquids was not optimized before or during the study, so an ignitable liquid applied in a sufficiently large volume might still be detected in wood fire debris.

1.6.2 The Application of Multivariate Statistical Procedures

In a series of three studies, Sandercock and Du Pasquier attempted to use principal components analysis (PCA) and linear discriminant analysis (LDA) to differentiate various gasoline samples. In the first study, 35 randomly collected gasoline samples were analyzed based on their trace polar and polycyclic aromatic hydrocarbon content [13]. The gasoline samples were unevaporated and consisted of three different grades: regular unleaded, premium unleaded, and lead replacement. A solid phase micro-extraction (SPME) procedure was used to extract the compounds from the gasoline samples. The extracts containing the different types of compounds were analyzed by GC-MS in selected ion monitoring mode for each sample. The authors concluded that the trace polar compounds did not vary significantly among gasoline samples and should not be used as distinguishing compounds. The polycyclic aromatic hydrocarbons, specifically the C0- to C2-naphthalenes, on the other hand, were sufficiently variable across gasoline samples and were able to distinguish the samples using PCA and LDA.

In the second study, the change in the C0- to C2-naphthalene content was investigated across evaporation levels of gasoline samples, as well as in unevaporated gasoline samples collected over an extended period of time [14].
For the first part of the study, 35 gasoline samples, of the same three grades as before, were evaporated to different extents (25, 50, 75, and 90% by weight). The samples were analyzed in a similar manner as in the first study and PCA was performed in conjunction with LDA on the resulting chromatograms. Using PCA on the C0- to C2-naphthalene compounds, the evaporated gasoline samples were successfully associated to their respective unevaporated counterparts. In the second part of the study, 96 unevaporated gasoline samples were collected from three stations over a 16-week time period and analyzed as described above. Again, the C0- to C2-naphthalenes were used to differentiate samples from one another and associate samples collected from the same stations. Using PCA and LDA, all 96 gasoline samples could be distinguished from one another based on differences in the naphthalene peak ratios.

In an almost identical manner, the third study investigated the ability of unevaporated gasoline samples from different locations in two different countries to be differentiated from one another, again using only the C0- to C2-naphthalenes for each sample [15]. By applying PCA to the data set, 28 samples from New Zealand could be differentiated from 24 samples collected in Australia. All of the samples from Australia could also be differentiated from one another, but only half of the samples from New Zealand could be differentiated from one another using these compounds and PCA.

The series of studies performed by Sandercock and Du Pasquier demonstrates the success and potential of multivariate statistical procedures in differentiating multiple samples of one ignitable liquid [13, 14, 15]. These are encouraging data that may someday help link ignitable liquids at fire scenes to those found in a suspect's possession.
These studies, however, do not investigate the usefulness of these statistical procedures when applied to real fire debris where matrix interferences are present and ignitable liquid residues must be extracted from the debris.

Hupp et al. used PCA and Pearson product moment correlation (PPMC) coefficients to investigate the discrimination of 25 different diesel samples across 13 brands that were analyzed by GC-MS [16]. It was demonstrated that PCA on the TICs could differentiate the diesel samples into 4 distinct groups based on their chemical compositions. The groupings observed with the PCA were further confirmed by PPMC coefficients for intragroup samples, which indicated strong similarities between samples. Additionally, the authors performed PCA on the alkane and aromatic EIPs of each diesel sample and found that even greater discrimination of the samples was obtained, which was again reflected in the calculated values of the PPMC coefficients. The authors demonstrated that these statistical procedures could be used to differentiate between diesel samples; however, supervised statistical procedures were not investigated in this study.

Principal components analysis was further investigated along with canonical variate analysis (CVA) and orthogonal canonical variate analysis (OCVA), which was used in conjunction with LDA, in a study by Petraco et al. [17]. The authors used 15 selected compounds to differentiate replicates of gasoline chromatograms accumulated from 20 separate fire scene investigations. All of the statistical procedures allowed for discrimination of the samples. For CVA, OCVA, and PCA, the number of dimensions required for accurate differentiation was 3, 4, and 10, respectively. This demonstrates that all of the statistical procedures can be used to differentiate samples, but that there is room for improvement.
Oftentimes, the tenth dimension in PCA accounts for a very small percentage of overall variance and, as a result, provides only weak differentiation of samples; therefore, statistical procedures that allow for a more definitive differentiation would be beneficial in a forensic setting. The authors acknowledged that, even though the results are promising, preliminary studies they had performed with evaporated or degraded gasoline samples showed that the usefulness of some of these statistical procedures is currently limited, as the samples could not be associated to corresponding ignitable liquid standards.

Bodle and Hardy investigated the potential use of other statistical analyses such as soft independent modeling of class analogy (SIMCA), in addition to hierarchical cluster analysis (HCA) and PCA [18]. In a study aimed at optimization of an extraction by SPME and analysis by gas chromatography-flame ionization detection, the authors generated chromatograms of ignitable liquids including gasoline, diesel, and kerosene. To condense the data set, the resulting chromatograms were divided into 30-second or 60-second intervals, the signal intensities of which were summed, such that the statistical analyses were performed on the 114 or 57 newly calculated variables. The ultimate goal of this project was to investigate whether a supervised classification procedure such as SIMCA could be used to group ignitable liquids according to the ASTM International classification scheme (Table 1.1). Hierarchical cluster analysis was used to determine natural linkages or groups within the data set of ignitable liquids collected. Later, PCA models were generated using the previously generated variables, which showed strong correlations between ignitable liquids and their respective classes. Lastly, the authors concluded that SIMCA was potentially useful as it was able to correctly classify 97.2% of the ignitable liquid samples.
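The interval-summing step described above can be illustrated with a minimal sketch. It assumes a chromatogram stored as paired arrays of retention times (in seconds) and signal intensities; the function name `bin_chromatogram`, the synthetic 57-minute run, and the one-point-per-second sampling are all hypothetical choices made here for illustration and are not taken from the cited study.

```python
import numpy as np

def bin_chromatogram(times_s, intensities, bin_width_s):
    """Sum signal intensities over consecutive retention-time intervals."""
    times_s = np.asarray(times_s, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    # Index of the interval each data point falls into, counted from the first time point.
    idx = np.floor((times_s - times_s[0]) / bin_width_s).astype(int)
    n_bins = idx.max() + 1
    # bincount with weights adds up the intensities that share an interval index.
    return np.bincount(idx, weights=intensities, minlength=n_bins)

# Hypothetical 57-minute run sampled once per second (synthetic, flat signal).
times = np.arange(0, 57 * 60)
signal = np.ones(times.size)

vars_60s = bin_chromatogram(times, signal, 60)  # one variable per 60-second interval
vars_30s = bin_chromatogram(times, signal, 30)  # one variable per 30-second interval
```

Under these assumptions the 60-second and 30-second intervals yield 57 and 114 variables, respectively, and the total summed signal is unchanged by the binning.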
The samples that were not correctly classified were clear outliers of the entire data set and were not assigned to any other ignitable liquid classes.

The method of selecting the variables used for differentiation in the studies by Bodle and Hardy and Petraco et al. may not be practical in a forensic laboratory where analysts will not likely know the identity of the ignitable liquid before testing [17, 18]. Even though TICs, EICs, and EIPs are likely to be more realistically useful, the authors of these studies did not investigate the advantages or disadvantages of using these chromatograms, rather than selected variables, for classification procedures.

A study by Tan et al. also investigated the use of SIMCA and PCA for the identification and classification, according to the ASTM International classification scheme, of over 50 ignitable liquids that were extracted from unburned wood and carpet matrices [19]. After the ignitable liquids were exposed to the matrix, the samples were solvent extracted and analyzed by GC-MS. The resulting TICs and selected EICs were divided into 19 equal parts and the signal was summed for each section, which generated the 19 new variables that were used for the statistical analyses. All liquids were correctly classified using this procedure. Simulated fire debris samples were also generated by adding some of the ignitable liquids to carpet and then burning it. The identity of the liquid used to make the simulated debris was also correctly determined using a SIMCA model. While these results are extremely promising, the authors did not investigate the effects of a surface treatment on the wood, which could complicate identification of a liquid, nor did they investigate the use of the original TICs, EICs, or EIPs for classification purposes.

Baerncopf et al. conducted a study that accounted for thermal degradation as well as the matrix interferences encountered in fire debris analysis [20].
Six ignitable liquids, from different ASTM International classes, were spiked onto a carpet matrix and burned. Samples underwent a passive headspace extraction and subsequent analysis by GC-MS. Principal components analysis and PPMC coefficients were successfully applied to the full TICs to objectively associate the ignitable liquid residues back to their corresponding neat liquids. The effect of evaporation on association was not examined in these experiments; however, the positive results from this study demonstrate the potential for the use of some multivariate statistical procedures and provide a foundation for the research performed in this thesis with surface-treated wood as a matrix.

The aforementioned studies demonstrate the potential of using multivariate statistical procedures for the purpose of classifying ignitable liquids; however, very few of the analyses were performed on chromatograms representative of those that would result from an actual fire debris sample. Other than the work by Tan et al. and Baerncopf et al., the effects of thermal degradation on the ignitable liquid and the difficulties that arise from extracting the volatiles from a matrix were not investigated [19, 20]. Furthermore, no statistical analyses were performed on simulated fire debris containing surface-treated wood. This type of investigation is a necessary next step since surface-treated wood is commonly used in building and decorating and, as a result, is likely to be contained in fire debris submitted to forensic laboratories for analysis.

1.7 Considerations for Statistical Analyses

While multivariate statistical procedures have been shown to be promising in a research setting, they introduce new difficulties. Principal components analysis, for example, is such a powerful tool because it describes the data set in terms of the factors corresponding to the greatest variance.
This type of analysis procedure is so sensitive to variation, however, that it will sometimes place more emphasis on meaningless nonchemical variations than on the chemical variations that actually describe the data. To minimize these meaningless differences, data pretreatment procedures can be, and often are, performed on chromatographic data prior to data analysis. These procedures include smoothing, retention time alignment, and normalization of chromatograms in the data set.

A smoothing algorithm is often applied to the data first because chromatograms consist of both noise and signal. Noise is unintentionally introduced as part of the data collection process and can come from many different sources, such as random fluctuations in measurements made by a detector. Signal, on the other hand, is the desired output, which describes the data. Noise can be extremely detrimental to data analysis because it is possible for the noise to mask or misrepresent the signal and, therefore, distort the results of the data analysis. A smoothing algorithm minimizes the noise while enhancing the true signal of the data.

After the signal of chromatographic data has been enhanced, it is commonly retention time aligned. Retention time drift can cause the same peak in different chromatograms to have different retention times. This drift occurs naturally when samples are analyzed over a period of time. As a result of retention time drift, variation is identified across chromatograms that should not exist. This can be corrected by applying alignment algorithms to the chromatographic data. Ideally, the end result of alignment is that corresponding compounds have the same retention time across all chromatograms.

Normalization is commonly performed next and is used to reduce the non-significant variations in peak abundance between replicates, between samples, or between sample populations. These variations have many different sources and may be inherent to the data
collection process. Again, for an analysis procedure such as PCA, which describes the greatest sources of variance, random differences in peak abundance may result in the data being inaccurately described.

1.8 Research Objectives and Goals

The current methods of fire debris analysis are extremely subjective, even with standard operating procedures and other safeguards. This research attempts to demonstrate the potential of both unsupervised (PCA) and supervised (SIMCA) statistical procedures for performing objective fire debris analyses.

The combination of PPMC coefficients and PCA has been successfully used in the literature to associate evaporated liquids and simulated fire debris made of carpet. The first objective of this research was to investigate the effects of evaporation, matrix interferences, and thermal degradation on the association of surface-treated wood samples containing ignitable liquids to their respective standards using PCA and PPMC coefficients.

In order to meet the first objective, standards and three data sets were generated. Each data set demonstrated the effects of evaporation, matrix interferences, and thermal degradation in a piecewise manner using a surface-treated wood matrix. All statistical analyses were performed on the full TICs of each data set. To evaluate the unsupervised association of samples to their standards, PCA was first performed on the liquid standards. The samples from each data set were later projected separately onto the scores plot of the standards. A visual assessment of the resulting scores plots was used to gauge the association of the samples to the standards in light of the complicating factors. For each association, mean PPMC coefficients were also calculated to provide a numerical value of the similarity between samples.
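The association procedure outlined above (PCA on the standards, projection of samples onto the resulting scores plot, and mean PPMC coefficients) can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the software or settings used in this research: the chromatograms are synthetic Gaussian peaks, the function names `fit_pca`, `project`, `ppmc`, and `mean_ppmc` are hypothetical, and PCA is computed here by singular value decomposition of the mean-centered data.

```python
import numpy as np

def fit_pca(standards, n_components=2):
    """PCA of the standards via SVD of the mean-centered data matrix."""
    X = np.asarray(standards, dtype=float)
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    loadings = vt[:n_components].T  # variables x components
    return mean, loadings

def project(samples, mean, loadings):
    """Scores of chromatograms on the principal components of the standards."""
    return (np.asarray(samples, dtype=float) - mean) @ loadings

def ppmc(x, y):
    """Pearson product moment correlation coefficient between two chromatograms."""
    return np.corrcoef(x, y)[0, 1]

def mean_ppmc(samples, standards):
    """Mean PPMC coefficient over all sample/standard pairs."""
    return float(np.mean([ppmc(s, t) for s in samples for t in standards]))

# Synthetic "chromatograms" (assumed data): two liquids with different peak positions.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
peak = lambda c: np.exp(-((t - c) / 0.02) ** 2)
liquid_a = peak(0.2) + 0.5 * peak(0.4)
liquid_b = peak(0.6) + 0.8 * peak(0.8)
standards_a = liquid_a + 0.01 * rng.standard_normal((3, t.size))
standards_b = liquid_b + 0.01 * rng.standard_normal((3, t.size))
standards = np.vstack([standards_a, standards_b])

# Hypothetical "debris" sample: liquid A plus one small interference peak and noise.
sample = liquid_a + 0.05 * peak(0.5) + 0.01 * rng.standard_normal(t.size)

mean, loadings = fit_pca(standards)
scores_std = project(standards, mean, loadings)
score_sample = project(sample[None, :], mean, loadings)

r_a = mean_ppmc([sample], standards_a)
r_b = mean_ppmc([sample], standards_b)
```

In this synthetic case the sample's projected score lies closer to the standards of liquid A than to those of liquid B, and the mean PPMC coefficient with the liquid A standards is correspondingly higher. The projection step uses the mean and loadings of the standards only, so the samples do not influence the principal components.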
The second objective of this research was to perform a preliminary investigation on the potential of SIMCA for providing a supervised classification of ignitable liquids without the effects of the aforementioned complicating factors.

In order to fulfill the second objective, a new set of ignitable liquid standards was generated. Six ignitable liquids were chosen for this preliminary study, all from different ASTM International classes. The liquids used were fuel stabilizer, gasoline, paint thinner, insect repellent spray, diesel, and fuel injector. Each liquid was diluted in methylene chloride and analyzed in replicate by direct injection GC-MS, generating fifteen chromatograms per liquid. Initially, PCA was performed on the entire data set to determine natural groupings of the liquids, according to chemical composition. Next, SIMCA models were generated and validated using the TICs as well as selected EICs and EIPs in an attempt to determine which type of chromatogram, if any, is more successful for classification purposes. Lastly, SIMCA models were developed based on the unevaporated and evaporated gasoline and kerosene standards from the previous study to demonstrate the effects of evaporation and passive headspace extraction on the supervised classification.

REFERENCES

1. Karter MJ, Jr. Fire Loss in the United States During 2009. Quincy (MA): National Fire Protection Association; 2010 Aug. Report No. FLX09.
2. ASTM International, ASTM E 1618-06e1. Annual Book of ASTM Standards 14.02.
3. ASTM International, ASTM E 1412-07. Annual Book of ASTM Standards 14.02.
4. Stauffer E. Concept of pyrolysis for fire debris analysts. Science & Justice 2003; 43(1): 29-40.
5. Committee on Identifying the Needs of the Forensic Sciences Community, National Research Council. Strengthening Forensic Science in the United States: A Path Forward. Washington, D.C.: National Academies Press, 2009.
6. Lentini JJ, Dolan JA, Cherry C.
The Petroleum-Laced Background. Journal of Forensic Sciences 2000; 45(5): 968-989.
7. Hetzel SS, Moss BA, Moss RD. How long after waterproofing a deck can you still isolate an ignitable liquid? Journal of Forensic Sciences 2005; 50(2): 269-276.
8. Lentini JJ. Persistence of Floor Coating Solvents. Journal of Forensic Sciences 2001; 46(6): 1470-1473.
9. Almirall JR, Furton KG. Characterization of background and pyrolysis products that may interfere with forensic analysis of fire debris. J. Anal. Appl. Pyrolysis 2004; 71: 51-67.
10. Fernandes MS, Lau CM, Wong WC. The effect of volatile residues in burnt household items on the detection of fire accelerants. Science & Justice 2002; 42: 7-15.
11. DeHaan JD, Bonarius K. Pyrolysis products of structure fires. Journal of the Forensic Science Society 1988; 28(5-6): 299-309.
12. Borusiewicz R, Zięba-Palus J, Zadora G. The influence of the type of accelerant, type of burned material, time of burning and availability of air on the possibility of detection of accelerants. Forensic Science International 2006; 160: 115-126.
13. Sandercock PML, Du Pasquier E. Chemical fingerprinting of unevaporated automotive gasoline samples. Forensic Science International 2003; 134: 1-10.
14. Sandercock PML, Du Pasquier E. Chemical fingerprinting of gasoline 2. Comparison of unevaporated and evaporated automotive gasoline samples. Forensic Science International 2004; 140: 43-59.
15. Sandercock PML, Du Pasquier E. Chemical fingerprinting of gasoline Part 3. Comparison of unevaporated automotive gasoline samples from Australia and New Zealand. Forensic Science International 2004; 140: 71-77.
16. Hupp AM, Marshall LJ, Campbell DI, Smith RW, McGuffin VL. Chemometric analysis of diesel fuel for forensic and environmental applications. Analytica Chimica Acta 2008; 606(2): 159-171.
17. Petraco NDK, Gil M, Pizzola PA, Kubic TA. Statistical Discrimination of Liquid Gasoline Samples from Casework. Journal of Forensic Sciences 2008; 53(5): 1092-1101.
18. Bodle ES, Hardy JK.
Multivariate pattern recognition of petroleum-based accelerants by solid-phase microextraction gas chromatography with flame ionization detection. Analytica Chimica Acta 2007; 589: 247-254.
19. Tan B, Hardy JK, Snavely RE. Accelerant classification by gas chromatography/mass spectrometry and multivariate pattern recognition. Analytica Chimica Acta 2000; 422: 37-46.
20. Baerncopf JM, McGuffin VL, Smith RW. Association of ignitable liquid residues to neat ignitable liquids in the presence of matrix interferences using chemometric procedures. Journal of Forensic Sciences 2011; 56: 70-81.

Chapter 2: Theory

2.1 Passive Headspace Extraction

Passive headspace extraction is one of the extraction methods recommended by ASTM International and is commonly used for fire debris samples suspected of resulting from arson [1].

For a passive headspace extraction, fire debris samples are placed in a sealed container and an activated charcoal strip (ACS) is suspended within the container. The samples are left in an oven at a temperature ranging from 50 to 80 °C over a period of 2 to 24 hours [1]. When the sample is heated, the volatile compounds are released into the headspace of the container and adsorb onto the ACS. The types of volatile compounds that adsorb onto the strip depend on the temperature, as well as the duration, of the extraction. At higher temperatures, heavier, less volatile compounds are released into the headspace, while at lower temperatures, the smaller, more volatile compounds are primarily collected. Longer extraction times also favor heavier molecules because, if at some point in the extraction the strip becomes saturated, the heavier molecules have a tendency to displace the smaller molecules. After the headspace extraction is performed, the ACS is eluted with an organic solvent such as carbon disulfide, n-pentane, diethyl ether, or methylene chloride.
The resulting extract is then analyzed, typically, by gas chromatography-mass spectrometry (GC-MS).

2.2 Gas Chromatography-Mass Spectrometry

Gas chromatography-mass spectrometry is the method of choice for analyzing suspected arson fire debris samples in forensic laboratories. Chromatography techniques are used to separate sample mixtures into individual analytes through interaction of the sample between a mobile phase and a stationary phase [2]. In modern gas chromatography, the mobile phase is a gas, commonly referred to as the carrier gas, while the stationary phase is typically a liquid. The mobile phase gas is contained in a pressurized cylinder that is connected to the injection port of the GC (Figure 2.1). The gas flows through the column that contains the stationary phase. The column is housed in an oven to allow careful control of temperature during the analysis. The column is fed through the transfer line and emerges directly into the mass spectrometer detector where additional information about the sample is generated and collected.

The separation begins when a syringe is used to inject a liquid sample mixture into the injection port of the GC. Once the sample is injected, it is quickly volatilized by the high temperature of the port. It is important to note that because the separation of the mixture occurs while it is in the gas state, all of the analytes that can be separated and detected using this method must be sufficiently volatile or they will not be converted to the gas state and carried through the column. The temperature of the injection port is typically 50 °C above the boiling point of the least volatile compound in the mixture to ensure volatilization and separation in the column [2]. If the injection port were any cooler than that, the mixture would not be volatilized rapidly and would enter the column over too broad a period of time, which could result in inefficient separations.
Additionally, inadequate volatilization may result in only part of the sample being analyzed. Specifically, if the injection port is at a lower temperature than the highest boiling point of an analyte, that analyte would not enter the column and the results of the analysis would not be representative of the actual sample.

The injection can be performed in four different modes: split, splitless, pulsed split, and pulsed splitless. A split injection disposes of a fraction of the sample before it even reaches the column. Common split ratios are 50:1 and 100:1. This mode is beneficial for highly concentrated samples. Discarding some of the sample prevents the column from being overloaded or contaminated. Overloading the column, which is discussed later, leads to poor separation of the sample. A splitless injection, on the other hand, injects the entire volume of the sample onto the column. This mode is ideal for low concentration samples and allows for the maximum amount of sample to reach the column and undergo separation. In pulsed split or splitless injection, a pressure is simply applied to transfer all or part of the sample quickly from the inlet onto the column. This results in the sample entering the column in a tight plug with minimal spread of the analytes.

[Figure 2.1: Schematic of a gas chromatograph (gas cylinder, inlet, column, oven, detector).]

Also, within the injection port is an inlet that allows a constant flow of carrier gas (mobile phase) to enter and flow through the system (typically ~1 mL/min for GC-MS). The sample mixture is carried in the flow of gas from the injection port through the column and to the detector. For GC-MS applications, the most commonly used carrier gas is helium due to its inert nature and low molecular weight.

Ideally, the sample mixture should be introduced onto the column in as narrow a band as possible.
As the mixture travels through the column, analytes within the sample mixture interact differently with the stationary and mobile phases, depending on the properties of the analyte molecules. In gas chromatography, interaction with the stationary phase is mainly through absorption, which is also known as partition. This occurs when molecules of the analyte diffuse into the thin coating of the liquid stationary phase. Ideally, in a mixture each analyte will have a slightly different affinity for both phases. An analyte that spends more time in the mobile phase, for example, will travel more quickly through the column, while an analyte that spends more time in the stationary phase will travel more slowly. As a result of the varying affinities, when the carrier gas carries the mixture through the column, the sample mixture is separated into several distinct bands, which ideally each contain one type of analyte.

The choice of stationary phase is very important for optimal separation. The mobile phase merely carries the sample through the column whereas the stationary phase interacts with and retards specific compounds differently so that they can be separated from one another. The stationary phase is chosen based on its thermal stability at the high oven temperatures, its inertness, and its compatibility (similar polarities) with the compounds to be separated [2]. The most common stationary phases used in forensic laboratories are those with polysiloxane backbones. One such column is known commercially as HP-5, where HP stands for the manufacturer and the 5 indicates that the stationary phase is 5% phenyl- and 95% methyl-polysiloxane. These columns are very useful for separating a large range of polar, nonpolar, basic, and acidic compounds, including those routinely seen when performing an analysis of fire debris [3].

For efficient chromatographic separation of the sample analytes, any band broadening of the sample mixture should be minimized.
There are multiple factors that lead to band broadening, such as longitudinal diffusion and the efficiency of the mass transfer between the mobile and stationary phases [2]. Longitudinal diffusion occurs over time as the analyte molecules diffuse from a more concentrated region to a less concentrated region within the mobile phase. This can occur in a column depending on the length of time it takes for the separation to occur. The less time a mixture spends in the column, the less time is available for diffusion to occur. As a result, a higher flow rate of the carrier gas is usually considered to be best to decrease diffusion since it decreases the amount of time the mixture spends in the column. However, a high flow rate is not always optimal because it can adversely affect the efficiency of the mass transfer of analytes during the separation. Additionally, higher flow rates can negatively impact the separation by moving the sample through the column so quickly that it does not have enough time to separate. This may result in the co-elution of compounds and a decrease in overall resolution.

Mass transfer is the transfer of analyte molecules from the mobile phase to the stationary phase and back again. Ideally, equilibrium should exist between the analytes in the mobile phase and stationary phase during the separation; however, equilibrium is established so slowly that separations never occur under equilibrium conditions [2]. Some adjustments can be made so that equilibrium is more closely approximated, resulting in an increase in mass transfer efficiency as well as a decrease in band broadening. Some factors affecting mass transfer are the flow rate, the concentration of the analyte, and the length of the column. High flow rates decrease the efficiency of the mass transfer since there is less time for equilibrium to occur.
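The competing effects of flow rate on longitudinal diffusion and on mass transfer can be summarized with a simplified rate equation of the kind commonly written for open tubular columns. The symbols below do not appear in the original text and are introduced here only for illustration: H is the plate height (a measure of band broadening per unit column length), u is the linear velocity of the carrier gas, B is a longitudinal diffusion coefficient term, and C is a combined mass transfer resistance term.

```latex
% Simplified plate-height expression (assumed form, no eddy-diffusion term):
H(u) = \frac{B}{u} + C\,u

% Setting the derivative to zero gives the velocity of minimum band broadening:
\frac{dH}{du} = -\frac{B}{u^{2}} + C = 0
\quad\Longrightarrow\quad
u_{\mathrm{opt}} = \sqrt{\frac{B}{C}},
\qquad
H_{\min} = 2\sqrt{B\,C}
```

Under this simplified form, the B/u term falls as the flow rate increases (less time for longitudinal diffusion), while the Cu term rises (less time for mass transfer), so band broadening is smallest at an intermediate flow rate rather than at the highest or lowest one.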
An analyte in a mobile phase with a high flow rate will travel a long distance down the column while some of the analyte is partitioned into the stationary phase. This results in irreversible band broadening. The same analyte traveling at a lower flow rate will not travel as far ahead of the analyte in the stationary phase; therefore, lower flow rates aid in increasing the efficiency of the mass transfer and decreasing band broadening. Similarly, a separation using a long column may result in more band broadening than a short column. A common column length is 30 meters. Regardless of flow rate and length of the column, a high concentration of the analyte, known as overloading the column, also decreases the mass transfer efficiency. This occurs when excess analyte is present such that there are no more sites in the stationary phase for the analyte to partition into. As a result, most of the analyte is in the mobile phase and band broadening occurs. To increase mass transfer efficiency and consequently decrease band broadening, the stationary phase in the column is applied as a very thin layer of liquid, commonly less than one micron thick, on the inner walls of the column. The analytes can completely partition into the stationary phase more quickly because there is less distance or width to travel through before they begin to partition back into the mobile phase.

Since the separation of the molecules is also dependent on the temperature at which the separation occurs, the column is housed within an oven to allow strict control and close monitoring of temperature. It is possible to use two types of temperature programs in gas chromatography. An isothermal program is when the oven, and consequently the column, is maintained at one temperature for the entire analysis. This type of program is typically used for the separation of molecules with very similar boiling points and provides the best resolution.
Some limitations of isothermal analyses are that they take longer to complete and cannot separate mixtures containing analytes with a wide range of boiling points. In temperature programming, the oven temperature, and hence the column temperature, is increased from low to high in a controlled manner. Temperature programming has many benefits. One is that a more complex mixture containing analytes with a wide range of boiling points can be separated in a short period of time. Since the separation occurs over a shorter time, there is less band broadening and a more efficient separation. Conversely, one disadvantage of temperature programming is that, if the ramp rate is too high, compounds with similar boiling points may co-elute, leading to poor separation efficiency.

Once the separated molecules reach the end of the column, they travel into the detector. While many different detectors are available for GC, the mass spectrometer is widely used for forensic applications. The analytes are transferred to the mass spectrometer directly from the column by way of a heated transfer line, which is kept at a temperature equal to the highest temperature used in the oven temperature program, typically 250-300 °C. The transfer line is heated to prevent or minimize condensation of the separated analytes. As the analyte is carried into the mass spectrometer, the carrier gas is pumped away while the analyte is allowed to reach the ion source. This is important because gas chromatography is performed under atmospheric pressure whereas mass spectrometry must be performed under vacuum conditions. Typical pressures needed for this analysis range from $10^{-4}$ to $10^{-8}$ torr [2].
Because the flow rate of the mobile phase is so low (~1 mL/min) in gas chromatography when using a capillary column, a specialized interface is not needed to remove the carrier gas from the sample; the vacuum pumps associated with the MS are able to pump the carrier gas away and maintain the low pressures needed.

Vacuum conditions are needed in mass spectrometry to ensure that the ions being created and analyzed do not undergo any reactive collisions between ionization and detection. Multiple types of pumps work together to generate and maintain the low pressures required. The pumps remove excess molecules and thereby increase the mean free path, that is, the distance that an ion can travel before colliding with another molecule.

There are three major parts of a mass spectrometer: an ion source, a mass analyzer, and an ion detector (Figure 2.2). In a bench-top GC-MS instrument, the most commonly used ion source is an electron ionization source, while the mass analyzer is typically a quadrupole analyzer and the detector is an electron multiplier. Overall, the mass spectrometer works by ionizing and fragmenting molecules, separating the resulting fragments according to their mass-to-charge ratios, and then detecting the fragments. Under given conditions, the molecules fragment in a unique and reproducible manner and so the resulting fragmentation patterns can be used to definitively identify the separated analytes.

Figure 2.2: Schematic of a mass spectrometer (inlet system from the GC, ion source, mass analyzer, detector, vacuum pump, and data acquisition system).

Electron ionization (EI) uses free electrons to remove an electron from a neutral molecule, which triggers a cascade of events leading to fragmentation. Free electrons are produced by applying a current across a thin filament.
The electrons are accelerated across a potential toward an anode, which imparts energy to the electrons (typically 70 eV, although this energy can be varied). The ion source also contains a collimating magnet, which causes the electrons to travel in spiral pathways. The neutral analytes from the gas chromatograph are introduced perpendicular to the flow of electrons. The compounds have to travel through the flow of electrons in order to reach the mass analyzer and, in doing so, are bombarded with electrons. The spiral motion of the electrons increases the probability of interaction with the neutral molecules. When an electron passes close to a neutral molecule, energy is transferred. If the electron transfers sufficient energy, the ionization potential of the neutral molecule is surpassed, which creates a positively charged ion. Excess energy from the electron, or from interactions with subsequent electrons, can leave the newly formed ion with more energy than it can dissipate quickly, resulting in fragmentation. Because EI results in extensive fragmentation, it is referred to as a ‘hard’ ionization technique and, due to the extent of fragmentation, EI is very useful in structural determinations. Once the positive fragment ions have been produced, they are directed to the mass analyzer by way of a positively-charged repeller plate and a negatively-charged extractor plate.

Figure 2.3: Diagram of a quadrupole mass analyzer (ions from the source enter between the rods; resonant ions pass to the detector, non-resonant ions do not).

A quadrupole mass analyzer consists of four rods running parallel to one another in a diamond formation (Figure 2.3). Rods located opposite one another are paired. The rods are connected to a direct current (DC) source; one set of rods is positive, while the other is negative.
Additionally, radio-frequency (RF) alternating current is applied to both sets such that one set of rods is always out of phase with the other set. Quadrupole mass analyzers are used to perform mass selective stability scans, which are performed by scanning the DC and RF potentials at a fixed ratio [2]. At a given ratio, only ions with a specific mass-to-charge (m/z) value will have stable trajectories that allow them to pass through the cavity defined by the rods and reach the detector. Ions that are lighter or heavier than the stable m/z value will have unstable trajectories that cause them to hit the rods, where they are neutralized and pumped away by the vacuum system. In order to collect an entire mass spectrum of each analyte, the DC and RF potentials are scanned over their entire range at the fixed ratio so that ions with a large range of m/z values can pass through and reach the detector.

Once an ion of a specific m/z has passed through the mass analyzer, it is detected using a continuous-dynode electron multiplier detector. The detector is horn-shaped and made of glass doped with lead, which easily emits secondary electrons [2]. The opening, where the positive ions enter, is held at a slight negative potential while the other end is held at ground. This produces a potential gradient down the length of the horn. When a positive ion strikes the opening of the detector, electrons are emitted. The emitted electrons are then attracted toward a less negative part of the detector where they again strike the surface and more electrons are emitted. This occurs several times until the signal of the original ion has been amplified (by a factor of approximately $10^{5}$ to $10^{8}$) [2]. Since the response of the detector is constant, meaning that each ion leads to the emission of a constant number of electrons, the amplified signals are comparable for all of the ions generated in the mass spectrometer. As a result, the amplified signal can be used to quantify the analytes in a mixture.
An analog-to-digital converter is used to transform the electrical current to a digital signal that is interpreted by a computer. The end result of a GC-MS analysis is the generation of a total ion chromatogram. This is a graph where the abscissa is the retention time of the molecules, in minutes, and the ordinate is the total ion current, which is dimensionless. The retention time of a molecule is simply the time that it takes the molecule to travel from the beginning of the column to the detector. The molecules are represented as peaks on the chromatogram, and the area underneath each peak can be used to quantify the amount of that molecule present in the mixture. Furthermore, the identity of the molecule represented by each peak can be deduced by analyzing the mass spectrum associated with the peak, which is reproducible and characteristic of that molecule under a given set of conditions.

2.3 Data Pretreatment

2.3.1 Smoothing

All data sets, including chromatograms, consist of both signal and noise. The signal is the part of the data that is intentionally collected and is the desired output. Noise, on the other hand, is incidental and can come from a number of sources. For example, natural fluctuations in the measurements made by a detector will lead to small, but random, variations within the data. Noise can be detrimental to the analysis or characterization of a data set because it has the potential to partially or completely mask trends that would otherwise be visible. Smoothing of the data set can be performed as a way to minimize noise and its effects. In short, smoothing methods can be used to increase the signal-to-noise ratio of the data set.

A very popular method of smoothing is the Savitzky-Golay smoothing algorithm. This procedure uses a local polynomial regression over a set number of data points to describe the data and reduce the noise [4].
A certain number of data points, called the window size, is used to perform a polynomial regression, which essentially fits a polynomial to that chosen set of points. Once the polynomial has been fitted, the y-value of the centermost point in the window is replaced by the new y-value predicted from the fitted polynomial. After the center point for that window has been smoothed, the algorithm moves to the next x-value and smooths the next center point, one at a time, until it has gone through all of the x-values. Eventually the entire data set will be smoothed, with one notable exception. The first and last few points of the data set will not be smoothed because they do not fulfill the window size requirement; that is, these data points can never be the center point of a window [5].

The number of points used for the window size is crucial and is chosen according to the data set. Larger window sizes smooth the data more than smaller window sizes, but a window size that effectively smooths one data set will not necessarily smooth another data set as well. Large window sizes, for example, may over-smooth the data and remove some of the actual signal as well as the noise. Smaller window sizes, on the other hand, may not remove enough of the noise [5]. It should also be noted that window sizes must contain an odd number of points [4]. This is because the center point is the one that is smoothed, and a center point can only exist if the window contains an odd number of points. Typically, for chromatographic data, a window size with a similar number of points to an average peak within the chromatogram is used as a starting point. From there, adjustments are made and evaluated to determine if larger or smaller sizes should be used.

The order of the polynomial used for the regression can also be modified to best complement the data set.
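
The windowed fit described above can be sketched in a few lines of Python. This is only an illustration, not the software used in this work: the function names, the example abundances, and the choice to fit each window by ordinary least squares through the normal equations are assumptions made for the sketch. The polynomial order is passed in as a parameter, and the points at either end that can never be the center of a full window are returned unsmoothed.

```python
def fit_polynomial(xs, ys, order):
    """Least-squares polynomial fit; returns coefficients, lowest power first."""
    n = order + 1
    # Normal equations A c = b for the polynomial coefficients c.
    a = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    b = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        known = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - known) / a[i][i]
    return coeffs


def savitzky_golay(y, window, order):
    """Smooth y by replacing each window's center point with the fitted value."""
    if window % 2 == 0:
        raise ValueError("window size must be an odd number of points")
    if order >= window:
        raise ValueError("polynomial order must be less than the window size")
    half = window // 2
    xs = list(range(-half, half + 1))  # center point sits at x = 0
    smoothed = list(y)                 # edge points are left unsmoothed
    for center in range(half, len(y) - half):
        segment = y[center - half:center + half + 1]
        coeffs = fit_polynomial(xs, segment, order)
        smoothed[center] = coeffs[0]   # fitted polynomial evaluated at x = 0
    return smoothed


# Hypothetical abundances: one peak on a noisy baseline.
noisy = [0.0, 0.1, -0.1, 0.2, 5.0, 9.8, 5.1, 0.1, 0.0, -0.2, 0.1]
smoothed = savitzky_golay(noisy, window=5, order=2)
```

With a five-point window, the first two and last two points of `noisy` are returned unchanged, which is the edge behavior noted above.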
Higher-order polynomials tend to preserve tall and narrow peak shapes better, while lower-order polynomials tend to work best for wide peaks. The order of the polynomial chosen depends not only on the trends in the data that need to be preserved, but also on the window size chosen for the smooth. The order of the polynomial has to be less than the number of points in the window because a minimum number of points is needed to define a polynomial of a given order. A first-order equation, for example, is a straight line. In order to draw a line, at least two points must be present, so the window must contain at least two points. The same logic follows for polynomials of every other order.

The Savitzky-Golay smoothing algorithm has many advantages over other types of smoothing. Because a polynomial is fitted to each window, this method is extremely good at preserving the overall trend of the data set while minimizing the noise. This is in contrast to other smoothing procedures that may replace the center point of the window with the average across all data points that make up the window, which can distort peak shapes and trends in the data and decrease, or even completely remove, some of the signal from the data set.

2.3.2 Retention Time Alignment

After smoothing, it is often necessary to perform retention time alignment on chromatographic data. The alignment minimizes drift in retention time of the same analyte in samples analyzed over a period of time. Oftentimes, the sample chromatograms are aligned to a chromatogram of a consensus target, which contains all of the compounds known to exist in the samples.

A correlation optimized warping (COW) algorithm can be used to align chromatographic data. The chromatogram is divided into sections, each containing the same number of data points, as defined by the analyst and referred to as the segment length [6].
Another parameter, known as the warp, defines the maximum number of data points that can be added to or removed from each segment in order to produce the best alignment [6]. For example, a warp of 3 means that 0, 1, 2, or 3 data points can be added to or removed from each segment. This number is also chosen by the analyst.

Alignment of the sample chromatogram to the target chromatogram is performed from the end to the beginning of the chromatogram, so the last segment is optimized first. This is done by calculating a local correlation coefficient between corresponding segments in the sample and the consensus target chromatograms. The coefficients are calculated for each possible warp and segment combination. All of the local coefficients are then summed, for each specific warp and segment combination, to generate multiple global correlation coefficients. The warp and segment combination resulting in the highest global correlation coefficient is considered to be the optimal alignment. A high coefficient, however, does not always mean that the consensus target and sample chromatogram are well aligned because the slope of a peak in one chromatogram may be aligned to the apex of a peak in the other chromatogram. Consequently, the optimal alignment is best determined through a visual comparison of the aligned chromatograms.

2.3.3 Normalization

Normalization is a very common pretreatment procedure that is used to reduce the nonsignificant variations in abundance between replicates, between samples, or between sample populations. These variations in abundance are expected and have many sources, such as variations in the volume of sample injected into the GC for analysis. It is necessary to minimize the fluctuations in abundance so that later data analysis procedures can describe actual differences or trends in the data as opposed to changes in abundance inherent to the data collection process.
Ideally, this ensures that any changes described by later statistical procedures come from meaningful chemical differences in the data and not random fluctuations.

As with any other pretreatment procedure, there can be negative consequences when normalizing data, depending on the goal of the subsequent analysis steps and the information that is gained from the data set. One such drawback of normalization is that, since it stretches or compresses the data to be comparable across a set, all information about concentration or abundance is removed [5]. This means that normalization in certain situations can be detrimental to analysis and should not be performed. In chromatograms used for this project, however, it is the presence and pattern of the peaks that is most important and not the abundances, which makes this type of data set an ideal candidate for normalization.

There are many different types of normalization that can be performed on chromatographic data. One commonly used type is total area normalization. In this method, the abundance at each retention time in a single chromatogram is divided by the total area of that chromatogram and then multiplied by the average total area of the chromatograms within the data set. The purpose of first dividing the chromatogram by its own total area is to reduce all of the chromatograms in the data set to the same scale, where the abundances vary from 0 to 1. The later step of multiplying by the average restores the rescaled chromatograms to abundances similar to those they began with.

2.4 Data Analysis

2.4.1 Principal Components Analysis

Principal components analysis (PCA) is an extremely powerful tool that is used to analyze multidimensional data and reduce it to fewer dimensions that capture the greatest sources of variance among the samples.
This multivariate statistical procedure allows the user to condense the data and discover trends that might otherwise be masked by the overwhelming dimensionality of the data. Since chromatographic data are multidimensional, PCA is a good statistical procedure for identifying the variations in the data and assessing natural groupings of the data.

The first step in performing PCA is to mean center the data. Specifically for chromatographic data, this is done by calculating the average abundance at each retention time, across the entire data set. This average is then subtracted from the abundance at the corresponding retention time in each individual chromatogram. Mean centering ensures that the principal components (discussed later) describe the maximum amount of variance by redefining the mean as zero.

The next step is to calculate the covariance matrix on the mean-centered data. Covariance (Equation 2.1) is a measure of how two sets of data vary together [7]; it can be thought of as the variance shared between two samples. In the equation below, $x_i$ and $y_i$ are the individual data points in samples x and y, respectively, $\bar{x}$ and $\bar{y}$ are the corresponding means, and n is the number of data points being evaluated [7].

$$\mathrm{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Equation 2.1

The covariance identifies how one variable changes across two samples. In chromatographic data, this occurs by a point-by-point comparison between the abundances at each retention time in two chromatograms. In order to accomplish these comparisons, a covariance matrix is developed, in which the covariance calculated between all retention times in each pair of chromatograms is displayed.

Once the covariance matrix is established, its eigenvectors and eigenvalues are calculated. An eigenvector is a unit vector that, when multiplied by the covariance matrix, produces a scalar multiple of itself. The eigenvalue is the scalar by which the eigenvector is multiplied.
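
As a small illustration of Equation 2.1 and of the pairwise covariance matrix described above, the following Python sketch may help. It assumes the sample form of the covariance with an n - 1 denominator, and it treats each chromatogram as a list of abundances recorded at the same retention times. The function names and the abundance values are hypothetical and serve only as an example.

```python
def covariance(x, y):
    """Sample covariance of two equal-length sequences (n - 1 denominator)."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of points")
    n = len(x)
    if n < 2:
        raise ValueError("at least two data points are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def covariance_matrix(chromatograms):
    """Covariance between every pair of chromatograms (symmetric matrix)."""
    return [[covariance(a, b) for b in chromatograms] for a in chromatograms]


# Hypothetical abundances at five retention times.
chrom_a = [1.0, 4.0, 9.0, 4.0, 1.0]
chrom_b = [2.0, 8.0, 18.0, 8.0, 2.0]   # same pattern as chrom_a, doubled
chrom_c = [9.0, 4.0, 1.0, 4.0, 9.0]    # inverted pattern

matrix = covariance_matrix([chrom_a, chrom_b, chrom_c])
```

In this example the diagonal entries are the variances of the individual chromatograms, the covariance between `chrom_a` and `chrom_b` is positive because their abundances rise and fall together, and the covariance between `chrom_a` and `chrom_c` is negative because one rises where the other falls.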
Eigenvectors are used to identify the sources of variance within the entire data set. Many eigenvectors, orthogonal to one another, can be calculated for the same data set; however, each accounts for a different amount of variance. The number of eigenvectors that can be calculated for a given covariance matrix is equal to the number of samples being evaluated [7]. For example, if a covariance matrix was developed using 90 samples (90 × 90), then 90 eigenvectors could be calculated to describe it. Each eigenvector has an associated eigenvalue, which defines the amount of variance described by the eigenvector. Thus, the eigenvector with the highest corresponding eigenvalue accounts for the most variance and is considered to be the first principal component [7]. The eigenvectors are ranked in this way so that the second highest eigenvalue corresponds to the second principal component and describes the next greatest amount of variance, and so on.

The mean-centered data are then multiplied by each eigenvector, separately. This essentially results in the original data being described by the eigenvectors. For chromatographic data, the mean-centered abundance at each retention time is multiplied by the corresponding element of the eigenvector, which is called the loading. These products are summed across all retention times, which gives the score of the sample on that principal component. From this a scores plot can be developed, which is a visual representation of the scores of the samples within the two dimensions used for the analysis. In a scores plot, chemically similar samples cluster more closely to one another than to dissimilar samples.

The eigenvector can be graphed to identify the variables that are responsible for the variance in the data set. The resulting graph, called a loadings plot, can also be used to explain the positioning of the samples in the scores plot.
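
The sequence of steps above (mean centering, covariance matrix, eigen decomposition, ranking, and projection) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the procedure implemented by the software used in this work: it assumes the common convention in which rows are samples, columns are retention times, and the covariance is computed between retention times. The function name `pca` and the small data set are hypothetical.

```python
import numpy as np


def pca(data, n_components=2):
    """PCA by eigen decomposition of the covariance matrix.

    data: rows are samples (chromatograms), columns are variables
    (abundances at each retention time).
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)        # mean center each retention time
    cov = np.cov(centered, rowvar=False)       # covariance between variables
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # symmetric matrix solver
    order = np.argsort(eigenvalues)[::-1]      # rank by variance described
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    loadings = eigenvectors[:, :n_components]  # one column per principal component
    scores = centered @ loadings               # sum of abundance x loading products
    explained = eigenvalues[:n_components] / eigenvalues.sum()
    return scores, loadings, explained


# Hypothetical data: two groups of three samples, four retention times each.
standards = [
    [10.0, 1.0, 0.5, 5.0],
    [11.0, 1.0, 0.4, 5.1],
    [10.0, 2.0, 0.6, 4.9],
    [1.0, 10.0, 0.5, 5.0],
    [1.0, 11.0, 0.6, 4.9],
    [2.0, 10.0, 0.4, 5.1],
]
scores, loadings, explained = pca(standards)
```

Plotting the first column of `scores` against the second would give a scores plot, and plotting a column of `loadings` against retention time would give the loadings plot for that principal component. Because the sign of an eigenvector is arbitrary, the two groups may appear on either side of zero on PC1, but they fall on opposite sides of it.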
The loadings plot identifies not only the variables that determine the position of each sample, but also how heavily each is weighted in determining that position. A loadings plot can be generated for each principal component. For chromatographic data, the eigenvector can be plotted versus retention time such that each variable (in this case, compound) can be identified based on retention time. The sign of the loadings (positive or negative) within each loadings plot is assigned arbitrarily and only serves to place the samples positively or negatively on PC1 or PC2 in the scores plot.

2.4.2 Pearson Product Moment Correlation Coefficients

Pearson product moment correlation (PPMC) coefficients provide a pairwise comparison between two different samples and result in a numerical value that describes the relationship between the two samples. The coefficient is calculated by dividing the covariance between two variables by the product of the standard deviations of the two variables (Equation 2.2) [8]. The standard deviation is a measure of spread in the data for each variable, while the covariance identifies how the x-variable changes in relation to the y-variable. In the equation below, $x_i$ and $y_i$ represent the individual data points from samples x and y, respectively, and $\bar{x}$ and $\bar{y}$ are the corresponding means [8].

$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2}\,\sqrt{\sum (y_i - \bar{y})^2}}$$

Equation 2.2

In terms of chromatographic data, the two variables would be the abundance, at a specific retention time, in two chromatograms. The resulting coefficient represents a pairwise comparison of the two chromatograms, on a point-by-point basis. For a given peak in the two chromatograms, differences in the point at which the peak begins, reaches the apex, and ends result in lower coefficients. The value of a PPMC coefficient can range from -1 to +1.
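
A short Python sketch of Equation 2.2 is given here to illustrate the point-by-point nature of the comparison. The function name `ppmc` and the peak profiles are hypothetical; the example compares a reference peak with a copy that differs only in abundance and with a copy that is shifted by one data point.

```python
import math


def ppmc(x, y):
    """Pearson product moment correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical abundances for a single peak.
reference = [0.0, 1.0, 5.0, 9.0, 5.0, 1.0, 0.0, 0.0]
scaled = [3.0 * value for value in reference]          # same shape, higher abundance
shifted = [0.0, 0.0, 1.0, 5.0, 9.0, 5.0, 1.0, 0.0]     # same peak, one point later

r_scaled = ppmc(reference, scaled)
r_shifted = ppmc(reference, shifted)
```

Here `r_scaled` is equal to 1 within rounding, because a change in overall abundance does not alter the pattern, whereas `r_shifted` is noticeably lower even though the peak shape is identical. This is why retention time alignment matters before PPMC coefficients are calculated.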
A coefficient of +1 means that the two samples are perfectly positively correlated, while a coefficient of -1 means that the two samples are perfectly negatively correlated. A coefficient with an absolute value of 0.8 or greater indicates that two samples are strongly correlated, an absolute value between 0.5 and 0.79 indicates a moderate correlation, and an absolute value of 0.49 or less indicates a weak correlation [8]. A coefficient of zero indicates no correlation between the two samples.

2.4.3 Soft Independent Modeling of Class Analogy

Unlike PCA, soft independent modeling of class analogy (SIMCA) is a supervised pattern recognition procedure. In this case, ‘supervised’ means that unknown samples can be identified as belonging to groups or classes, which are predetermined by the user. To perform SIMCA, the data set is divided into a training set and a test set. ‘New’ samples may later be classified using the SIMCA model that was developed on the training set and validated using the test set.

The first step in SIMCA is to generate statistical models for each predefined group within the training set. This is typically done using PCA, in which a PCA model is developed for each of the known groups in the training set. The resulting models characterize each group in the training set independently of one another. The number of PCs used to describe one group is determined independently of the number of PCs used to describe the others [9]. The resulting scores for each sample can be plotted on a scatter plot to visualize the results of PCA for each group. A loadings plot is also generated for each PC and can be used to explain the positioning of the samples on the plot as well as identify the sources of variance for each PCA model.

The PCA models are validated to evaluate how well the models describe the predefined groups. While many validation procedures are available, cross validation using the ‘leave one out’ method is commonly used.
In this validation procedure, one sample is removed from the training set and is used as a test sample. A new model is made using the remaining samples from the training set and then the model is applied to the test sample. This procedure is repeated until each sample has been used as a test sample to validate the model created from the remaining samples in the training set.

When performing SIMCA, the original data set is split into a training and testing set, as mentioned previously. The training set is used to develop the PCA models, as described above. The test set is used to assess the ability of the models to classify samples according to the appropriate group. The assessment is performed by projecting each sample in the test set onto each of the PCA models, in a manner identical to that used when projecting scores for PCA. Next, a distance measurement, known as the object-to-model distance, is calculated to determine how far the test samples are from each model [9]. Equation 2.3 below describes the calculation performed to determine the object-to-model distance of a new sample ($S_i$), where m is the model, ResXCal is the variance per x-variable, and a is the principal component number [10].

Equation 2.3 (not legible in the source text)

Another measurement, called the leverage ($H_i$), is calculated and describes the distance of the test samples from the mean score of the group [9]. Equation 2.4 describes how leverage is calculated for each sample ($H_i$), where I is the number of samples, a/A are the principal component number/number of principal components, and t/T are the scores (vector/matrix) [10].

$$H_i = \frac{1}{I} + \sum_{a=1}^{A} \frac{t_{ia}^{2}}{\mathbf{t}_a^{T}\mathbf{t}_a}$$

Equation 2.4

A combination of the object-to-model distance and the leverage is used to determine whether the test samples fall within the predefined group membership limit set for each PCA group model [9]. If the test samples do, then they are assigned to that class.
If the test samples do not, they may not be classified.

If the test set does not validate the SIMCA model developed using the training set, the model must be revised; specifically, the PCA models must be re-evaluated. This may be done by changing the number of PCs used to describe the PCA models for each group. Models may also be revised by removing outliers, or samples that negatively impact the model, from the training set. Lastly, if the PCA models developed from the training set do not have enough discriminating power, other statistical procedures may be used separately or in conjunction with SIMCA to improve the classification.

After the model has been validated, the ‘unknown’ samples are subjected to each PCA model to classify the samples according to group. Again, this involves calculating the object-to-model distance as well as the leverage to determine which unknown samples fall within each PCA group model limit.

The SIMCA procedure is considered to be a ‘soft’ classification procedure because there are three possible outcomes for the identification of each unknown sample. The sample could be assigned to one, multiple, or none of the groups [9]. The ability not to force a classification differs from other statistical procedures and may be beneficial in a real world scenario because an unknown sample may not belong to any of the predefined groups. Additionally, the assignments can be calculated at different confidence levels and may change depending on which confidence level is used. Thus, SIMCA not only offers the ability to classify unknown samples to groups, but also provides a statistical confidence associated with the classification.

REFERENCES

1. ASTM International, ASTM E 1412-07. Annual Book of ASTM Standards.
2. Skoog DA, Holler FJ, Crouch SR, Principles of Instrumental Analysis. 6th edition. Belmont, CA: Thomson, 2007.
3. Optimized Sensitivity, Accuracy and Reproducibility on a SINGLE Column. 2012.
(http://www.labplus.co.kr/catalog/detailed_pages/Hp1n5.pdf)
4. Chau F, Liang Y, Gao J, Shao X, Chemometrics: From Basics to Wavelet Transform. Hoboken, NJ: John Wiley & Sons, Inc., 2004: 25-31.
5. Beebe KR, Pell RJ, Seasholtz MB, Chemometrics: A Practical Guide. New York, NY: John Wiley & Sons, Inc., 1998: 32-34.
6. LineUp™ User Manual (version 1.0.62, Infometrix, Inc., Bothell, WA).
7. Smith LI, A tutorial on principal components analysis. 2002. (http://www.sccg.sk/~haladova/principal_components.pdf).
8. Devore JL, Probability and Statistics for Engineering and the Sciences. Belmont, CA: Duxbury Press, 1991: 487-490.
9. Unscrambler X SIMCA Theory Section of User Manual (version 10.2, Camo, Inc., Woodbridge, NJ).
10. Unscrambler X Methods Manual (version 10.2, Camo, Inc., Woodbridge, NJ).

Chapter 3: Association of Simulated Fire Debris Samples to Corresponding Standards Using Unsupervised Statistical Procedures

3.1 Introduction

Wood, of all kinds, is an extremely common product used in building structures as well as furnishing and decorating them. Oftentimes, the wood is treated, whether to keep away pests, to lend extra strength to the surface, or to make it more aesthetically pleasing. These treatments also introduce compounds to chromatograms of the wood, which can make identification of an ignitable liquid during an arson investigation extremely complex.

In this chapter, the use of unsupervised statistical procedures to associate simulated fire debris samples to their corresponding standards is investigated. A set of standards for gasoline and kerosene, at three different evaporation levels, was generated by spiking the ignitable liquid onto a Kimwipe™ and analyzing it by gas chromatography-mass spectrometry (GC-MS). Next, three different data sets were generated. The first data set (known as the inherent matrix interferences data set) was generated by spiking each ignitable liquid onto unburned, surface-treated wood.
These samples were extracted using a passive headspace procedure and analyzed by GC-MS. This data set was used to demonstrate the effects of evaporation and inherent matrix interferences on the association of samples to their respective standards. The second data set (matrix interference/thermal degradation data set) consisted of burned, surface-treated wood that was spiked with each ignitable liquid, then extracted and analyzed. This data set was used to demonstrate the effects of evaporation, matrix interferences, and thermal degradation of the matrix on the association of the samples to their corresponding liquid standards. The final data set (simulated fire debris data set) consisted of surface-treated wood samples that were spiked with each ignitable liquid and then burned. This data set was used to investigate the effects of thermal degradation of both the ignitable liquid and the matrix, in addition to evaporation and matrix interferences, on the association of the samples to their corresponding standards. Thus, each data set illustrates, in a piecewise manner, the effect of each complicating factor on the association of the sample to its respective standard.

Principal components analysis (PCA) was performed on the chromatograms of the standards to generate a standards scores plot. The three data sets of samples were projected, separately, onto the standards scores plot to investigate the objective association of the samples to their respective standards, in the presence of each of the complicating factors. The calculated Pearson product moment correlation (PPMC) coefficients also provided pairwise comparisons between chromatograms of the standards and samples. The PPMC coefficients provide a numerical value, which describes the similarities between the chromatograms.

3.2 Materials and Methods

3.2.1 Ignitable Liquid Standards

The gasoline and kerosene used for this research were available in the laboratory.
The fuels were previously collected from fuel stations and stores in the Lansing, MI area. Both were stored at refrigerated temperatures in acid-washed amber containers that were capped and covered with parafilm® (American National Can™, Greenwich, CT).

Both liquids were evaporated to three different levels by volume: 0, 50, and 90%. To do this, a 10-mL acid-washed graduated cylinder was filled with the liquid, which was then evaporated using a stream of nitrogen. A star-shaped stir bar was placed in each cylinder in order to maintain the homogeneity of the liquid as it evaporated. This evaporation was done multiple times and aliquots of each evaporated liquid were thoroughly mixed together, once again, to ensure a homogeneous sample of each evaporated liquid. The liquids were stored as described above.

Prior to analysis, each ignitable liquid was diluted (1:10 v/v) in methylene chloride (J.T. Baker, Phillipsburg, NJ), which contained nitrobenzene (Mallinckrodt, Inc., Paris, KY) as an internal standard at a concentration of 0.2 M. Twenty microliters of the diluted liquid was spiked onto a 4 x 4 cm² piece of Kimwipe™ (Kimberly-Clark Global Sales, LLC, Roswell, GA) in a nylon bag (Grand River Products, LLC, Grosse Pointe Farms, MI). A quarter of an activated charcoal strip (Albrayco Technologies, Inc., Cromwell, CT) hanging on a paperclip (previously rinsed with methylene chloride) was inserted into the nylon bag, which was then sealed. Five samples were generated in this manner for each evaporation level of both ignitable liquids. The samples underwent a passive headspace extraction in which the bags were placed in an 80 °C oven for 4 hours, as recommended by ASTM International¹. Following extraction, the activated charcoal strips were removed from the bags and eluted with 200 µL of methylene chloride. The resulting extracts were analyzed, in triplicate, by gas chromatography-mass spectrometry (GC-MS).
In addition to the liquid standards, a consensus target was also prepared. The target was made in a manner identical to the liquid standards, except that gasoline and kerosene were both diluted (1:10 v/v) in the same aliquot of methylene chloride (containing nitrobenzene) and that mixture was spiked onto the Kimwipe™. The consensus target was extracted and analyzed as described above.

3.2.2 Surface-Treated Wood Samples

Unfinished red oak hardwood flooring was purchased from a local home improvement store. The flooring boards were cut into 4.2 cm x 7 cm rectangles using a compound miter saw (Delta® Power Equipment Corporation, Anderson, SC). The boards were 2.9 cm thick. A Watco™ Danish Oil finish (Rust-oleum® Corporation, Vernon Hills, IL) was applied to the wood with a disposable foam brush, as indicated by the manufacturer. More finish was applied to areas that soaked up the finish. Thirty minutes after the first application, another coat was applied and allowed to soak for an additional 15 minutes before the excess oil was removed with a dry cloth, as per the manufacturer’s instructions.

Samples of untreated (n=3) and treated (n=3) wood were placed in separate nylon bags containing an activated charcoal strip, then extracted using the passive headspace procedure described above. Following extraction, the charcoal strips were eluted with methylene chloride and the extracts analyzed by GC-MS, as described above. A NIST library search was performed on the total ion chromatograms (TICs) of these samples in order to identify the compounds inherent to the wood and to the surface treatment.

3.2.3 Inherent Matrix Interference Samples

Gasoline (1:10 v/v) and kerosene (9:100 v/v) were diluted in methylene chloride containing nitrobenzene (0.2 M) as the internal standard. The same dilution factor was used for all of the evaporation levels of that liquid. Next, 20 µL of the diluted ignitable liquid was spiked onto a 4.2 cm x 7 cm rectangle of treated, unburned wood.
This procedure was used to create five samples per evaporation level of both ignitable liquids, resulting in a total of 30 samples. The samples were then sealed in nylon bags and underwent the passive headspace extraction with subsequent analysis by GC-MS as described previously. Each sample was analyzed in triplicate, resulting in a final data set of 90 chromatograms. This data set was used to investigate the effect of interferences inherent to the matrix on the association of the samples to their respective standards.

3.2.4 Determination of Optimal Burn Time

The optimal burn time for the wood samples was determined by applying a propane torch (Bernzomatic®, Medina, NY) to the surface-treated wood squares for 30, 60, 90, 120, 150, and 180 seconds. An overturned beaker was used to extinguish any flames still observed beyond the burn time evaluated. The wood squares were sealed in nylon bags with activated charcoal strips, then subjected to the same extraction and analysis procedures described previously. Unburned, but treated, wood was also analyzed, in a similar manner, and used for comparison with the chromatograms from the burned samples. The burn time that generated the most abundant matrix interferences was selected and used throughout the rest of the study for the matrix interference/thermal degradation and simulated fire debris samples.

3.2.5 Matrix Interference/Thermal Degradation Samples

The diluted ignitable liquid standards prepared in section 3.2.1 were spiked onto separate 4.2 cm x 7 cm rectangles of treated wood, which were previously burned for 30 seconds by applying a propane torch. This procedure was repeated to create five samples per evaporation level of both ignitable liquids, resulting in a total of 30 samples. The samples were then sealed in nylon bags containing activated charcoal strips and, again, underwent the passive headspace extraction with subsequent analysis by GC-MS. Each sample was analyzed in triplicate, resulting
in a final data set of 90 chromatograms. This data set was used to investigate the effect of thermal degradation of the surface treatment, as well as the inherent matrix interferences and evaporation of the ignitable liquids, on the association of samples to their respective standards.

3.2.6 Simulated Fire Debris Samples

Each ignitable liquid standard was spiked onto separate 4.2 cm x 7 cm rectangles of treated wood, then a propane torch was applied for 30 seconds. The spike volumes were 225 µL of gasoline and 115 µL of kerosene. These spike volumes were used for each evaporation level of both ignitable liquids. The burned samples were placed in separate nylon bags containing activated charcoal strips. To each sample, 20 µL of methylene chloride with nitrobenzene (0.2 M) as the internal standard was added. This procedure was used to create five samples per evaporation level of both ignitable liquids, resulting in a total of 30 samples. The samples, again, underwent the passive headspace extraction procedure and were analyzed by GC-MS. Each sample was analyzed in triplicate, resulting in a final data set of 90 chromatograms. This data set was used to investigate the effects of thermal degradation of both the surface treatment and the ignitable liquid, as well as the evaporation of the liquid and inherent matrix interferences, on the association of the samples to their respective standards.

3.2.7 Analysis of Samples by GC-MS

All samples were analyzed using an Agilent 6890N gas chromatograph, coupled to an Agilent 5975C mass spectrometer, and equipped with an Agilent 7683B autosampler (Agilent Technologies, Palo Alto, CA). The GC contained an Agilent HP-5MS capillary column (30 m x 0.25 mm I.D. x 0.25 µm film thickness). The carrier gas was ultra high purity helium (Airgas, East Lansing, MI), at a nominal flow rate of 1 mL/min. One µL of each sample was injected using the pulsed, splitless mode, with a pressure of 15 psi for 0.25 minutes.
The inlet was maintained at 250 °C. The GC oven temperature program was as follows: 40 °C for 3 min, 10 °C/min to 280 °C, hold for 4 min. The transfer line was maintained at 280 °C and the mass spectrometer was operated in electron ionization mode (70 eV). Full mass scan mode was used, scanning the range 50 to 550 amu, with a scan rate of 2.91 scans/s.

3.2.8 Data Pretreatment

Data pretreatment was performed on the total ion chromatograms (TICs) of the standards’ and samples’ extracts within each data set. The Savitzky-Golay smooth was performed in the ChemStation© Enhanced Data Analysis Software (version E.01.01.335, Agilent Technologies). A correlation optimized warp (COW) alignment was used to align all the TICs to the TIC of the consensus target. This alignment was performed using LineUp™ (version 1.0.62, Infometrix, Inc., Bothell, WA). Many combinations of the warp and segment size were investigated and the alignment afforded by each combination was evaluated based on visual assessment of the aligned chromatograms. The parameters offering optimal alignment were a warp of 3 and a segment size of 75, and this combination was used to align all data sets.

Next, the TICs were subjected to a total area normalization procedure, which was performed using Microsoft Excel (version 12.0.6425.1000, Microsoft Corp., Redmond, WA). For a specific evaporation level, the total area of each chromatogram (n=15) across all retention times was calculated and then the average area of all 15 chromatograms was calculated. To perform the normalization, each chromatogram was divided by its total area and then multiplied by the corresponding average. This process was repeated for each evaporation level of each ignitable liquid, for both the standards and the samples.

3.2.9 Principal Components Analysis

Principal components analysis was performed on the pretreated TICs of the ignitable liquid standards using MatLab® (version 7.11.0.584, Mathworks®, Natick, MA).
Scores for each standard were generated, along with the eigenvectors and corresponding eigenvalues for each principal component described. The scores for the standards on PC1 and PC2 were graphed in Microsoft Excel to create the scores plot for the ignitable liquid standards. The eigenvectors for PC1 and PC2 were plotted against the retention time (also in Microsoft Excel) to create the loadings plot for each PC.

The samples in the inherent matrix interference data set were then projected onto the scores plot generated for the liquid standards. In order to project the scores, the TICs of the samples were mean centered. To do this, the average abundance at each retention time in the liquid standards was calculated and then subtracted from the corresponding abundance in the TIC of the sample. To calculate the score for a sample on PC1, the mean-centered data for that sample were multiplied, retention time by retention time, by the eigenvector for PC1 (generated in MatLab). The products were summed across all retention times to generate the score on PC1. The score on PC2 was calculated in a similar manner, using the eigenvector for PC2. This was repeated for all samples and the calculated scores of the samples were graphed onto the scores plot of the standards. This procedure was repeated for the remaining two data sets, resulting in a total of three scores plots, in addition to the scores plot of the liquid standards. Each scores plot was used to visually assess the association of samples to their respective standards despite evaporation, matrix interferences, and thermal degradation.

3.2.10 Pearson Product Moment Correlation Coefficients

Pearson product moment correlation coefficients were calculated in MatLab. Coefficients were calculated for all pairwise comparisons of the standards and the samples within each data set, as well as among the replicates of the standards and samples.
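As an illustration of these pairwise comparisons, the following is a minimal Python sketch and not the MatLab code used in this work. The function names (`pearson`, `pairwise_ppmc`) and the abundance values are hypothetical, chosen only for illustration, and each chromatogram is assumed to be a list of abundances at the same aligned retention times.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson product moment correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return covariance / sqrt(var_x * var_y)


def pairwise_ppmc(chromatograms):
    """Coefficient for every unordered pair, keyed by the pair of indices."""
    pairs = combinations(range(len(chromatograms)), 2)
    return {(i, j): pearson(chromatograms[i], chromatograms[j]) for i, j in pairs}


# Hypothetical abundances at five retention times (not measured data).
standard = [10.0, 40.0, 90.0, 30.0, 5.0]
sample = [6.0, 22.0, 50.0, 18.0, 4.0]
rescaled_standard = [2.0 * value for value in standard]

coefficients = pairwise_ppmc([standard, sample, rescaled_standard])
for pair, r in sorted(coefficients.items()):
    print(pair, round(r, 4))
```

Because the coefficient is unchanged when a profile is multiplied by a positive constant, the standard and its rescaled copy correlate at 1 within rounding. Pairing 15 replicate chromatograms in this way gives 15 x 14 / 2 = 105 coefficients, which is consistent with the n = 105 reported for the replicate comparisons in Table 3.1.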
The comparison of the standards and samples was used to investigate the similarity between the sample and its respective standard. The comparison of standards’ replicates illustrated the precision of the sample preparation, headspace extraction, and GC-MS analysis procedures, while the comparison of the samples’ replicates additionally included variability introduced by the burning process.

3.3 Results and Discussion

3.3.1 Characterization of Compounds Present in Ignitable Liquid Standards

3.3.1.1 Gasoline

Exemplar total ion chromatograms (TICs) of each evaporation level for the gasoline standards are shown in Figure 3.1. The compounds characteristic of 0% evaporated gasoline (Figure 3.1A) are toluene, the C2-, C3-, and C4-alkylbenzenes, and the methylnaphthalenes. The evaporation of gasoline leads to a loss, or decrease in abundance, of the early-eluting, more volatile, compounds. At 50% evaporation (Figure 3.1B), there is a decrease in abundance of toluene, while the later-eluting compounds become concentrated. At 90% evaporation (Figure 3.1C), there is significant evaporative loss of toluene as well as a decrease in abundance of the C2-alkylbenzenes. The later-eluting C3- and C4-alkylbenzenes, as well as the methylnaphthalenes, become more concentrated, leading to an increase in abundance of these compounds.

Figure 3.1: Total ion chromatograms of A) 0%, B) 50%, and C) 90% evaporated gasoline. The internal standard used was nitrobenzene. (Labeled peaks: internal standard, toluene, C2-, C3-, and C4-alkylbenzenes, and methylnaphthalenes; axes: abundance versus retention time in min.)

3.3.1.2 Kerosene

Exemplar TICs of each evaporation level for the kerosene standards are shown in Figure 3.2. The 0% evaporated kerosene (Figure 3.2A) contains normal (n)-alkanes in a Gaussian distribution. The kerosene used in this project contained n-alkanes C9 through C17, in addition to a myriad of branched alkanes and aromatic compounds.
The C17 peak, however, is in such low abundance that its existence is not immediately obvious in the 0% evaporated standard. Once again, the evaporative process results in the loss, or decrease in abundance, of early-eluting, more volatile, compounds. At 50% evaporation (Figure 3.2B), there is evaporative loss of the early-eluting aromatic compounds, as well as the C9 and C10 n-alkanes, while the abundances of the later-eluting alkanes increase. At this evaporation level, a Gaussian distribution of the remaining n-alkanes is still obvious.

At 90% evaporation (Figure 3.2C), there is a significant loss of the n-alkanes up to C13. The abundance of the remaining compounds is markedly increased. At this evaporation level, the C17 peak is also visible for the first time. Additionally, the ratios of the later-eluting n-alkanes change from the Gaussian distribution described for the 0% and 50% evaporated kerosene. At 90% evaporation, C15 becomes the most abundant compound, followed by C14, then C16 whereas, before, the abundance of C15 was between C14 and C16.

Figure 3.2: Total ion chromatograms of A) 0%, B) 50%, and C) 90% evaporated kerosene. The internal standard used was nitrobenzene. (Labeled peaks: internal standard and n-alkanes C9 through C17; axes: abundance versus retention time in min.)

3.3.2 Association and Discrimination of Ignitable Liquid Standards

A combined total of approximately 84% of the variance within the data set is described by the first and second principal components (PC1 and PC2, respectively) in the scores plot of the ignitable liquid standards (Figure 3.3). Replicates of each liquid are clustered closely and each liquid forms a distinct cluster from the others. From visual assessment of the scores plot, ignitable liquid type can be discriminated on PC1. Additionally, each evaporation level for both liquids can be distinguished when using both PCs.
The gasoline standards are positioned positively on PC1, whereas the kerosene standards are positioned negatively. This difference in positioning can be explained by using the loadings plot for PC1 (Figure 3.4). The plot shows that toluene and the C2-, C3-, and C4-alkylbenzenes are weighted positively. These compounds are present in the gasoline standards, thus explaining the positive positioning on the scores plot. The n-alkanes (C11-C17) are weighted negatively on PC1 in the loadings plot. These compounds are present in the kerosene standards, thus explaining the negative positioning on PC1 in the scores plot. It should be noted that since the n-alkanes C9 and C10 have a low weighting in the loadings plot, these compounds do not contribute significantly to the positioning of the kerosene standards on PC1 in the scores plot.

Figure 3.3: Scores plot of PC1 versus PC2 based on the total ion chromatograms for gasoline and kerosene at the three different evaporation levels. In terms of color, blue, green, and purple represent 0%, 50%, and 90% evaporated kerosene while red, orange, and yellow represent 0%, 50%, and 90% evaporated gasoline. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this thesis. (Axes: Principal Component 1 (68.82%) and Principal Component 2 (15.61%).)

Figure 3.4: Loadings plot of PC1 based on the total ion chromatograms of the unevaporated and evaporated ignitable liquid standards. (Labeled peaks: toluene, C2- and C3-alkylbenzenes, and n-alkanes C11 through C17; axes: Principal Component 1 loading, -0.25 to 0.25, versus retention time in min.)

The 50% and 90% evaporated standards of gasoline are positioned more positively, while the 50% and 90% evaporated kerosene standards are positioned more negatively, on PC1 than their 0% evaporated counterparts.
These positioning shifts can be explained by examining the chromatograms of the standards at each evaporation level (Figures 3.1 and 3.2), as well as the loadings plot for PC1 (Figure 3.4). As gasoline is evaporated to 50%, toluene decreases in abundance, while the remaining compounds increase in abundance. A slight increase in the C2- and C3-alkylbenzenes, which are more positively weighted than toluene in the loadings plot, results in a more positive positioning of the 50% evaporated standard on PC1 in the scores plot, compared to the 0% evaporated standard. At 90% evaporation, toluene is present at very low abundance and the C2-alkylbenzenes decrease in abundance. Conversely, the C3- and C4-alkylbenzenes increase significantly in abundance. Collectively, the C3- and C4-alkylbenzenes are more positively weighted than toluene and the C2-alkylbenzenes in the loadings plot. The marked increase of these late-eluting compounds overcompensates for the decrease in abundance of toluene and the C2-alkylbenzenes, which results in a more positive positioning of the 90% evaporated gasoline standards on PC1 in the scores plot, compared to the 0% and 50% evaporated standards.

The more negative positioning of the evaporated kerosene standards on PC1, compared to the 0% evaporated standard, can be explained similarly. As kerosene is evaporated, some of the earlier eluting compounds (C9-C13) undergo varying degrees of evaporative loss. While the characteristic kerosene compounds are all weighted negatively in the loadings plot for PC1, these earlier eluting compounds are less heavily weighted and, therefore, do not contribute greatly to the positioning of the standards on the scores plot. The most heavily weighted compounds in the loadings plot are C13 through C16, which are concentrated as evaporation level increases. An
increase in concentration of the most heavily weighted compounds as kerosene is evaporated, therefore, explains the more negative position of the 50% and 90% evaporated standards on PC1 compared to the 0% evaporated standard.

The loadings plot for PC2 (Figure 3.5) can be used, in a similar manner, to explain the positioning of the standards on PC2 in the scores plot. The 0% and 50% evaporated gasoline standards are positioned negatively, whereas the 90% evaporated standard is positioned positively. According to the loadings plot, the only compounds present in gasoline that contribute significantly to its positioning on this PC are toluene and the C2-alkylbenzenes. These compounds are weighted negatively and are present in highest abundances in the 0% and 50% standards, explaining the negative positioning of these standards in the scores plot. In addition, since these compounds are present in similar abundances in the 0% and 50% evaporated standards, the standards are positioned similarly on PC2.

The 90% evaporated standard is positioned positively on PC2 in part because of the lower abundance of toluene and the C2-alkylbenzenes as a result of evaporation, and in part because of the mean centering of the data. When the chromatographic data are mean centered, the average abundance at each retention time across all standards is subtracted from each standard chromatogram at the corresponding retention time. As a result of this procedure, compounds that are not originally present in a standard can sometimes be introduced into its chromatogram. For the gasoline standards, n-alkanes C13 through C17 were introduced into the chromatograms (Figure 3.6). The n-alkanes were present in high abundance in the kerosene standards, which means that the average value at those retention times was a large positive number. As a result, when the
average was subtracted from the gasoline standards, the mean-centered data contained a negative contribution from these n-alkanes. The mean-centered data are then multiplied by the eigenvector for the PC to generate the score of the sample on that PC. For example, in the case of the 90% evaporated gasoline standard, the n-alkanes C14 through C17, which contribute negatively in the mean-centered data, are also weighted negatively on PC2. When these two negatives are multiplied, a positive loading results for each of the n-alkanes. It should be noted that C13, which is also negatively introduced in the chromatogram, is weighted positively in the loadings plot for PC2, resulting in one negative loading. Additionally, the average abundance of toluene, calculated across all standards, is greater than its abundance in the 90% evaporated gasoline standard; therefore, toluene is also negatively introduced into the chromatogram. Since toluene is negatively weighted in the loadings plot for PC2, and negatively introduced into the mean-centered chromatogram, it contributes positively to the positioning of the 90% evaporated gasoline standard in the scores plot. The final score for the sample, which is graphed in the scores plot, is the sum of the loadings across all retention times. Overall, for the 90% evaporated gasoline standard, the final score is positive.

Figure 3.5: Loadings plot of PC2 based on the total ion chromatograms of the unevaporated and evaporated ignitable liquid standards. (Labeled peaks: toluene, C2-alkylbenzenes, and n-alkanes C9 through C17; axes: Principal Component 2 loading, -0.25 to 0.25, versus retention time in min.)

Figure 3.6: Mean-centered total ion chromatogram of the 90% evaporated gasoline standard demonstrating the introduction of n-alkanes from the kerosene standards. (Labeled peaks: toluene, C2- and C3-alkylbenzenes, and n-alkanes C13 through C17; axes: abundance versus retention time in min.)
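The sign arithmetic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the calculation performed in this work: it assumes just three retention times (labeled toluene, C13, and C15), and the abundances and PC2 weights are hypothetical values chosen to reproduce the signs discussed in the text.

```python
def column_means(standards):
    """Average abundance at each retention time across all standards."""
    count = len(standards)
    return [sum(column) / count for column in zip(*standards)]


def mean_center(chromatogram, means):
    """Subtract the standards' average abundance at each retention time."""
    return [abundance - mean for abundance, mean in zip(chromatogram, means)]


def score_terms(centered, eigenvector):
    """Product of mean-centered abundance and PC weight at each retention time."""
    return [value * weight for value, weight in zip(centered, eigenvector)]


# Hypothetical abundances at three retention times: toluene, C13, C15.
gasoline_0 = [10.0, 0.0, 0.0]
gasoline_90 = [2.0, 0.0, 0.0]
kerosene_0 = [0.0, 8.0, 4.0]
kerosene_90 = [0.0, 4.0, 12.0]
standards = [gasoline_0, gasoline_90, kerosene_0, kerosene_90]

# Hypothetical PC2 weights: toluene negative, C13 positive, C15 negative.
pc2_weights = [-0.5, 0.25, -0.75]

means = column_means(standards)              # [3.0, 3.0, 4.0]
centered = mean_center(gasoline_90, means)   # [-1.0, -3.0, -4.0]
terms = score_terms(centered, pc2_weights)   # [0.5, -0.75, 3.0]
pc2_score = sum(terms)                       # 2.75

print(centered, terms, pc2_score)
```

In this toy case the n-alkanes absent from the gasoline standard become negative values after mean centering. Multiplying by a negative weight (C15, toluene) gives a positive term, multiplying by a positive weight (C13) gives a negative term, and the sum of the terms is positive.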
For the 0% and 50% evaporated gasoline standards, the mean-centered data also contain negative contributions from the n-alkanes. However, in this case, more of these n-alkanes are weighted positively in the PC2 loadings plot. When the mean-centered data are multiplied by the eigenvector, the result is an increase in the number of negative loadings. When summed, the negative loadings cancel out many of the positive loadings that contribute to the positive positioning of the 90% evaporated gasoline standard in the scores plot. Hence, the introduction of the n-alkanes, in the neat and 50% evaporated gasoline standards, does not contribute significantly to their positioning in the scores plot. Thus, positioning of the 0% and 50% evaporated gasoline standards is more affected by the decrease in abundance of toluene and the C2-alkylbenzenes than by the compounds introduced during the process of mean centering.

The 0% and 50% evaporated kerosene standards are positioned positively on PC2, while the 90% evaporated standard is positioned negatively on this PC in the scores plot. In the loadings plot for PC2 (Figure 3.5), C9-C13 n-alkanes are weighted positively and C14-C17 are weighted negatively. The 0% evaporated kerosene standard contains all of these compounds. Overall, more of the n-alkanes are weighted positively, and are more heavily weighted, than the n-alkanes that are weighted negatively. This results in the positive positioning of the 0% evaporated kerosene standard in the scores plot. The 50% evaporated standard contains similar abundances of the positively weighted (C11-C13) and negatively weighted (C14-C17) n-alkanes. Because the positively weighted compounds contribute more on this PC than the negatively weighted compounds, the 50% evaporated standard is also positively positioned in the scores plot.
However, the 50% standard is less positively positioned on PC2 than the 0% evaporated standard due to evaporative loss of C9 and C10, which are weighted positively in the PC2 loadings plot. The 90% evaporated standard is positioned negatively on PC2 because it contains only one compound (C13) that is weighted positively on this PC, while the remaining compounds (C14-C17) are all weighted negatively.

3.3.3 PPMC Coefficients for Ignitable Liquid Standards

Mean PPMC coefficients calculated for pairwise comparisons of replicates, at each evaporation level for the gasoline and kerosene standards, demonstrate the precision of the extraction and analysis procedures (Table 3.1). In theory, the PPMC coefficients calculated for replicates should be 1, indicating complete correlation. In reality, however, a value of 1 is difficult to attain due to small imprecisions in the measured spike volume of the ignitable liquid, the variability in the passive headspace extraction procedure, and variability in the GC-MS analysis. All of the replicates for each evaporation level are strongly correlated, with a coefficient greater than 0.98. The strong correlations, coupled with the small standard deviations for the coefficients, indicate that the extraction and analysis procedures are precise.

3.3.4 Characterization of Compounds Present in Surface-Treated Wood Flooring

The most identifiable compounds present in the chromatograms of the unburned, surface-treated wood are the C9, C10, C11, and C12 n-alkanes (Figure 3.7). This is an important observation since all of these alkanes are also present in the kerosene standards. Also present in the surface treatment are branched and cyclic alkanes, as well as aldehydes. It is important to note that all of the compounds present in the chromatogram of the treated wood come from the treatment itself and not from the wood.
Extraction and analysis of untreated wood, in a similar manner, yielded two very small and unidentifiable peaks at the beginning of the resulting chromatogram. Peaks at those retention times were not present in the chromatogram of the treated wood. The absence of compounds in the chromatograms of the untreated wood could be for a number of reasons. Before wood is used in homes, it oftentimes goes through an intense drying stage; therefore, this wood may have been sufficiently dried so that all of the volatiles were removed or present at extremely low abundance.

Table 3.1: Mean Pearson product moment correlation coefficients ± the standard deviations calculated for replicates of standards at each evaporation level (n=105).

Ignitable Liquid Standard Evaporation Level    Mean PPMC Coefficient ± Standard Deviation (n=105)
0% Gasoline                                    0.9976 ± 0.0017
50% Gasoline                                   0.9947 ± 0.0039
90% Gasoline                                   0.9956 ± 0.0027
0% Kerosene                                    0.9969 ± 0.0023
50% Kerosene                                   0.9950 ± 0.0036
90% Kerosene                                   0.9839 ± 0.0184

Figure 3.7: Total ion chromatograms of extracts of surface-treated wood burned for A) 0 seconds, B) 30 seconds, and C) 150 seconds. (Labeled peaks: n-alkanes C9, C10, C11, and C12; axes: abundance versus retention time in min.)

3.3.5 Optimization of Burn Times

The burn times investigated were 30, 60, 90, 120, 150, and 180 s. A sample of unburned, surface-treated wood was also analyzed for comparison purposes. Exemplar chromatograms are shown in Figure 3.7. The unburned, surface-treated wood showed the most abundant matrix interferences (Figure 3.7A). As burn time increased, there was a marked decrease in abundance of matrix interference compounds. At burn times greater than 150 s (Figure 3.7C), no peaks were observed in the chromatograms. This is likely due to the flame removing the entire layer of the surface treatment. This hypothesis is strengthened by the fact that, as the burn time increases, the abundances of the peaks decrease until C10 and C11 are barely visible at 150 seconds.
Since the matrix interference/thermal degradation and simulated fire debris samples require that the wood be burned, the burn time had to be balanced with the observed decrease in abundance of the interferences. Based on this compromise, a burn time of 30 s (Figure 3.7B) was used to create the matrix interference/thermal degradation and simulated fire debris samples. This short burn time ensured that the abundances of the interferences were maximized. Shorter burn times between 0 and 30 s were not investigated because a shorter time period could further contribute to the irreproducibility of the burning process.

In terms of the compounds present, the chromatograms of the unburned and burned surface-treated wood appear very similar (Figure 3.7A and B). It was expected that compounds from the surface treatment would be degraded by the heat of the propane flame, generating additional compounds. However, this does not appear to be the case for any of the burn times evaluated.

3.3.6 Association of Samples to Corresponding Standards in the Presence of Inherent Matrix Interferences and Thermal Degradation

Principal components analysis was performed on the data set containing liquids extracted from the unburned surface-treated wood to investigate the effect of matrix interferences on the association to the liquid standard. Similarly, PCA on the data set containing liquids extracted from the burned, surface-treated wood was used to investigate the effect of both matrix interferences and thermal degradation on the association. In general, similar trends were observed in the scores plots for both data sets and hence, only results from the liquids extracted from the burned surface-treated wood will be discussed in detail.

Scores for the gasoline and kerosene samples extracted from the burned matrix were calculated and projected onto the scores plot generated for the liquid standards (Figure 3.8).
The gasoline samples are all positively positioned on PC1 in the scores plot, similar to the corresponding standards, whereas the kerosene samples are negatively positioned on this PC, similar to the kerosene standards. Thus, the gasoline and kerosene samples can be associated to their corresponding standards by liquid type. The samples cannot, however, be associated to their respective standards in terms of evaporation level. Even though the gasoline and kerosene samples are clustered by evaporation level, the samples are spaced too far apart from their respective standards to be associated to them based solely on visual assessment of the plot.

Figure 3.8: Scores plot of PC1 versus PC2 based on the total ion chromatograms for the ignitable liquid standards, represented by the squares, and the projected scores of the matrix interference/thermal degradation samples, represented by the circles. In terms of color, blue, green, and purple represent 0%, 50%, and 90% evaporated kerosene while red, orange, and yellow represent 0%, 50%, and 90% evaporated gasoline. (Axes: Principal Component 1 (68.82%) and Principal Component 2 (15.61%).)

The less positive positioning of the gasoline samples on PC1, compared to their corresponding standards in the scores plot, is due to differences in abundance of the gasoline compounds in the chromatograms. While these differences in abundance can be observed in all of the gasoline compounds, only those compounds that are weighted in the loadings plot for PC1 will affect the positioning of the samples in the scores plot. This negative shift is exhibited by all of the gasoline samples, but it is most significant between the 50% evaporated gasoline samples and standards. In the chromatograms of the 50% evaporated samples, the abundances of the characteristic compounds are 0.25 to 0.5 times the abundances of the corresponding compounds in the standards (Figure 3.9).
The decrease in abundance is translated in the scores plot such that the scores on PC1 for the 50% evaporated samples are still positive, but one-fourth to one-half of the magnitude of their respective standards. This decrease in abundance, to varying degrees, between standards and samples is observed at all evaporation levels, resulting in less positive positioning of the samples on PC1 compared to the standards. The range in abundance of gasoline compounds across all gasoline samples, regardless of evaporation level, also resulted in the spread of the samples on PC1. The variations in abundance of compounds are likely due to the porous nature of the wood, since some of the ignitable liquid soaked into the wood. Additionally, the presence of the surface treatment may have affected the extent to which the gasoline soaked into the wood. The compounds may not have been entirely available for extraction using the passive headspace procedure, which led to the range in abundance of gasoline compounds observed in the chromatograms and illustrated by the scores plot.

Figure 3.9: Total ion chromatograms of a 50% evaporated gasoline standard (green) and two 50% evaporated gasoline inherent matrix interference samples (red and black), demonstrating the differences in abundance between the standards and samples. (Axes: abundance versus retention time in min.)

The positive shift of the kerosene samples in comparison to their respective standards on PC1 can also be described in a similar manner. A decrease in abundance of the n-alkanes in the samples, which are negatively weighted in the loadings plot for PC1, translated into the positive shift of the kerosene samples in the scores plot. The differences in the abundances of the n-alkanes are greater than those observed in the gasoline samples, resulting in a greater shift of the kerosene samples compared to the corresponding standards than previously observed for the gasoline samples and standards.
The shift in positioning of the gasoline and kerosene samples compared to their standards on PC2 can be explained in a similar manner, based on differences in abundance. All samples exhibit spread on PC2 except the 90% evaporated gasoline and 50% evaporated kerosene samples. For gasoline, only toluene and the C2-alkylbenzenes affect the positioning of the samples on PC2, according to the loadings plot for this PC (Figure 3.5). At 90% evaporation, the gasoline samples do not contain a significant abundance of toluene, and the C2-alkylbenzenes are present at the lowest abundance of all the evaporation levels. As a result, differences in abundances of these compounds result in minimal spread on this PC.

The 50% evaporated kerosene samples exhibit less spread on PC2 than on PC1 because of the weighting of the n-alkanes present in the loadings plots for both PCs. The chromatograms of the 50% evaporated kerosene samples contain n-alkanes C11-C16 in a Gaussian distribution. In the loadings plot for PC2, C11-C13 are positively weighted and C14-C16 are negatively weighted. The positively and negatively weighted n-alkanes are present in collectively equal abundances in the sample chromatograms; however, the positively weighted n-alkanes are more heavily weighted in the loadings. As a result, any variation in abundance of these n-alkanes will both positively and negatively affect the scores of the samples similarly. Specifically, the effect of a decrease in abundance of the positively weighted n-alkanes will be minimized by the proportional decrease in abundance of the negatively weighted n-alkanes. Consequently, differences in abundances of n-alkanes among replicates will not create significant spread. This can be contrasted with the loadings for PC1, in which all of the n-alkanes load negatively; therefore, any decrease in the abundance of n-alkanes will result in an entirely positive shift in positioning of the samples.
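The contrast between the two PCs can be illustrated numerically. The sketch below treats a score as a loadings-weighted sum of abundances (mean-centering is omitted for simplicity, which is an assumption), and the abundances and loadings for the C11-C16 n-alkanes are hypothetical values chosen only to mimic the signs described above.

```python
def score(abundances, loadings):
    """Score of a chromatogram on one PC as a loadings-weighted sum."""
    return sum(a * w for a, w in zip(abundances, loadings))


# Hypothetical abundances of n-alkanes C11..C16 (roughly Gaussian profile).
standard = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
sample = [0.5 * a for a in standard]  # every n-alkane at half the abundance

# Hypothetical loadings: all negative on PC1; mixed signs on PC2,
# with the positive weights larger in magnitude than the negative ones.
pc1 = [-0.4, -0.4, -0.4, -0.4, -0.4, -0.4]
pc2 = [0.3, 0.3, 0.3, -0.2, -0.2, -0.2]

shift_pc1 = score(sample, pc1) - score(standard, pc1)
shift_pc2 = score(sample, pc2) - score(standard, pc2)

print(f"shift on PC1: {shift_pc1:+.2f}")  # all contributions add in one direction
print(f"shift on PC2: {shift_pc2:+.2f}")  # contributions partly cancel
```

With these assumed values the same proportional loss of abundance moves the sample much further along PC1 than along PC2, because on PC2 the positively and negatively weighted n-alkanes pull the score in opposite directions.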
It should be noted that the presence of the surface treatment does not significantly affect the overall positioning of the samples for two reasons. First, although the surface treatment contributes C9-C12 to the chromatograms of the samples, only C11 and C12 are present in the PC1 loadings plot. Furthermore, C11 and C12 are not heavily weighted in the loadings plot for PC1, so they do not contribute significantly to the positioning of the samples on PC1 in the scores plot. Second, the surface-treated wood was burned for 30 s prior to being spiked, which diminishes the abundances of these compounds in the chromatograms. Because of the low abundances, the surface treatment does not contribute significantly to the positioning of the scores on PC2 either, even though the compounds are more heavily weighted in the loadings plot for this PC.

The scores plot generated for the liquids extracted from the unburned surface-treated wood samples displayed the same general trends in terms of positioning of the samples compared to the corresponding liquid standards (Figure 3.10). However, there was one notable difference: on PC2, all samples, except for 90% evaporated gasoline, exhibited a positive shift compared to the corresponding standard. This shift was not apparent in the scores plot for the liquids extracted from the burned wood samples (Figure 3.8). The positive shift is due to the addition of the C9-C12 n-alkanes from the surface treatment, which are weighted positively in the loadings plot for PC2. A visual comparison between the matrix interference samples and the corresponding matrix interference/thermal degradation samples reveals that the abundance of the n-alkanes from the surface treatment is much higher in the chromatograms of the former data set. The differences in
abundance of n-alkanes between the two data sets are due to the fact that a flame was applied to the surface-treated wood to generate the matrix interference/thermal degradation samples, but not to generate the inherent matrix interference samples. The burning resulted in a decrease in abundance of n-alkanes in one data set, but not in the other. Therefore, samples from the matrix interference data set are positioned more positively on PC2 than the standards due to the high abundance of the matrix interference compounds.

Figure 3.10: Scores plot of PC1 versus PC2 based on the total ion chromatograms for the ignitable liquid standards, represented by the squares, and the projected scores of the inherent matrix interference samples, represented by the circles. In terms of color, blue, green, and purple represent 0%, 50%, and 90% evaporated kerosene while red, orange, and yellow represent 0%, 50%, and 90% evaporated gasoline. (Axes: Principal Component 1 (68.82%) versus Principal Component 2 (15.61%).)

The positive shift in positioning on the scores plot is not exhibited for the 90% evaporated gasoline inherent matrix interference samples, which are similarly positioned to the corresponding samples in the matrix interference/thermal degradation scores plot. A visual comparison of the 90% evaporated gasoline sample chromatograms from both data sets reveals little difference in the abundance of the n-alkanes from the surface treatment or the compounds characteristic of gasoline, thus explaining their similar positioning. A possible explanation for the similarities of the sample chromatograms from both data sets is analyst error. The 90% evaporated gasoline samples from both data sets were generated on the same day. It is possible that the surface-treated wood pieces were all burned before the evaporated gasoline was spiked onto them.
In this way, two sets of 90% evaporated gasoline matrix interference/thermal degradation samples may have inadvertently been generated. This would explain the lower abundance of the n-alkanes from the surface treatment in the 90% evaporated gasoline samples in the inherent matrix interference data set as opposed to the higher abundance of n-alkanes in the rest of the samples within this data set. It also explains why these same 90% evaporated gasoline samples did not exhibit the expected positive shift on PC2 in the scores plot.

Differences in abundances of compounds between standards and samples, as well as among replicates, are the main factors contributing to the shift in positioning of the samples on the scores plot, compared to the corresponding standards, as well as the spread exhibited by them. This is a result of inadequate normalization procedures. Specifically, the total area normalization performed was able to minimize the variations between abundances of replicates of each individual sample, but not across all samples. This could potentially have been corrected by normalizing to an internal standard in addition to the total area normalization, but the internal standard in this study was affected by the porosity of the wood to the same extent that the ignitable liquids were. This resulted in variations in abundance of the internal standard even though it was applied to the samples and standards at the same concentration. The raw chromatograms of the sample replicates contain identical compounds, but in different abundances; therefore, better normalization procedures should be able to reduce the spread in the scores plot. Improved normalization procedures may also increase the association of the samples to the standards in both data sets because the positioning of the samples is predominantly determined by the abundance of compounds from the ignitable liquids, which vary greatly from standards to samples.
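The two normalization approaches mentioned above can be sketched as follows. This is a minimal illustration that approximates the total area by the summed abundance of evenly spaced data points, which is an assumption rather than the procedure used in this study; the function names and the four-point chromatograms are hypothetical.

```python
def normalize_total_area(chromatogram):
    """Divide every abundance by the summed abundance of the chromatogram."""
    total = sum(chromatogram)
    if total == 0:
        raise ValueError("chromatogram has zero total abundance")
    return [x / total for x in chromatogram]


def normalize_to_internal_standard(chromatogram, is_index):
    """Divide every abundance by the abundance at the internal standard peak."""
    reference = chromatogram[is_index]
    if reference == 0:
        raise ValueError("internal standard peak has zero abundance")
    return [x / reference for x in chromatogram]


# Hypothetical replicates: same profile, second one recovered at half abundance.
replicate_a = [10.0, 40.0, 20.0, 30.0]
replicate_b = [5.0, 20.0, 10.0, 15.0]
print(normalize_total_area(replicate_a) == normalize_total_area(replicate_b))  # True

# A sample with a large extra matrix peak at the last point keeps a different
# profile after normalization, so the liquid peaks appear relatively smaller.
with_matrix = [5.0, 20.0, 10.0, 65.0]
print(normalize_total_area(with_matrix))
```

Total area normalization removes a uniform difference in recovery between replicates, but it cannot compensate when the relative contribution of individual peaks changes, and normalizing to an internal standard only helps if that peak itself is recovered reproducibly.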
3.3.7 PPMC Coefficients for Inherent Matrix Interference Samples

The calculated mean PPMC coefficients for replicates of the inherent matrix interference samples are greater than 0.92, indicating strong correlation among replicates for each evaporation level of each liquid (Table 3.2). Even the samples, such as the 90% evaporated kerosene samples, that are positioned both negatively and positively on PC2, are strongly correlated to one another.

Table 3.2: Mean Pearson product moment correlation coefficients ± standard deviations for replicates of the inherent matrix interference samples (n=105) and for samples to 0% evaporated gasoline and kerosene (n=225) standards.

Ignitable Liquid Sample   Mean PPMC Coefficient ± Standard Deviation
Evaporation Level         Sample Replicates (n=105)   0% Evaporated Gasoline (n=225)   0% Evaporated Kerosene (n=225)
0% Gasoline               0.9356 ± 0.0525             0.5205 ± 0.1370                  0.4492 ± 0.0530
50% Gasoline              0.9672 ± 0.0270             0.5978 ± 0.0830                  0.4346 ± 0.0282
90% Gasoline              0.9794 ± 0.0179             0.6251 ± 0.0557                  0.4813 ± 0.0242
0% Kerosene               0.9726 ± 0.0264             0.2756 ± 0.0481                  0.5639 ± 0.0751
50% Kerosene              0.9650 ± 0.0245             0.3011 ± 0.0421                  0.6476 ± 0.0674
90% Kerosene              0.9240 ± 0.0586             0.3535 ± 0.0485                  0.6184 ± 0.0588

Initially these results may seem to conflict with one another, but they do not. This is because PCA and PPMC coefficients are two fundamentally different statistical procedures that highlight different aspects of the data (variations and similarities) within the data set. Principal components analysis identifies and emphasizes specific variables across a data set that describe the majority of the variance, in order to discriminate samples from one another. Consequently, samples are not discriminated based on all of the compounds in the chromatograms; only specific compounds are considered.
The extent of the discrimination is based on the magnitude of the compound’s contribution to the variance in the data set, as well as the abundance of the compound in the sample chromatogram.

Pearson product moment correlation coefficients, on the other hand, provide a point-by-point comparison between two chromatograms in an effort to describe the similarity or extent of correlation between samples. As a result, coefficients are affected by differences in the retention time at which a peak begins, reaches the apex, and ends. Even for peaks with an apex at the same point, differences in the width of the peak translate into differences in the beginning and end retention times of the peak, which lowers the coefficient.

As a result of these fundamental differences, samples that contain the same compounds, in different abundances, may seem to be discriminated by PCA, yet be strongly correlated according to PPMC coefficients. This is demonstrated by the 90% evaporated kerosene samples, which are positioned positively and negatively in the scores plot (Figure 3.10) but are strongly correlated (Table 3.2). For PCA, the C9-C13 n-alkanes are weighted positively on PC2, while C14-C17 are weighted negatively. Kerosene evaporated to 90% by volume contains C13-C17, which results in an overall negative positioning of the standards, but the addition of the positively weighted C9-C13 from the surface treatment results in a positive shift of the samples. Differences in abundance of the positively weighted n-alkanes in the surface treatment correspond to the extent of the positive shift; large abundances will result in a positive score whereas small abundances will result in a negative score for the samples.
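The point-by-point nature of the PPMC coefficient can be shown with a short sketch. The function below implements the standard Pearson formula; the six-point "peaks" are hypothetical and are not data from this study.

```python
import math


def ppmc(x, y):
    """Pearson product moment correlation between two equal-length signals."""
    if len(x) != len(y):
        raise ValueError("chromatograms must have the same number of points")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical peak sampled at six retention times.
peak = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0]
scaled = [3.0 * a for a in peak]         # same shape, three times the abundance
wider = [0.0, 2.0, 4.0, 2.0, 1.0, 0.0]   # same apex, broader and later ending

print(f"same shape, scaled: {ppmc(peak, scaled):.4f}")
print(f"same apex, wider:   {ppmc(peak, wider):.4f}")
```

Scaling the whole chromatogram leaves the coefficient at essentially 1, whereas a change in peak width lowers it, which is why samples that differ mainly in overall abundance can remain strongly correlated while still being separated by PCA.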
Since PPMC coefficients are insensitive to differences in overall abundance, a point-by-point comparison of the peaks in the chromatograms of the 90% evaporated kerosene samples resulted in a strong correlation because the peak widths and, consequently, the relative abundance of the data points at each retention time do not vary significantly between the chromatograms of samples.

Even though replicates of samples are strongly correlated, the coefficients are less than those calculated for replicates of the standards (Table 3.1). This observation can be explained by differences in the width of the peaks for corresponding compounds across normalized sample chromatograms. These differences are not significant and, as a result, the calculated coefficients were not significantly impacted; however, these minor differences in the width of the peaks led to small differences in the abundance between the data points at each retention time, which reduced the correlation between replicates of samples in comparison to replicates of standards. Some compounds from the ignitable liquids, surface treatment, and the internal standard vary enough in abundance to impact the peak widths across sample replicates (Figure 3.11). This variation is further reflected by the increased standard deviations for the coefficients of the sample replicates as opposed to those of the standard replicates.

Since the extraction and analysis procedures were demonstrated to be precise, the differences in abundance and, therefore, the variations in the peak widths of compounds from the ignitable liquids and internal standard (and in the relative abundance of corresponding data points) are likely due to the porous nature of the wood matrix. As a result of this porosity, some of the ignitable liquids may have soaked into the wood and, therefore, been unavailable for adsorption onto the charcoal strip during the
passive headspace extraction. Additionally, the surface treatment works by penetrating into the wood, which could affect the degree to which the ignitable liquid can soak into the wood and, therefore, the extent of its availability during the extraction. Thus, the variability observed in the chromatograms for these particular samples is likely due to the properties of the matrix before the burning process even occurs.

Figure 3.11: Total ion chromatograms of a kerosene standard (red) and a matrix interference/thermal degradation sample (black), demonstrating the difference in peak width between the standards and samples. The peak depicted here is the C10 n-alkane. (Axes: abundance versus retention time in min.)

Mean PPMC coefficients demonstrate that most samples are moderately correlated to their corresponding 0% evaporated standard. Strong correlation between samples and standards was not expected, especially between the gasoline samples and standards, due to the addition of the C9-C13 n-alkanes from the surface treatment. The gasoline standards do not contain the n-alkanes; therefore, the introduction of these compounds into the chromatograms of the gasoline samples decreases the correlation between the standards and samples. Since the kerosene standards already contain the n-alkanes from the surface treatment, the application should not have a significant negative impact on the correlation between standards and samples. The surface treatment does, however, contain compounds other than the n-alkanes that the kerosene does not contain, such as aldehydes, which will negatively impact the correlation.

Weak correlation was observed between most samples and the other non-corresponding 0% evaporated standard. For example, the 90% evaporated kerosene samples are moderately correlated to 0% evaporated kerosene standards and weakly correlated to the 0% evaporated gasoline standards.
However, the standard deviation of the calculated coefficients does increase some of these weak correlations above the threshold of 0.5, which indicates a moderate correlation. This is true of the correlation between the 0% and 90% evaporated gasoline samples with the 0% evaporated kerosene standard.

While the gasoline samples are weakly correlated to the 0% evaporated kerosene standard, the coefficient is higher than that between the kerosene samples and the 0% evaporated gasoline standard. The higher correlation between gasoline samples and kerosene standards is likely due to the addition of n-alkanes to the gasoline samples, which are present in the kerosene standards. In these cases, the chromatograms of the gasoline samples become more similar to the kerosene standards, due to the presence of these alkanes in both the samples and standards.

3.3.8 PPMC Coefficients for Matrix Interference/Thermal Degradation Samples

The mean PPMC coefficients calculated for replicates of the matrix interference/thermal degradation samples at each evaporation level are greater than 0.91, indicating that replicates are strongly correlated (Table 3.3). The calculated coefficients of replicate samples, however, are not as high as the coefficients of replicate standards. The overall decrease in mean coefficients is due to differences in relative abundance of data points comprising peaks from the ignitable liquids, surface treatment, and internal standard as well as misalignment of the peak apexes. This is likely due to the liquids soaking into the wood and being retained so that the compounds from the liquid are not entirely available during the passive headspace extraction step. Furthermore, the large differences in relative abundance of corresponding data points within the peaks are reflected by the large standard deviations associated with the coefficients.
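The mean coefficients and standard deviations reported in the tables summarize many pairwise comparisons. The sketch below shows one way such a summary could be computed; it assumes 15 chromatograms per group, which is consistent with the reported n=105 (unique replicate pairs) and n=225 (sample-to-standard pairs) but is an assumption here, and the function names and toy replicates are hypothetical.

```python
import itertools
import math
import statistics


def ppmc(x, y):
    """Pearson product moment correlation between two equal-length signals."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def replicate_summary(replicates):
    """Mean, standard deviation, and count over all unique replicate pairs."""
    coeffs = [ppmc(a, b) for a, b in itertools.combinations(replicates, 2)]
    return statistics.mean(coeffs), statistics.stdev(coeffs), len(coeffs)


def sample_to_standard_summary(samples, standards):
    """Mean, standard deviation, and count over every sample-standard pair."""
    coeffs = [ppmc(s, t) for s, t in itertools.product(samples, standards)]
    return statistics.mean(coeffs), statistics.stdev(coeffs), len(coeffs)


# Comparison counts, assuming 15 chromatograms per group (an assumption).
print(math.comb(15, 2))  # 105 unique replicate pairs
print(15 * 15)           # 225 sample-to-standard pairs

# Hypothetical replicates of one small peak (not data from this study).
replicates = [[0, 1, 4, 1, 0], [0, 2, 7, 2, 0], [0, 1, 4, 2, 0]]
mean_r, sd_r, n = replicate_summary(replicates)
print(f"{mean_r:.4f} ± {sd_r:.4f} (n={n})")
```

A larger standard deviation in such a summary indicates that the individual pairwise coefficients are more scattered, which is how variation in peak shape among replicates shows up in the tabulated values.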
The samples are moderately to strongly correlated to their corresponding 0% evaporated standards, even with the high standard deviations associated with the calculated coefficients. This range of coefficients was expected because the n-alkanes from the surface treatment are being introduced into the samples, but the burning process decreases the abundances of these compounds, resulting in fewer data points comprising their peaks. Fewer data points in a peak translate into fewer retention times at which the relative abundance between corresponding data points can differ. For example, the differences in relative abundance between data points in the narrow peaks in the gasoline samples, to which the n-alkanes were introduced, and the gasoline standards, which do not contain these n-alkanes, will be minimized. When accounting for the large standard deviations, the samples can be weakly to moderately correlated to the non-corresponding 0% evaporated standard, for reasons similar to those noted for the inherent matrix interference data set.

Table 3.3: Mean Pearson product moment correlation coefficients ± standard deviations for replicates of the matrix interference/thermal degradation samples (n=105) and for samples to 0% evaporated gasoline and kerosene (n=225) standards.

Ignitable Liquid Sample Evaporation Level 0% Gasoline Mean PPMC Coefficient ± Standard Deviation Sample Replicates 0% Evaporated 0% Evaporated (n=105) Gasoline (n=225) Kerosene (n=225) 0.9192 ± 0.0742 0.7237 ± 0.1123 0.5392 ± 0.0267 50% Gasoline 90% Gasoline 0.8293 ± 0.0559 0.6545 ± 0.0658 0.4785 ± 0.0314 0.4999 ± 0.0245 0% Kerosene 50% Kerosene 0.9716 ± 0.0248 0.9360 ± 0.0599 0.4323 ± 0.0324 0.4479 ± 0.0702 90% Kerosene 0.9563 ± 0.0359 0.9690 ± 0.0354 0.9500 ± 0.0409 0.4492 ± 0.0569 0.7092 ± 0.0513 0.8266 ± 0.1001 0.6951 ± 0.0694
It should be noted that the burning process minimizes peak widths from the application of the surface treatment and the number of data points comprising each peak; however, the mere presence of the peaks from the surface treatment will negatively impact the correlations between the standards and samples.

3.3.9 Association of Simulated Fire Debris Samples to Corresponding Standards

Scores for simulated fire debris samples were calculated and projected onto the scores plot generated for the liquid standards to illustrate the effects of evaporation and matrix interferences, as well as thermal degradation of the liquids and matrix (Figure 3.12). The gasoline samples are positioned positively on PC1, as are their respective standards. Similarly, the kerosene samples are positioned negatively on this PC, as are their respective standards. This demonstrates that the fire debris samples containing gasoline and kerosene can be associated according to liquid type and differentiated from each other on PC1. The samples, however, could not be associated to their respective standards in terms of evaporation level for either ignitable liquid. This is due to spread in the samples on the scores plot as well as some shifts in positioning relative to their respective standards. Regardless, the explanation concerning the general positioning of the samples on PC1 and PC2 remains the same as that described previously for the standards.

Figure 3.12: Scores plot of PC1 versus PC2 based on the total ion chromatograms for the ignitable liquid standards, represented by the squares, and the projected scores of the simulated fire debris samples, represented by the circles. In terms of color, blue, green, and purple represent 0%, 50%, and 90% evaporated kerosene while red, orange, and yellow represent 0%, 50%, and 90% evaporated gasoline. (Axes: Principal Component 1 (68.82%) versus Principal Component 2 (15.61%).)
An obvious difference between the standards and samples is that the standards are tightly clustered, while the samples exhibit considerable spread, mainly on PC1. This is true for all samples and is mainly due to differences in abundances of compounds as a result of the porous nature of the wood, as well as the variability in the burning process. This is illustrated by the C2- and C3-alkylbenzenes in the 50% evaporated gasoline samples (Figure 3.13). Even after normalization, there are still differences in abundance of these compounds among samples of the same evaporation level, despite using the same spike volume to generate the samples. The loadings plots (Figures 3.4 and 3.5) illustrate that the C2- and C3-alkylbenzenes are more heavily weighted on PC1 than on PC2. As a result, spread in the abundances of these compounds will lead to greater spread in their positioning on PC1 than on PC2.

Figure 3.13: Total ion chromatograms of the C2-alkylbenzenes from the five simulated fire debris samples generated using gasoline, demonstrating the variation in abundances across samples. (Axes: abundance versus retention time in min.)

In spite of the spread observed for the kerosene samples, there is a clear negative shift on PC1 of the samples in comparison to their respective standards. This shift is also due to differences in abundance, but in this case, it is a difference in abundance of the n-alkanes in the samples compared to the standards. In this case, the spike volume used to generate the samples was greater than that used to generate the standards, resulting in the increase in abundance. A larger spike volume was needed so that compounds from the ignitable liquid would survive the burning process and exhibit thermal degradation effects; the smaller spike volume used to generate the standards would not allow for this to happen. The n-alkanes present in kerosene have a high, negative weighting on PC1 (Figure 3.4). As a result, an increase in abundance of
these compounds in the samples will translate to a more negative positioning of the samples in comparison to their respective standards on PC1.

It is important to note that the surface treatment does not have a large effect on the positioning of the samples on PC1. The surface treatment contains n-alkanes C9 through C12, but only C11 and C12 affect the positioning of the samples, according to the loadings plot for this PC (Figure 3.4). While C11 and C12 load negatively on PC1, these compounds are not very heavily weighted; therefore, the surface treatment provides only minimal contributions to positioning of the samples on PC1.

The surface treatment does, however, greatly affect the positioning of the gasoline and kerosene samples on PC2. The loadings plot illustrates this for PC2 (Figure 3.5), where the n-alkanes that are present in the surface treatment (C9-C12) load positively and are, collectively, heavily weighted. The addition of these compounds from the surface treatment results in the samples being more positively positioned on PC2 compared to their respective standards. This is especially illustrated by the positioning of the gasoline samples. The 90% evaporated gasoline samples are positioned even more positively on PC2 than the other gasoline samples because the more heavily weighted n-alkanes C12 and C13, which load positively, are present in higher abundances (by more than an order of magnitude) in the 90% evaporated samples in comparison to the other gasoline samples.

This positive shift is also observed for the kerosene samples, albeit to a lesser extent. The shift is less obvious than for the gasoline because of the increase in abundance of the negatively weighted n-alkanes that resulted from using a larger spike volume to create these samples. The increase in abundance of the n-alkanes may offset some of the positive contributions of the surface treatment.
The 90% evaporated and two of the 50% evaporated kerosene samples are not shifted on PC2 compared to the corresponding standards. In addition to the previous explanation, these samples also display smaller matrix contributions, in terms of abundance, from the surface treatment than replicates of the same samples.

3.3.10 PPMC Coefficients for Simulated Fire Debris Samples

Mean PPMC coefficients calculated for pairwise comparisons of replicates were greater than 0.89, indicating strong correlation, even though the samples exhibited spread in the scores plot (Table 3.4). It may seem that the strong correlation conflicts with the extent of the spread observed; however, PPMC coefficients provide a measure of similarity using an entire chromatogram while PCA identifies and emphasizes specific peaks in the chromatogram that lead to the variance. The PPMC coefficients, in this case, demonstrate that the sample chromatograms contain similar peaks, while PCA highlights the differences in abundance of the peaks.

The mean coefficients, however, are less than those calculated for pairwise comparisons of replicates of the standards (Table 3.1). The overall decrease in mean coefficients is due to differences in abundance of compounds from the ignitable liquids, surface treatment, and internal standard, which ultimately lead to differences in peak width and larger differences in relative abundance between more data points. This is reflected in the large standard deviations associated with the coefficients. Again, these differences are likely due to the liquids soaking into the wood, which leads to the compounds from the liquid not being entirely available for extraction. In addition, when the liquids soak into the wood they become protected from the full effects of the burning process. Furthermore, these mean coefficients are similar to those calculated for the
other two data sets. This similarity suggests that the majority of changes in the chromatograms are due to the matrix, which is characteristically porous, and have less to do with the irreproducible effects of the burning process.

Table 3.4: Mean Pearson product moment correlation coefficients ± standard deviations for replicates of the simulated fire debris samples (n=105) and for samples to 0% evaporated gasoline and kerosene (n=225) standards.

Ignitable Liquid Sample   Mean PPMC Coefficient ± Standard Deviation
Evaporation Level         Sample Replicates (n=105)   0% Evaporated Gasoline (n=225)   0% Evaporated Kerosene (n=225)
0% Gasoline               0.9530 ± 0.0323             0.3369 ± 0.0731                  0.4414 ± 0.0334
50% Gasoline              0.9262 ± 0.0600             0.5322 ± 0.0760                  0.4655 ± 0.0435
90% Gasoline              0.9870 ± 0.0095             0.2869 ± 0.0313                  0.4857 ± 0.0125
0% Kerosene               0.9831 ± 0.0106             0.1808 ± 0.0180                  0.7526 ± 0.0182
50% Kerosene              0.8976 ± 0.0889             0.1404 ± 0.0982                  0.6872 ± 0.0919
90% Kerosene              0.9731 ± 0.0245             0.0860 ± 0.0444                  0.4646 ± 0.0766

The 0% and 50% evaporated kerosene samples could be moderately associated to the 0% evaporated kerosene, even with the significant standard deviations associated with the calculated coefficients. However, the 90% kerosene samples could only be weakly to moderately correlated to the same standard. The n-alkanes from the surface treatment are present in very low abundances in the chromatograms of the 90% evaporated kerosene as opposed to the 0% and 50% evaporated samples. The 90% evaporated kerosene samples would not contain these compounds if not for the application of the surface treatment, but the 0% evaporated standards do. The addition of these compounds to the 90% evaporated kerosene should increase the correlation to the 0% evaporated standard; however, the small peak widths of the n-alkanes in the samples, as opposed to the large peak widths in the standard, prevent the correlation from increasing further.
The 0% and 90% evaporated gasoline samples exhibited weak correlation to the 0% evaporated gasoline standard, whereas the 50% evaporated sample was weakly to moderately correlated to the same standard. The weak correlations are due to the addition of compounds from the surface treatment that are not present in the standards. In addition, the 0% evaporated gasoline sample chromatograms exhibit large variation in the abundance of toluene and the C2-alkylbenzenes so that the peak widths of these compounds vary across the chromatograms. The difference in peak widths between the chromatograms of the samples and the standards leads to a decrease in the extent of correlation. This, along with a significant increase in abundances, and change in peak width, of the C4-alkylbenzenes in comparison to the standard, explains the low coefficients calculated for 90% evaporated gasoline.

A higher correlation was observed between gasoline samples and kerosene standards than between kerosene samples and gasoline standards. The addition of the n-alkanes from the surface treatment to the gasoline samples makes these samples more similar in composition to the kerosene standards, resulting in a slightly higher correlation.

3.4 Summary

The addition of compounds from the surface treatment can greatly complicate the visual assessment of a chromatogram from fire debris. This is especially true of the surface-treated wood investigated in this study because the treatment contains n-alkanes (C9-C12), which are also present in kerosene.

Principal components analysis can be used to provide a more objective assessment of a chromatogram from fire debris. This statistical procedure can be used to associate simulated debris samples to their respective standard by type of ignitable liquid despite evaporation, matrix interferences, and thermal degradation of the liquid and matrix. The debris samples, however, could not be accurately associated to their respective standards in terms of evaporation level.
This was due, primarily, to differences in abundances of compounds for which normalization procedures could not account.

Pearson product moment correlation coefficients can be used in conjunction with PCA. The coefficients could only provide a weak to moderate correlation for two of the three data sets, including the simulated fire debris samples. As a result, the coefficients do not provide a numerical value of the association between samples and their respective standards, as was intended, but can instead be used to associate replicates at each evaporation level to one another in order to minimize the effects of spread within the scores plot.

REFERENCES

1. ASTM International, ASTM E 1412-07. Annual Book of ASTM Standards.

Chapter 4: Classification of Ignitable Liquid Standards using Soft Independent Modeling of Class Analogy

4.1 Introduction

According to a report by the National Academy of Sciences, the forensic sciences are in dire need of ways to assess the accuracy and significance of analysis results¹. This is especially true for fire debris analysis, which consists of a subjective visual assessment of chromatograms from fire debris to identify the presence of an ignitable liquid. One statistical procedure that can potentially be used to link fire debris back to the ignitable liquid used to generate it, in a more objective manner, is soft independent modeling of class analogy (SIMCA). Since the application of SIMCA to fire debris data is a relatively new concept, the investigation performed in this chapter was deliberately simple and aimed at classifying ignitable liquid standards based on their chemical compositions as a proof-of-concept study. This supervised procedure provides a more objective approach to association because it can identify the class to which unknown samples are likely to belong, based on statistically meaningful class membership limits.
Additionally, classifications at various significance levels are calculated to indicate the probability that an unknown sample belongs to the class to which it was assigned.

A set of six ignitable liquids was generated by diluting each liquid in methylene chloride and analyzing it by gas chromatography-mass spectrometry (GC-MS). Each diluted liquid was analyzed in replicate, resulting in 15 chromatograms per liquid and 90 chromatograms in total. The ignitable liquids used, which span five ASTM International classes, were insect repellent spray, gasoline, paint thinner, fuel stabilizer, fuel injector cleaner, and diesel².

Principal components analysis (PCA) was applied to the total ion chromatograms (TICs) of the entire set of ignitable liquid standards to assess the natural grouping of the liquids. Next, the data were subjected to SIMCA. To do this, the TICs were split into a training set and a test set. The training set consisted of 72 chromatograms (12 chromatograms per liquid) and the remaining 18 chromatograms (3 per liquid) formed the test set. The TICs for each liquid within the training set were subjected to PCA by liquid type to generate models that described the chemical composition of each liquid. Then, the models were used to classify the ignitable liquids in the test set according to their chemical compositions. Soft independent modeling of class analogy was also applied to selected extracted ion chromatograms (EICs) to investigate whether improvements in classification were possible.

4.2 Materials and Methods

4.2.1 Liquid Standards

The six ignitable liquids used for this research were purchased from stores in the Lansing, MI area. Each was diluted, by volume, in methylene chloride (J.T. Baker, Phillipsburg, NJ), as follows: insect repellent spray, 1:1600; gasoline, 1:200; paint thinner, 1:350; fuel stabilizer, 1:150; fuel injector cleaner, 1:100; and diesel, 1:50. The diluted liquids were directly injected and analyzed by GC-MS.
4.2.2 Analysis of Standards by GC-MS

All liquids were analyzed using an Agilent 6890N gas chromatograph, coupled to an Agilent 5975C mass spectrometer, and equipped with an Agilent 7683B autosampler (Agilent Technologies, Palo Alto, CA). The GC contained an Agilent HP-5MS capillary column (30 m x 0.25 mm I.D. x 0.25 µm film thickness). The carrier gas used was ultra-high purity helium (Airgas, East Lansing, MI), at a nominal flow rate of 1 mL/min. One µL of each liquid was injected using the pulsed, splitless mode, with a pressure of 15 psi for 0.25 minutes. The inlet was maintained at 250 °C. The GC oven temperature program was as follows: 40 °C for 3 min, 10 °C/min to 280 °C, hold for 4 min. The transfer line was maintained at 280 °C and the mass spectrometer was operated in electron ionization mode (70 eV). Full mass scan mode was used, scanning the range 50 to 550 amu, with a scan rate of 2.91 scans/s. Each liquid was analyzed in replicate (n=15) and TICs were generated. Additionally, EICs for m/z 83, 91, 99, and 128 were generated from the TICs using the ChemStation© Enhanced Data Analysis Software (version E.01.01.335, Agilent Technologies).

4.2.3 Data Pretreatment

Total ion chromatograms and EICs of the six ignitable liquids were treated as separate data sets. Data pretreatment was performed in the same manner on each data set, separately. First, a Savitzky-Golay smooth was performed in the ChemStation© Enhanced Data Analysis Software. Next, each data set was subjected to a total area normalization procedure, which was performed in Microsoft Excel (version 12.0.6425.1000, Microsoft Corp., Redmond, WA). For a specific ignitable liquid, the total area of each chromatogram (n=15) across all retention times was calculated and then the average area of all 15 chromatograms was calculated. The abundance at each retention time was divided by the total area of the chromatogram and then multiplied by the corresponding average.
This process was repeated for each ignitable liquid.

4.2.4 Principal Components Analysis

Principal components analysis was performed on the ignitable liquid TICs (n=90) using Unscrambler X (version 10.2, Camo, Inc., Woodbridge, NJ). The scores plots were used to visually assess the natural groupings of the ignitable liquid standards. The loadings plots were used to explain the positioning of the samples in the scores plots. The EICs were also subjected to PCA and assessed in a similar manner.

4.2.5 Soft Independent Modeling of Class Analogy

Soft independent modeling of class analogy was applied to the TICs using Unscrambler X. Each data set consisted of 90 chromatograms (n=15 for six liquids), which were further divided into training and test sets. The training set consisted of 12 of the 15 replicate chromatograms from each ignitable liquid, while the remaining chromatograms formed the test set. Chromatograms of liquids in the training set were separately subjected to PCA, by liquid type, to generate six distinct models. The PCA models, which consist of loadings and scores plots, identify the compounds that describe each ignitable liquid. The PCA models were validated using a full validation procedure in the software. In this procedure, one chromatogram was removed from the training set, a new model was generated, and a new score of the TIC that was removed was calculated using the new model to assess how well the training sample fit the model. This was repeated for all TICs in the training set. Next, the test samples were classified by projecting the test set TICs onto each model. The probability of each TIC in the test set belonging to each of the modeled ignitable liquid groups was determined. The classifications were investigated at the 0.1%, 1%, 5%, 10%, and 25% significance levels. Later, the EICs were subjected to SIMCA in a similar manner.
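The total area normalization of Section 4.2.3 and the training/test split of Section 4.2.5 can be sketched in a few lines. This is only an illustrative sketch in Python with NumPy, not the Excel and Unscrambler X workflow used in this research. The synthetic data, the function names, and the choice of the first 12 replicates as the training set are assumptions; the text does not state how the 12 training replicates were selected.

```python
import numpy as np

def total_area_normalize(chromatograms):
    """Scale each replicate to the mean total area of its liquid.

    chromatograms: array of shape (n_replicates, n_retention_times)
    holding the replicates of ONE ignitable liquid.
    """
    X = np.asarray(chromatograms, dtype=float)
    total_areas = X.sum(axis=1, keepdims=True)  # total area of each replicate
    mean_area = total_areas.mean()              # average area over the replicates
    return X / total_areas * mean_area

def split_replicates(chromatograms, n_train=12):
    """Hypothetical split: first n_train replicates train, the rest test."""
    X = np.asarray(chromatograms, dtype=float)
    return X[:n_train], X[n_train:]

# Synthetic stand-in for the 15 replicate TICs of a single liquid.
rng = np.random.default_rng(0)
demo = rng.uniform(1.0, 10.0, size=(15, 200))

normalized = total_area_normalize(demo)
train, test = split_replicates(normalized)
print(train.shape, test.shape)
```

After normalization every replicate of a liquid has the same total area, so remaining differences between replicates reflect the relative distribution of abundance rather than the amount injected.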
4.3 Results and Discussion

4.3.1 Characterization of Ignitable Liquid Standards

Exemplar TICs of each ignitable liquid are shown in Figure 4.1. Classified as a member of the aromatic class, the insect repellent contains substituted aromatics such as C3-alkylbenzenes, as well as malathion (Figure 4.1A). The gasoline fuel, classified as gasoline, contains aromatic compounds such as the C2-, C3-, and C4-alkylbenzenes, as well as methylnaphthalenes (Figure 4.1B). The paint thinner contains mostly branched alkanes in the C9-C12 range (Figure 4.1C) and is classified as isoparaffinic. The fuel stabilizer is a member of the naphthenic paraffinic class due to the presence of branched and cyclic alkanes (Figure 4.1D). The fuel injector cleaner is classified as a heavy petroleum distillate due to the presence of n-alkanes C9-C15, as well as substituted aromatics (Figure 4.1E). The diesel fuel is classified as a heavy petroleum distillate, due to the presence of n-alkanes in the range C10-C19 and some aromatic compounds (Figure 4.1F).

Figure 4.1: Total ion chromatograms of A) insect repellent, B) gasoline, C) paint thinner, D) fuel stabilizer, E) fuel injector cleaner, and F) diesel, with selected peaks labeled. (Axes: abundance versus retention time, 3 to 31 min.)

4.3.2 Principal Components Analysis of the Entire TIC Data Set

Prior to SIMCA, principal components analysis was performed on the full data set to assess natural groupings of the liquids.
A combined total of approximately 83% of the variance within the data set is described by the first and second principal components (PC1 and PC2, respectively) in the scores plot (Figure 4.2). Replicate TICs of each ignitable liquid were clustered, resulting in the six expected groups according to liquid type. Both principal components were necessary to fully differentiate the ignitable liquids from one another.

The diesel and fuel injector cleaner samples are located positively on PC1. The positioning of these samples can be explained by the loadings plot for PC1 (Figure 4.3). The plot shows that all n-alkanes (C9-C24) and many of the branched alkanes are weighted positively on PC1. n-Alkanes are present in both diesel and fuel injector cleaner, explaining why both are positively positioned on PC1 in the scores plot. Diesel contains more n-alkanes, specifically C9-C24, while fuel injector cleaner contains fewer n-alkanes, specifically C9-C16, thus explaining why diesel is the more positively positioned of the two liquids.

Branched and cyclic alkanes, which are weighted positively in the PC1 loadings plot, are present in fuel stabilizer; however, this liquid is negatively positioned on PC1 in the scores plot. When PCA is performed, there is a mean-centering step in which the average abundance at each retention time is calculated across all chromatograms in the data set and then the average is subtracted from each individual chromatogram. The aromatics and n-alkanes that are present in high abundance in the diesel and fuel injector cleaner result in a large average abundance for the corresponding retention times. Consequently, when the averages were subtracted from the fuel
stabilizer chromatograms, negative contributions from the aromatics and n-alkanes were introduced into the mean-centered fuel stabilizer data. The negative contributions of the mean-centered data, multiplied by the positive weighting in the PC1 loadings, result in the negative positioning of the fuel stabilizer samples in the scores plot on PC1. Consequently, fuel stabilizer is negatively positioned in the scores plot even though it predominantly contains compounds that are positively weighted in the loadings plot.

Figure 4.2: Scores plot of PC1 (67%) versus PC2 (16%) based on the total ion chromatograms of the ignitable liquid standards training and test sets: insect repellent (green), gasoline (orange), paint thinner (yellow), diesel (blue), fuel injector cleaner (black), and fuel stabilizer (red).

Figure 4.3: Loadings plot of PC1 based on the total ion chromatograms of the ignitable liquid standards (training and test sets).

Also positioned negatively on PC1 in the scores plot are gasoline, insect repellent, and paint thinner. The positioning of these liquids can also be explained by the loadings plot for PC1. Toluene, as well as some C2- and C3-alkylbenzenes and malathion, are negatively weighted in the plot. Many of these compounds are present in gasoline, explaining this liquid's negative positioning on PC1 in the scores plot. Some of the C3-alkylbenzenes and malathion are also present in insect repellent, thus explaining its negative location in the scores plot.
Paint thinner, on the other hand, is positioned negatively on PC1 due to the presence of some substituted alkanes in the C9-C12 range that are negatively weighted in the loadings plot, in addition to the negative contributions from the previously mentioned mean-centering of the data.

The positioning of the standards on PC2 in the scores plot can be explained, in a similar manner, by the loadings plot for PC2 (Figure 4.4). Diesel is positively positioned on PC2 in the scores plot because it contains a higher abundance of the n-alkanes that are positively weighted in the loadings plot (C14-C24) than of those that are negatively weighted (C9-C13). Insect repellent and gasoline are positively positioned on PC2 in the scores plot because toluene and malathion, as well as the C2- and C3-alkylbenzenes, all of which are contained in one or both of the liquids, are positively weighted on PC2. Paint thinner is also positively positioned on PC2 in the scores plot because the major compounds present in the TIC are weighted positively in the loadings plot for PC2.

Figure 4.4: Loadings plot of PC2 based on the total ion chromatograms of the ignitable liquid standards (training and test sets).

Fuel stabilizer and fuel injector cleaner, on the other hand, are negatively positioned on PC2 in the scores plot. The branched and cyclic alkanes contained in the fuel stabilizer are all weighted negatively on the loadings plot for PC2. Fuel injector cleaner contains compounds that are both positively (C14-C16) and negatively (C9-C13) weighted on PC2; however, more of the compounds are weighted negatively, thus explaining the overall negative positioning of the samples on PC2 in the scores plot.
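The effect of mean-centering on the fuel stabilizer scores can be illustrated with a small numerical sketch. The abundances below are invented purely for illustration and are not taken from the chromatograms in this research; the three rows and two variables are hypothetical stand-ins for liquids that contain the same positively weighted compounds at very different abundances. PCA is computed here through a singular value decomposition in NumPy, which is an assumption about implementation and not a description of the Unscrambler X software.

```python
import numpy as np

# Rows: diesel, fuel injector cleaner, fuel stabilizer (synthetic abundances).
# Columns: two hypothetical retention-time variables dominated by alkanes.
X = np.array([[10.0, 8.0],
              [6.0, 5.0],
              [1.0, 0.5]])

mean = X.mean(axis=0)
Xc = X - mean                      # mean-centering step of PCA

# PCA via singular value decomposition of the centered data.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt[0]
if loadings[0] < 0:                # the sign of a PC is arbitrary; fix it
    loadings = -loadings

scores = Xc @ loadings             # PC1 score of each liquid
print("PC1 loadings:", loadings)
print("PC1 scores:  ", scores)
```

In this sketch both loadings are positive and the third row contains only positive abundances, yet its PC1 score is negative because its abundances lie below the column averages. This mirrors the reasoning given above for the negative position of fuel stabilizer on PC1.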
4.3.3 Classification of Ignitable Liquid Standard TICs Using SIMCA

Principal components analysis was performed first on the entire set of TICs (training and test samples) not only to assess the natural groupings of ignitable liquid chromatograms, but also to determine the number of PCs necessary to distinguish between the different liquid types. Because the overall PCA scores plot demonstrated that differentiation of ignitable liquid types was possible using 2 PCs, SIMCA was performed using only PC1 and PC2. This does not match the number of PCs recommended for SIMCA by the software program (Table 4.1). As a result, SIMCA was performed using 2 PCs, as well as the recommended number of PCs. However, since classification of the test set was unaffected by the number of PCs used in SIMCA, only the results using 2 PCs are discussed below.

Table 4.1. The suggested number of principal components for soft independent modeling of class analogy on total ion chromatograms.

Ignitable Liquid     Suggested PCs
Fuel Stabilizer      5
Gasoline             3
Paint Thinner        7
Insect Repellent     4
Diesel               2
Fuel Injector        5

The first step in SIMCA is to generate models that will be used for sample classification. To do this, PCA was performed on the TICs of liquids in the training set, by liquid type, thus generating a total of six models (one for each liquid). Using two PCs in each of the models, all TICs in the test set were correctly classified according to liquid type between significance levels of 0.1% and 10%. However, at the 25% significance level, one gasoline replicate was left unclassified to any model while all other test liquids were correctly classified (Table 4.2). The significance level, as calculated in the computer software, is a p-value, which indicates the likelihood that a sample was classified to a model by chance.
Since smaller p-values and, consequently, smaller significance levels indicate that the classification of a sample is less likely to have occurred by chance, smaller significance levels (particularly less than 5%) are considered to be more statistically significant³. Later in the chapter, the reasoning for the replicate not being classified is discussed; however, since the larger significance levels are considered to be less statistically significant, the lack of classification of a gasoline replicate at 25% is not of great consequence.

In the initial PCA scores plot of all liquids (Figure 4.2), differentiation according to type was possible using only two PCs. As a result, correct classification of the test samples using SIMCA was expected at all significance levels. To further investigate the unclassified gasoline replicate in the test set at the 25% significance level, Coomans' plots and plots of sample-to-model distance versus leverage were assessed.

Table 4.2. Classification Table of Ignitable Liquid TICs at 10% Significance Level. An asterisk marks the model to which each test sample was classified.

Test Sample          Fuel Stabilizer  Gasoline  Paint Thinner  Insect Repellent  Diesel  Fuel Injector
Fuel Stabilizer 1    *
Fuel Stabilizer 2    *
Fuel Stabilizer 3    *
Gasoline 1                            *
Gasoline 2                            *
Gasoline 3                            *
Paint Thinner 1                                 *
Paint Thinner 2                                 *
Paint Thinner 3                                 *
Insect Repellent 1                                             *
Insect Repellent 2                                             *
Insect Repellent 3                                             *
Diesel 1                                                                         *
Diesel 2                                                                         *
Diesel 3                                                                         *
Fuel Injector 1                                                                          *
Fuel Injector 2                                                                          *
Fuel Injector 3                                                                          *

4.3.3.1 Coomans' plots

Coomans' plots are plots of the sample-to-model distance for two models. The sample-to-model distance describes how far the PCA score of a test sample lies from a model after the model is used to calculate a score for the test sample. Specifically, the sample-to-model distance is the square root of the residual distance from the score of the projected sample with respect to the principal components used to describe the model⁴.
An equation describing how the sample-to-model distance is calculated is located in the SIMCA Theory section of this thesis (Equation 2.3). The Coomans' plot visually demonstrates how and why test samples are likely to be classified.

As an example, a Coomans' plot comparing the models generated for gasoline and insect repellent, at a 10% significance level, is shown in Figure 4.5. In the plot, the sample-to-model distance for gasoline is on the abscissa while the sample-to-model distance for insect repellent is on the ordinate. The sample-to-model distance for all TICs in the test set is determined for both models and then plotted. Class membership limits, which describe the maximum distance the score of a sample can be from a model and still be classified as that liquid, are overlaid on the Coomans' plot. Test samples likely to be classified to a model are positioned between zero and the class membership limit for that model. The class membership limits can differ for each model; in the plot for the gasoline and insect repellent models, class membership limits are approximately 780 and 878, respectively.

Test samples likely to be classified to one model only will fall within the membership limits of that model and outside the membership limits of the other model. For example, replicates of the gasoline in the test set are plotted on the abscissa within the class membership limits for that liquid; hence, these replicates are classified as gasoline (Figure 4.6). On the ordinate, however, these replicates are positioned outside the class membership limits for the insect repellent model, thus indicating that the gasoline replicates in the test set are not classified as insect repellent. A similar explanation can be used to describe why replicates of insect repellent in the test set are classified as insect repellent and not as gasoline.
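The pair of coordinates plotted for each test sample in a Coomans' plot can be sketched as follows. This is a minimal illustration in Python with NumPy on synthetic data. The function names are hypothetical, and the distance is taken as the root-mean-square residual after projection onto the class model, which is one common formulation; the exact scaling used by the software, and by Equation 2.3, may differ.

```python
import numpy as np

def fit_pca_model(X_train, n_components=2):
    """Fit a per-class PCA model: returns the class mean and loadings."""
    X_train = np.asarray(X_train, dtype=float)
    mean = X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components].T        # loadings: (n_vars, n_components)

def sample_to_model_distance(x, model):
    """Root-mean-square residual of x after projection onto the model."""
    mean, P = model
    xc = np.asarray(x, dtype=float) - mean
    residual = xc - (xc @ P) @ P.T          # part not explained by the PCs
    return np.sqrt(np.mean(residual ** 2))

# Two synthetic "liquids" with different profiles and small replicate noise.
rng = np.random.default_rng(0)
profile_a = rng.uniform(0.0, 100.0, 50)
profile_b = rng.uniform(0.0, 100.0, 50)
model_a = fit_pca_model(profile_a + rng.normal(0.0, 1.0, (12, 50)))
model_b = fit_pca_model(profile_b + rng.normal(0.0, 1.0, (12, 50)))

# Coomans' plot coordinates for one test replicate of liquid A.
test_a = profile_a + rng.normal(0.0, 1.0, 50)
d_a = sample_to_model_distance(test_a, model_a)
d_b = sample_to_model_distance(test_a, model_b)
print(d_a, d_b)
```

A replicate of liquid A lies close to model A and far from model B, which is the pattern described above for the gasoline replicates with respect to the gasoline and insect repellent models.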
Figure 4.5: Coomans' plot for the gasoline and insect repellent models (at a 10% significance level) based on the total ion chromatograms of the training sets. The sample-to-model distances are plotted for each of the ignitable liquids in the test set: insect repellent (green), gasoline (orange), paint thinner (yellow), diesel (blue), fuel injector cleaner (black), and fuel stabilizer (red). The class membership limit for the gasoline model is overlaid on the plot in orange while the limit for the insect repellent model is in green.

Figure 4.6: Coomans' plot (at 10% significance level) for the gasoline and insect repellent models based on the total ion chromatograms of the training sets. The sample-to-model distances are plotted for the gasoline test samples (orange). The class membership limit for the gasoline model is overlaid on the plot in orange while the limit for the insect repellent model is in green.

In addition to illustrating likely classifications, the positioning of test samples within the Coomans' plot also indicates how well the two models being compared are discriminated from one another. When performing SIMCA, all of the models should be well discriminated to minimize the possibility of incorrect classification of the test samples. The models are considered poorly discriminated if any test samples fall within the area between the origin and the point where the two class membership limits intersect, because this indicates that the test samples could be classified to either of the two models. This is not the case in the Coomans' plot for gasoline and insect repellent; no test samples fall within this area, indicating that these two models are well discriminated at the 10% significance level.
4.3.3.2 Sample-to-Model Distance Versus Leverage Plots

The Coomans' plot cannot alone be used to determine the classification of test samples. Classification is determined using a combination of two variables for each test sample: the sample-to-model distance and the leverage. The leverage is the distance calculated between the projected score of a test sample and the mean score of the training samples used to generate the model³. The equation describing specifically how leverage is calculated is located in the SIMCA Theory section of this thesis (Equation 2.4). Essentially, leverage is a measure of the variation between the test sample and the model. A sample can only be classified to a model if both the sample-to-model distance and leverage fall within the class membership limits for the model.

A sample-to-model distance versus leverage plot can be generated for any model to describe why samples are or are not classified to that model. Unlike the Coomans' plot, this plot cannot be used to directly compare models, but is instead used to understand classification of the test samples for an individual model. An example of a sample-to-model distance versus leverage plot for the gasoline model at a 10% significance level is shown in Figure 4.7, in which the model leverage is on the abscissa while sample-to-model distance is on the ordinate. The sample-to-model distance and leverage for all of the test samples, with respect to the gasoline model, are plotted along with the class membership limits. Samples that are positioned near the origin, inside both membership limits, will be classified as the liquid type represented by the model. In this example, all of the gasoline test samples fall within the class membership limits for both distance and leverage, indicating that these test samples are classified as gasoline at the 10% significance level (Figure 4.8).
No other test samples fall within these limits, indicating that no other samples will be incorrectly classified as gasoline.

Figure 4.7: Sample-to-model distance versus leverage plot for the gasoline model (at a 10% significance level) based on the total ion chromatograms of the gasoline training set. The sample-to-model distances and leverage are plotted for each of the ignitable liquids in the test set: insect repellent (green), gasoline (orange), paint thinner (yellow), diesel (blue), fuel injector cleaner (black), and fuel stabilizer (red). The class membership limits of both sample-to-model distance and leverage for the gasoline model are overlaid on the plot in orange.

4.3.3.3 The Unclassified Gasoline Sample

Coomans' plots and sample-to-model distance versus leverage plots were used to investigate the unclassified gasoline replicate in the test set at the 25% significance level. A sample can be classified at the 10% significance level, but not at 25%, because a change in the significance level translates to a change in class membership limits. The significance level calculated in the software is a P-value and is used to draw general conclusions about a larger population from a small experimentally-collected sample population. P-values are used in SIMCA to calculate the percent probability that the classification of a sample to a model occurred by chance. For example, in terms of class membership limits, classifications assessed using a P-value of 0.25 indicate that, in a larger population, 25% of the samples would have sample-to-model distances greater than the calculated membership limit and would be misclassified. Therefore, in the same larger population, 75% of the samples would
have sample-to-model distances less than the class membership limit and would be classified appropriately. A P-value of 0.25 is the highest used for classifications in the software; the next highest is 0.10. As the P-value decreases, the probability that a sample was classified by chance decreases. In order to decrease the likelihood of classification by chance to 10% (P-value = 0.10), the class membership limits need to be made larger so that 90% of the test samples have sample-to-model distances less than the membership limits.

Figure 4.8: Sample-to-model distance versus leverage plot for the gasoline model (at a 10% significance level) based on the total ion chromatograms of the gasoline training set. The sample-to-model distances and leverage are plotted for gasoline test samples (orange). The class membership limits of both sample-to-model distance and leverage for the gasoline model are overlaid on the plot in orange.

The change in class membership limits corresponding to the change in significance level from 10% to 25% is reflected in both the Coomans' plot (Figure 4.9) and the sample-to-model distance versus leverage plot (Figure 4.10). The Coomans' plot for the gasoline and insect repellent spray at a 10% significance level was previously discussed. At the 10% significance level, the class membership limit for the gasoline model is 780 while, at the 25% significance level, the limit is 530 (Figure 4.9). The sample-to-model distance for the unclassified gasoline replicate in the test set was approximately 677. In this case, the sample-to-model distance was outside the membership limit at the largest significance level, meaning that the sample was not classified as gasoline at the 25% level.
In addition, this sample was outside the membership limits of all other models; as a result, this particular gasoline replicate was not classified as belonging to any of the previously defined liquid classes using SIMCA.

Figure 4.9: Coomans' plot (at 25% significance level) for the gasoline and insect repellent models based on the total ion chromatograms of the training sets. The sample-to-model distances are plotted for the gasoline test samples (orange). The class membership limit for the gasoline model is overlaid on the plot in orange while the limit for the insect repellent model is in green.

Figure 4.10: Sample-to-model distance versus leverage plot for the gasoline model (at a 25% significance level) based on the total ion chromatograms of the gasoline training set. The sample-to-model distances and leverage are plotted for gasoline test samples (orange). The class membership limits of both sample-to-model distance and leverage for the gasoline model are overlaid on the plot in orange.

The sample-to-model distance versus leverage plot for gasoline (Figure 4.10) illustrates the same decrease in class membership limits for the sample-to-model distance as the significance level increases. This plot indicates that the gasoline test sample only falls outside membership limits for the sample-to-model distance, not for leverage. This is because the membership limit for leverage is a fixed value across all significance levels and is calculated from the number of components and training samples used to make the models³. As a result, only the sample-to-model distance prevents correct classification of this replicate.
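The classification rule and the effect of the significance level can be checked with the values reported above for the gasoline model. The following is a minimal sketch; the function name is hypothetical, and the leverage value and leverage limit are placeholders, since the text reports only that the leverage of this replicate was within its limit.

```python
def is_classified(distance, leverage, distance_limit, leverage_limit):
    """A sample is assigned to a model only if both values are within limits."""
    return distance <= distance_limit and leverage <= leverage_limit

# Values reported in the text for the gasoline model.
distance = 677.0                        # unclassified gasoline replicate
limits = {"10%": 780.0, "25%": 530.0}   # sample-to-model distance limits

# Hypothetical leverage values (assumption): within the limit at both levels.
leverage, leverage_limit = 0.2, 0.5

results = {level: is_classified(distance, leverage, limit, leverage_limit)
           for level, limit in limits.items()}
print(results)
```

With these numbers the replicate is classified as gasoline at the 10% level, because 677 is below 780, but not at the 25% level, because 677 exceeds 530.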
Since this is proof-of-concept work to investigate the potential of SIMCA for ignitable liquid classification, the data set was intentionally generated to minimize variation between replicate TICs of individual ignitable liquids. The lack of variation in the TICs of each ignitable liquid resulted in models that poorly describe the liquids. This is especially true of the gasoline model and is likely the reason that the gasoline replicate in the test set was not classified. This is illustrated by the PCA loadings plot for the gasoline model (Figure 4.11). The majority of the peaks in the loadings plots for both PC1 and PC2 are derivative-shaped, which is the result of trivial differences in peak shape among the replicate TICs. Due to the high degree of similarity among the TICs for the gasoline samples, PCA identified the trivial difference in peak shape, likely a result of instrument variation, as a major source of variance (i.e., non-chemical variance). As a result, the PCA model for the gasoline class was, unintentionally, built on insignificant variation that occurred due to instrument variations during analysis, rather than chemical differences among samples. One gasoline sample remained unclassified at the largest significance level because of those natural and chemically insignificant variations. In the future, classification could be improved by introducing additional samples of different gasoline brands into the training set. This would ensure that the PCA model would be built on chemically meaningful differences between gasoline TICs.

As mentioned earlier, all of the ignitable liquid models generated describe the chemically insignificant variations that were illustrated by the gasoline model; however, only one gasoline replicate was left unclassified. The loadings plots for gasoline already established that the model is built on chemically insignificant variations between TIC replicates. The modeling power of the
gasoline model can be used in conjunction with the loadings plots to demonstrate that the gasoline model is more strongly affected by these trivial fluctuations than the other liquid models (Figure 4.12). The modeling power highlights the influence that each variable has on the model. An influence above 0.3 is considered significant to the model³. The modeling power of gasoline shows many peaks, ranging from approximately 3 to 17 minutes, that significantly impact the model. In fact, the plot of the modeling power shows more peaks impacting the gasoline model than there are peaks in the TIC of the gasoline replicates. Consequently, some of the peaks significantly impacting the model represent the chemically insignificant variations that correspond to the noise from the baseline.

Figure 4.11: Loadings plot of PC1 of the gasoline model based on the total ion chromatograms of the gasoline standards.

The modeling power for the gasoline model can be contrasted with that of insect repellent (Figure 4.13). Fewer of the peaks significantly impacting the insect repellent model are a result of the chemically insignificant variations from the baseline that were seen in the gasoline model. This is highlighted in the modeling power for insect repellent by the fact that the peaks impacting the model correspond, by retention time, to peaks in the TIC of insect repellent.

It should also be noted that the loadings plots and modeling power for all of the ignitable liquid models incorporate an additional source of nonchemical variation. A rise in baseline at the end of all ignitable liquid TICs is identified as variance in the models and is described in the loadings plots (Figure 4.14). The rise in baseline occurs as a consequence of the column being heated to high temperatures at the end of the temperature program.
It occurs in every TIC and affects how the models are described, but it is inherent to the analysis process and not to the chemical makeup of the sample. Because the samples are so similar, PCA identifies it as a major source of variance.

Figure 4.12: Modeling power for the gasoline model based on the total ion chromatograms of the gasoline training samples. The red line represents a modeling power of 0.3. Peaks that extend above this line significantly impact the model. (Axes: modeling power, 0 to 1, versus retention time, 3 to 31 min.)

Figure 4.13: Modeling power for the insect repellent model based on the total ion chromatograms of the insect repellent training samples. The red line represents a modeling power of 0.3. Peaks that extend above this line significantly impact the model. (Axes: modeling power, 0 to 1, versus retention time, 3 to 31 min. Labeled features: C3-alkylbenzenes, malathion, and the rise in baseline.)

Figure 4.14: A total ion chromatogram of insect repellent demonstrating the rise in baseline that occurs at the end of the chromatogram. (Axes: abundance, 0 to 7E5, versus retention time, 3 to 31 min.)

The effect of the rise in baseline is further reflected in a plot of the modeling power versus the variable for each individual model. As the modeling power plot for the insect repellent model shows, variations associated with the C3-alkylbenzene and malathion peaks significantly influence the model (Figure 4.13). Unfortunately, according to the modeling power, the rise in baseline, which is non-chemical variation, influences the model as much as the variation associated with the actual peaks in the insect repellent. To circumvent this problem in the future, it may be necessary to truncate the TICs before SIMCA is performed to reduce the negative effects of the rise; however, truncation may not be a viable solution because, for ignitable liquids investigated in the future, compounds may be detected in the rise of the baseline.
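The modeling power discussed above can be sketched numerically. The following is a minimal illustration, not the software's implementation: it assumes the common definition of modeling power as one minus the square root of the ratio of a variable's residual sum of squares to its initial sum of squares, omits the degrees-of-freedom corrections that commercial packages may apply, and uses hypothetical names (`modeling_power`, `X_demo`) and synthetic data. Only the 0.3 threshold comes from the text.

```python
import numpy as np

def modeling_power(X_train, n_components):
    """Per-variable modeling power for one class model.

    Assumed definition (not taken from the thesis):
        MP_j = 1 - sqrt(residual SS of variable j / initial SS of variable j)
    computed on mean-centered training chromatograms, without
    degrees-of-freedom corrections.
    """
    X = np.asarray(X_train, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each retention-time point
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                      # loadings: (n_variables, n_components)
    E = Xc - Xc @ P @ P.T                        # residuals left after the retained PCs
    resid_ss = (E ** 2).sum(axis=0)
    init_ss = (Xc ** 2).sum(axis=0)
    mp = np.zeros(X.shape[1])
    varies = init_ss > 0                         # constant variables keep MP = 0
    mp[varies] = 1.0 - np.sqrt(resid_ss[varies] / init_ss[varies])
    return mp

# Synthetic demo: 12 replicates, 3 variables. Variables 0 and 1 vary together
# (captured by PC1); variable 2 carries a tiny, unrelated variation.
t = np.linspace(-1.0, 1.0, 12)
X_demo = np.column_stack([t, 2.0 * t, 1e-3 * t ** 2])
mp_demo = modeling_power(X_demo, n_components=1)
significant = mp_demo > 0.3                      # threshold quoted in the text
```

In this toy case the two co-varying variables have a modeling power close to 1, while the third falls below the 0.3 threshold. With real replicate TICs that are nearly identical, the same calculation would flag whatever small variation remains, including baseline drift, which is the behavior described for the gasoline model.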
4.3.4 Classification of Ignitable Liquid Standard EICs Using SIMCA

While SIMCA successfully classified the test samples using TICs up to the 10% significance level, classification using EICs was also investigated across the same significance levels. Extracted ion chromatograms can provide many benefits over TICs, including improved sensitivity and a reduction in the negative effects of matrix interference compounds that do not contain the extracted ion. The ions used to generate the EICs for each ignitable liquid were selected because they represent different classes of compounds and are commonly used in forensic laboratories for EICs or as part of extracted ion profiles. In addition, the selected ions were present in similar abundances in the ignitable liquids used in this research.

Extracted ion chromatograms for ions m/z 99, 91, 83, and 128 were generated separately from the TICs of the ignitable liquid standards in both the training and test sets. Each EIC was treated as a separate data set. Principal components analysis was performed on the EIC training and test samples to assess the natural groupings of the ignitable liquids. Lastly, SIMCA was performed to classify the EIC test set to the ignitable liquid models generated from the EIC training set.

4.3.4.1 Alkane EIC, m/z 99

Using extracted ion chromatograms of m/z 99, which is representative of the alkane compound class, all six ignitable liquids were differentiated in the PCA scores plot (Figure 4.15). The first two PCs account for approximately 75% of the total variance in the EIC data set. As for the TICs, the plot indicated that classification by SIMCA should be possible using 2 PCs for each ignitable liquid model. The number of PCs suggested from visual assessment of the overall scores plot differs from the number of PCs recommended by the computer software (Table 4.3). As a result, SIMCA was performed using 2 PCs and using the recommended number.
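As a brief illustration of how an EIC relates to a TIC, the sketch below assumes the GC-MS run is available as a scans-by-m/z intensity matrix. The function names, the `tol` parameter, and the small data matrix are hypothetical and chosen only for illustration; instrument software would normally perform this extraction.

```python
import numpy as np

def total_ion_chromatogram(intensity):
    """Sum all m/z channels at each scan to give the TIC."""
    return np.asarray(intensity, dtype=float).sum(axis=1)

def extracted_ion_chromatogram(intensity, mz_axis, target_mz, tol=0.5):
    """Sum the channels within +/- tol of target_mz at each scan to give the EIC."""
    intensity = np.asarray(intensity, dtype=float)
    mz_axis = np.asarray(mz_axis, dtype=float)
    mask = np.abs(mz_axis - target_mz) <= tol
    if not mask.any():
        raise ValueError("no m/z channel within tolerance of target_mz")
    return intensity[:, mask].sum(axis=1)

# Hypothetical run: 3 scans (rows) recorded at the four ions used in the text.
mz_axis_demo = np.array([83.0, 91.0, 99.0, 128.0])
intensity_demo = np.array([
    [10.0,  0.0,  5.0, 1.0],
    [20.0, 30.0,  0.0, 2.0],
    [ 0.0, 40.0, 15.0, 3.0],
])
tic_demo = total_ion_chromatogram(intensity_demo)
eic_99_demo = extracted_ion_chromatogram(intensity_demo, mz_axis_demo, 99.0)
eic_91_demo = extracted_ion_chromatogram(intensity_demo, mz_axis_demo, 91.0)
```

Each EIC keeps only the signal from compounds that produce the selected ion, so a matrix compound without that ion contributes nothing to it. Treating each EIC as its own data set, as done here, then amounts to running the same PCA and SIMCA steps on these reduced traces.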
Using 2 PCs for each model, correct classification of the test samples by SIMCA was possible at all significance levels investigated. This includes the gasoline replicate that was previously unclassified using SIMCA on TICs. To explain why the classification of gasoline replicates was successful at a 25% significance level using EICs but not TICs, it is necessary to investigate the modeling power of the gasoline model based on EICs (Figure 4.16). By using EICs to generate the models, the problem of the rise in the baseline, which contributed significantly to the TIC models, was greatly reduced. Additionally, for the gasoline model specifically, the problem of baseline variation being detected early in the chromatogram, before the rise at the end, was also reduced. The peaks in the modeling power plot for the gasoline EIC model more accurately correspond to the peaks in the EICs of gasoline.

Figure 4.15: Scores plot of PC1 versus PC2 based on the extracted ion chromatograms (m/z 99) of the ignitable liquid standards training and test sets: insect repellent (green), gasoline (orange), paint thinner (yellow), diesel (blue), fuel injector (black), and fuel stabilizer (red). (Axes: Principal Component 2 (16%) versus Principal Component 1 (59%).)

Table 4.3. The suggested number of principal components for soft independent modeling of class analogy on extracted ion chromatograms (m/z 99).

Ignitable Liquid    Suggested PCs
Fuel Stabilizer     1
Gasoline            4
Paint Thinner       3
Insect Repellent    6
Diesel              2
Fuel Injector       2

Figure 4.16: Modeling power for the gasoline model based on the extracted ion chromatograms (m/z 99) of the gasoline training samples. The red line represents a modeling power of 0.3. Peaks that extend above this line significantly impact the model. (Axes: modeling power, 0 to 1, versus retention time, 3 to 31 min.)
As a result, even though the gasoline model still poorly characterizes gasoline, the trivial variations in the baseline do not significantly affect the model.

Using the recommended number of PCs for each model, all test samples were classified as fuel stabilizer. The paint thinner, insect repellent, and diesel test samples were also correctly classified to their corresponding liquid models. The misclassifications that resulted are likely a product of using too many PCs to describe the liquid models. Generally, only the first few PCs describe chemically significant variation in the data set. Beyond the first PCs, much of the variation described is considered chemically insignificant. This insignificant variation has many different sources, such as random instrumental fluctuations that occur as a result of the method of analysis. These incorrect classifications highlight the need to carefully choose the number of PCs to use when performing SIMCA.

Scores plots can be used to determine the number of PCs that should be used for SIMCA. For example, using the first two PCs, the overall scores plot of the entire EIC data set exhibits six well-clustered groupings of samples corresponding to the six different ignitable liquids. This trend is maintained when a two-dimensional scores plot is generated using any two of the first five principal components. The tight clustering of samples indicates that the PCs used to generate the scores plot describe chemically significant variation in the data set. If PC6 or higher is used to generate a scores plot, the tight clustering of replicates is no longer observed (Figure 4.17). This is especially obvious in the EICs of the diesel samples, which exhibit spread on the scores plot because PC6 (and higher PCs) describe the insignificant instrumental variation. Since the models are used for classification, it is necessary to examine the corresponding scores plots to determine the optimal number of PCs to use for each model.
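The visual check described above, whether replicates still cluster tightly by liquid on a given PC, can be approximated numerically. The sketch below is one possible proxy and not the procedure used in this work: for each PC it computes the fraction of the score variance that lies between class means. The function names and the two-class synthetic data are assumptions made for illustration.

```python
import numpy as np

def pca_scores(X):
    """Return PCA scores and the fraction of variance explained by each PC."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # mean-center each variable
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                                # sample scores on each PC
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained

def between_class_fraction(scores, labels):
    """Fraction of each PC's score variance that separates the class means."""
    labels = np.asarray(labels)
    grand_mean = scores.mean(axis=0)
    total_ss = ((scores - grand_mean) ** 2).sum(axis=0)
    between_ss = np.zeros(scores.shape[1])
    for c in np.unique(labels):
        group = scores[labels == c]
        between_ss += len(group) * (group.mean(axis=0) - grand_mean) ** 2
    frac = np.zeros_like(total_ss)
    nonzero = total_ss > 0
    frac[nonzero] = between_ss[nonzero] / total_ss[nonzero]
    return frac

# Synthetic demo: two liquids, four replicates each. Variable 0 differs
# between liquids; variable 1 is small replicate-to-replicate variation.
labels_demo = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_demo = np.column_stack([
    [-5.0, -5.0, -5.0, -5.0, 5.0, 5.0, 5.0, 5.0],
    [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1],
])
scores_demo, explained_demo = pca_scores(X_demo)
frac_demo = between_class_fraction(scores_demo, labels_demo)
```

A fraction near 1 means the PC separates the liquids, while a fraction near 0 means it mostly describes spread within replicates, which is the situation reported for PC6 and higher. Such a measure would only complement inspection of the scores plots, and any cutoff applied to it would need to be justified separately.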
In the case of this research, it was not possible to determine the optimal number of PCs to use for each model by evaluating the corresponding two-dimensional scores plot. This is because the replicate training samples used to generate each individual model were so similar that, when PCA was performed on the training samples, chemically insignificant variation in the data set was emphasized. As a result, the training samples were not well-clustered in the scores plot and the method of selecting the optimal number of PCs discussed above could not be applied.

Figure 4.17: Scores plot of PC1 versus PC6 based on the extracted ion chromatograms (m/z 99) of the ignitable liquid standards training and test sets: insect repellent (green), gasoline (orange), paint thinner (yellow), diesel (blue), fuel injector cleaner (black), and fuel stabilizer (red). (Axes: Principal Component 6 versus Principal Component 1.)

The results of SIMCA demonstrate that using the correct number of PCs is essential for accurate classifications. If too few PCs are used, the chemically significant variation may not be described sufficiently to allow accurate classification among samples that contain similar compounds. If too many PCs are used, noise or chemically insignificant variation is accounted for in the model, which can result in the misclassification of samples.

4.3.4.2 EICs: m/z 91, 83, and 128

Ions m/z 91 and 83 are representative of the aromatic and olefinic/cycloparaffinic compounds, respectively. Correct classification of the test samples was possible up to the 10% significance level for EICs of m/z 91 and 83 when only 2 PCs were used to model each liquid. In addition, correct classification of all samples occurred at all significance levels when the recommended number of PCs was used in the SIMCA models.
The fact that classification across all significance levels required the recommended number of PCs is likely a result of PC1 and PC2 not accounting for enough of the chemically significant variation. As discussed earlier, it is difficult to determine the optimal number of PCs to use for classifying this data set by SIMCA due to the high similarity between replicates.

For ion m/z 128, which represents polycyclic aromatic hydrocarbons, correct classification of the test samples by SIMCA was possible at all significance levels when 2 PCs were used to describe each liquid model. Classification using the recommended number of PCs, on the other hand, was only possible up to the 10% significance level. At 25%, one gasoline sample was not classified to any model. The gasoline sample was likely not classified at 25% using the recommended number of PCs because the recommended number for the gasoline model was five. As discussed previously, the higher PCs tend to describe chemically insignificant variation in models that already poorly describe the ignitable liquids for which they were generated; therefore, the lack of classification of one gasoline replicate is likely due to the model describing too much noise.

4.4 Summary

SIMCA, a supervised classification procedure, was used to successfully classify TICs of ignitable liquids in a test set to the corresponding liquid standards in a training set up to a 10% significance level, regardless of the number of PCs used to make the models. At a significance level of 25%, the high similarity of the replicates within the data set used to create the models resulted in one test sample not being classified. Since the larger significance levels are considered to be less statistically significant, the successful classification of all test samples at the smaller significance levels outweighs the lack of classification of a gasoline replicate at the larger significance level.
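To make the role of the significance level concrete, the following is a minimal sketch of a SIMCA-style membership test. It assumes the classical residual-variance F-ratio formulation with one common choice of degrees of freedom; the software used in this work may also take leverage into account and may use different conventions. The class and function names, the synthetic two-class data, and the critical value of about 2.96 (a tabulated F value for 3 and 27 degrees of freedom at the 5% level) are assumptions for illustration.

```python
import numpy as np

class ClassModel:
    """PCA model of one class with a residual-variance F-ratio for new samples."""

    def __init__(self, X_train, n_components):
        X = np.asarray(X_train, dtype=float)
        n, m = X.shape
        a = n_components
        if n - a - 1 <= 0 or m - a <= 0:
            raise ValueError("too many components for this training set")
        self.mean = X.mean(axis=0)
        Xc = X - self.mean
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        self.P = Vt[:a].T                          # loadings of the retained PCs
        E = Xc - Xc @ self.P @ self.P.T            # training residuals
        self.resid_dof = m - a
        # Pooled residual variance of the class (assumed degrees of freedom).
        self.s0_sq = (E ** 2).sum() / ((n - a - 1) * (m - a))

    def f_ratio(self, x):
        """Residual variance of sample x relative to the class residual variance."""
        d = np.asarray(x, dtype=float) - self.mean
        e = d - d @ self.P @ self.P.T
        s_sq = (e ** 2).sum() / self.resid_dof
        return s_sq / self.s0_sq

def classify(x, models, f_crit):
    """Return every class whose model accepts x; may be one, several, or none."""
    return [name for name, model in models.items() if model.f_ratio(x) <= f_crit]

# Synthetic demo: two well-separated classes, 12 training replicates each.
rng = np.random.default_rng(0)
center_a = np.array([10.0, 0.0, 0.0, 0.0, 0.0])
center_b = np.array([0.0, 10.0, 0.0, 0.0, 0.0])
models_demo = {
    "A": ClassModel(center_a + 0.1 * rng.standard_normal((12, 5)), n_components=2),
    "B": ClassModel(center_b + 0.1 * rng.standard_normal((12, 5)), n_components=2),
}
F_CRIT = 2.96  # assumed tabulated value, F(3, 27) at the 5% level
assigned_a = classify(center_a, models_demo, F_CRIT)
assigned_b = classify(center_b, models_demo, F_CRIT)
assigned_far = classify(np.array([0.0, 0.0, 10.0, 0.0, 0.0]), models_demo, F_CRIT)
```

Under this formulation a larger significance level corresponds to a smaller critical F value and therefore a tighter acceptance region around each model. That is consistent with a replicate being accepted at the 10% level but left unclassified at 25%, as observed for one gasoline test sample.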
The use of EICs instead of TICs for SIMCA was demonstrated. Correct classification of all test samples resulted at all significance levels when using either 2 PCs or the number recommended by the software, depending on the ion. The classification of standards using TICs and EICs has highlighted the importance of selecting the optimal number of PCs with which to perform SIMCA. The optimal number of PCs will describe only chemically significant variation within the models. As a result, using the optimal number of PCs for each model when performing SIMCA could potentially minimize the possibility of false positives.

REFERENCES

1. Committee on Identifying the Needs of the Forensic Sciences Community, National Research Council. Strengthening Forensic Science in the United States: A Path Forward. Washington, D.C.: National Academies Press, 2009.
2. ASTM International, ASTM E 1412-07. Annual Book of ASTM Standards.
3. Unscrambler X SIMCA Theory Section of User Manual (version 10.2, Camo, Inc., Woodbridge, NJ).
4. Unscrambler X Methods Manual (version 10.2, Camo, Inc., Woodbridge, NJ).

Chapter 5: Conclusions

5.1 Summary of Research

5.1.1 Research Objectives and Goals

This research investigated the use of multivariate statistical procedures for the objective analysis of fire debris. Specifically, unsupervised statistical procedures such as principal components analysis (PCA) coupled with Pearson product moment correlation (PPMC) coefficients and supervised statistical procedures such as soft independent modeling of class analogy (SIMCA) were explored. While both types of procedures offer an objective approach to a currently subjective visual analysis of chromatographic fire debris data, they achieve this objectivity in two distinct manners.

5.1.2 Unsupervised Multivariate Statistics Study Summary

The goal of this study was to investigate the potential of unsupervised statistics, specifically PCA and PPMC coefficients, for the association of simulated fire debris samples to their corresponding ignitable liquid standards in spite of evaporation, matrix interferences, and thermal degradation. Both PCA and PPMC coefficients were used to analyze ignitable liquid standards and three data sets, which illustrate the effects of evaporation, matrix interferences, and thermal degradation in a piecewise manner.

A surface-treated wood matrix was used to investigate the effects of matrix interferences. Wood is commonly used in building, furnishing, and decorating structures and, in such capacities, is usually coated with a surface treatment for durability as well as decorative purposes. The compounds from the surface treatments are persistent and can mimic the peak patterns of ignitable liquids.

Firstly, ignitable liquid standards were generated. In order to demonstrate the effects of evaporation, gasoline and kerosene were evaporated to three different levels, by volume (including 0% evaporation). Each of the liquids was spiked separately onto KimWipes. The standards were extracted using the passive headspace procedure with an activated charcoal strip and analyzed by gas chromatography-mass spectrometry (GC-MS). The resulting chromatograms were subjected to data pretreatment procedures to minimize the effects of any non-chemical variation that may have been introduced during the extraction or analysis procedure. The pretreatments included a Savitzky-Golay smoothing algorithm, a retention time alignment using a correlation optimized warping algorithm, and normalization by total area for each ignitable liquid evaporation level.

After data pretreatment procedures, the chromatograms were subjected to PCA and then PPMC coefficients were calculated. These two statistical procedures can be thought of as complementary. Principal components analysis identifies the greatest source of variance within a data set to differentiate and, consequently, group similar samples based on their chemical characteristics. Pearson product moment correlation coefficients, on the other hand, assess the similarity between two chromatograms and are calculated by performing a point-by-point comparison. The resulting coefficient provides a numerical value that describes the extent of the similarity. The combination of these procedures provides both a visual and a numerical method of comparing chromatographic fire debris data.

When PCA was applied to the ignitable liquid standards, the standards could be differentiated from one another by liquid type as well as evaporation level of each liquid. Additionally, the PPMC coefficients calculated for replicates according to ignitable liquid evaporation level demonstrated that replicates could be strongly correlated to one another. This, coupled with the low [...] procedures used in this study were precise.

Next, to investigate the effects of matrix interferences and thermal degradation of the matrix on the association of fire debris samples to respective standards, the ignitable liquid standards were spiked onto unburned and burned surface-treated wood, which was then extracted and analyzed by GC-MS. The liquids spiked onto the unburned surface-treated wood demonstrated the effects of inherent matrix interferences on the association of the simulated fire debris samples to their respective standards. The liquids spiked onto the burned surface-treated wood demonstrated the effects of thermal degradation of the wood matrix. Lastly, a simulated fire debris data set was generated by spiking each of the ignitable liquids separately onto surface-treated wood, which was subsequently burned for 30 seconds. Again, samples were extracted and analyzed by GC-MS. This data set takes into account the effects of evaporation, matrix interferences, and thermal degradation on the association of the samples to their respective standards.
The chromatograms from all data sets were subjected to data pretreatment procedures. Next, for each data set, the scores of the samples were calculated and projected, separately, onto the original scores plot of the ignitable liquid standards, resulting in three new plots. PPMC coefficients were calculated for sample replicates in each data set as well as between samples and the unevaporated ignitable liquid standards.

Regardless of evaporation, matrix interferences, or thermal degradation, in all three data sets, the PCA scores plot could be used to differentiate the samples by the ignitable liquid type used to generate them. However, in none of the plots could the samples be visually associated with their corresponding standard in terms of evaporation level. In each case, this lack of association was due to the considerable amount of spread seen in the evaporation level replicate samples as well as a shift in positioning in the scores plot of the samples with respect to the standards.

The spread in the scores plot is likely due to the porous nature of the wood as opposed to the variability of the burning process because the spread was observed even in the unburned samples. The porosity of the wood may allow [...] impact the efficiency of the passive headspace extraction. This is further reflected in the poorer (but still strongly correlated) PPMC coefficients among replicates of the samples for each data set when compared to those calculated for replicates of the standards.

The shifts in positioning of sample replicates away from the standards, which prevented correct association to standards by evaporation level in the scores plots, are mostly due to differences in abundances of compounds resulting from using different spike volumes to generate the data sets. The different abundances could not be compensated for with current normalization procedures.

Even though the samples from each data set could not be associated to their respective standards by evaporation level in the scores plot, this project successfully demonstrated the potential of a more objective method of fire debris analysis using PCA and PPMC coefficients. In forensic laboratories, analysts are not concerned with how evaporated the ignitable liquid used to commit arson is. The job of the analyst is to identify the presence of an ignitable liquid and then to identify its class. In spite of the presence of the surface-treated wood matrix, the samples generated in all data sets could be identified as containing the ignitable liquid used to generate the sample.

5.1.3 Supervised Multivariate Statistics Study Summary

The goal of this study was to investigate the potential for using supervised multivariate statistical procedures such as SIMCA for the classification of ignitable liquid standards spanning five different ASTM International classes. Because this was a proof-of-concept study, the previously mentioned complicating factors of fire debris analysis were not investigated.

Insect repellent, gasoline, paint thinner, fuel stabilizer, fuel injector, and diesel were all selected as ignitable liquid standards. Each was diluted in methylene chloride and analyzed by GC-MS using a direct injection. The liquids were analyzed in replicate (n=15). The resulting total ion chromatograms were subjected to PCA to assess the natural groupings of the liquids and determine the number of PCs that should be used to generate each liquid model. All liquids could be differentiated from one another using 2 PCs. The chromatograms were then split into training (n=12 per liquid) and test (n=3 per liquid) sets. The training chromatograms were used to generate a PCA model for each ignitable liquid, resulting in six total models. The test chromatograms were then classified as belonging to one, multiple, or no models. The classifications were performed using either 2 PCs or the number of PCs recommended by the software to describe each of the ignitable liquid models. This process was also repeated using four different extracted ion chromatograms (EICs). Each EIC chosen represented a different class of compounds.

Using TICs, the classifications of the test chromatograms, which were performed using both 2 and the recommended number of PCs, were successful at a 0.1%, 1%, 5%, and 10% significance level. One gasoline test TIC remained unclassified at a 25% significance level, regardless of the number of PCs used to describe the models. Significance levels less than 5% are more statistically significant; therefore, the lack of classification of one gasoline replicate at 25% was not of great consequence. Furthermore, the unclassified gasoline replicate at 25% was mainly due to poorly described liquid models, which were the result of developing models from a highly similar set of replicate chromatograms for each ignitable liquid. When ignitable liquid models were developed by performing PCA on a set of practically identical chromatograms, the insignificant non-chemical variation was identified and emphasized in the models. While this was true of all of the liquid models generated for this study, the gasoline model was particularly poorly described since the non-chemical variation from the baseline contributed more to the gasoline model than to any other model. It was for these reasons that the gasoline replicate was not classified at a higher significance level.

Classification using SIMCA on EICs was also investigated. Classification for every test EIC for the four ions investigated was successful across all significance levels using either 2 PCs or the number recommended by the software. Classifications for each m/z EIC did vary from using 2 PCs to using the recommended number, thus highlighting the importance of choosing the optimal number of PCs with which to describe the ignitable liquid models. Using too many PCs could result in a model that is significantly influenced by chemically insignificant variations whereas using too few [...]

[...] potential of unsupervised and supervised statistical procedures has been demonstrated in fire debris analysis under laboratory conditions, but before that potential can be realized in forensic laboratories, more research needs to be performed. While PCA has been used to investigate the effects of evaporation, matrix interferences, and thermal degradation on the association of simulated fire debris to their respective standards, SIMCA has not. In the proof-of-concept study presented in this thesis, the potential for SIMCA was demonstrated by classifying the TICs and EICs of ignitable liquid standards. These standards could easily have been classified by fire debris analysts by visually analyzing the chromatograms. Next, the potential use of SIMCA on chromatographic data that is less visually obvious must be investigated because it is for this type of data that SIMCA is needed.

The analysis of fire debris from a crime scene is always complicated by evaporation of the liquid, interference compounds from the matrix, and thermal degradation of both the liquid and matrix. These effects on the analysis of fire debris were already investigated using PCA and should be investigated next using SIMCA. As with PCA, the complicating factors should be investigated with SIMCA in a piecewise manner, using multiple data sets to illustrate the individual and combined effects of each complicating factor. The success of the classification would demonstrate whether or not SIMCA has a future in fire debris analysis. Additionally, this type of study would allow [...] complicating factor most negatively impacts the classification and, potentially, investigate ways to minimize these effects such as using EICs and extracted ion profiles (EIPs).

Once the initial study described above using SIMCA has been performed, demonstrating that the classification of simulated fire debris according to ignitable liquid type is possible, SIMCA and PCA will be at the same stage in research and development for use in fire debris analysis. At this point, both procedures could be used to investigate the effects of more common matrices such as linoleum flooring, decorative paneling, or countertop surfaces. Additionally, fire scenes involve the burning of multiple matrices in close proximity to one another; it is highly unlikely that only one burned matrix would be present. As a result, future studies should investigate the effects of mixed matrices.

The association and classification of fire debris samples using other ignitable liquids should also be investigated in conjunction with evaporation, matrix interferences, and thermal degradation. Gasoline and kerosene were investigated in this study because of their prevalent use during the commission of arson, but other liquids such as lighter fluid are also commonly used. Consequently, more liquids spanning multiple ASTM International classes need to be investigated in the presence of the aforementioned complicating factors. Additionally, mixed liquids should be investigated because arsonists use ignitable liquids that are available to them and may use a mixture of multiple ignitable liquids to start fires.

This research demonstrated the potential for use of unsupervised and supervised multivariate statistical procedures such as PCA and SIMCA in performing an objective analysis of fire debris. While the results of this research are promising, much more research needs to be performed before these procedures can be implemented in forensic laboratories. The research performed in this thesis is a necessary step in bridging the gap between research and the application of these procedures for the objective analysis of fire debris in forensic laboratories.