DEVELOPMENT OF CLASS REFERENCE STANDARDS FOR MULTIVARIATE
STATISTICAL ANALYSIS OF FIRE DEBRIS
By
Jordyn Geiger

A THESIS
Submitted to
Michigan State University
In partial fulfillment of the requirements
For the degree of
Forensic Science—Master of Science
2014

ABSTRACT
DEVELOPMENT OF CLASS REFERENCE STANDARDS FOR MULTIVARIATE
STATISTICAL ANALYSIS OF FIRE DEBRIS
By
Jordyn Geiger
Research into the statistical analysis of fire debris evidence has increased as a result
of the 2009 National Academy of Sciences report. Multivariate statistical procedures such as
principal components analysis (PCA) have previously been investigated as a means of
associating simulated fire debris samples with the corresponding ignitable liquid standards. This
research investigated the development of class reference standards intended to standardize a
multivariate statistical approach for the analysis of fire debris.
Three standards representative of ASTM chemical classes were developed in this
research to investigate the utility of an alternative standard data set. Standards were developed
based on the major characteristic compounds of each class. Commercially available ignitable liquid
standards, the developed class reference standards, and simulated fire debris samples were analyzed
by gas chromatography-mass spectrometry. The utility of the developed reference standards was
investigated using PCA, hierarchical cluster analysis (HCA), and k-nearest neighbors (k-NN) as
a means of generating a more standardized approach. Commercially available standards were
also used to investigate the impact of data set selection on the successful association and
classification of simulated fire debris samples to the corresponding standard, and to address current
limitations of commercial ignitable liquid standards. Association and classification of simulated
fire debris to the class reference standards were successful, and the class reference standards
showed some potential as an alternative to commercially available ignitable liquid standards.

ACKNOWLEDGEMENTS
I would first like to thank everyone for all of the support I have received over the past
two years. Most importantly, I would like to thank Dr. Ruth Smith for all of her guidance and
encouragement throughout my time at Michigan State and for making this experience
challenging and fulfilling. Thank you to Dr. Chris Melde for dedicating time to sit on my
committee and to Dr. Victoria McGuffin for dedicating her time and expertise throughout my
time at Michigan State.
I would like to thank everyone in the MSU Forensic Science Program for their support,
assistance, and encouragement including John McIlroy, Christy Hay, Barb Fallon, Fanny Chu,
and Becca Brehe. I would particularly like to thank my fellow second-year students, Mac Hopkins,
Ashley Mottar, and Ashley Doran, for being great friends and making this experience much more
enjoyable.
I would also like to thank my family and friends, who have been my greatest support
system throughout this entire experience, especially my parents and grandparents, who have
believed in me from the start. Finally, I would like to thank Kari and Jake for supporting me and
putting up with me each and every day throughout all the stress and craziness of this adventure!


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
1. Introduction
1.1 Background
1.2 Ignitable Liquid Classification
1.3 Current Methods in Fire Debris Analysis
1.4 Limitations in Current Methods of Fire Debris Analysis
1.5 Statistical Analysis of Fire Debris
1.6 Literature Review
1.6.1 Limitations in Current Methods of Fire Debris Analysis
1.6.2 Statistical Analysis of Fire Debris
1.7 Research Objectives
REFERENCES
2. Theory
2.1. Gas Chromatography-Mass Spectrometry
2.2. Data Pretreatment
2.3. Data Analysis
2.3.1. Principal Component Analysis
2.3.2. Euclidean Distance
2.3.3. Hierarchical Cluster Analysis
2.3.4. k-Nearest Neighbors
REFERENCES
3. Materials and Methods
3.1. Commercial Ignitable Liquid Standards
3.2. Class Reference Standards
3.3. Preparation of Simulated Fire Debris
3.3.1. Burn Study
3.3.2. Spike Volume Study
3.3.3. Simulated Fire Debris Samples
3.4. Passive-Headspace Extraction
3.5. GC-MS Analysis
3.6. Data Pretreatment
3.7. Data Analysis
3.7.1. Principal Components Analysis
3.7.2. Euclidean Distance
3.7.3. Hierarchical Cluster Analysis
3.7.4. k-Nearest Neighbors
4. Investigation of Class Reference Standards for Association of Fire Debris to ASTM Class using Principal Components Analysis
4.1. Introduction
4.2. Commercially Available Standards and Corresponding Class Reference Standards
4.2.1. Gasoline
4.2.2. Medium Petroleum Distillate
4.2.3. Heavy Petroleum Distillate
4.3. Determination of Substrate Burn Times
4.4. Simulated Fire Debris Samples
4.5. Association and Discrimination of Simulated Fire Debris using PCA
4.5.1. Commercial Ignitable Liquid Standards – Chemically Diverse Data Set
4.5.2. Commercial Ignitable Liquid Standards – Refined Data Set
4.5.3. Class Reference Standards
4.6. Summary
5. Investigation of Class Reference Standards for Association and Classification of Fire Debris to ASTM Class using Hierarchical Cluster Analysis and k-Nearest Neighbors
5.1. Introduction
5.2. Association of Simulated Fire Debris using HCA
5.2.1. Commercial Ignitable Liquid Standards – Chemically Diverse Data Set
5.2.2. Commercial Ignitable Liquid Standards – Refined Data Set
5.2.3. Class Reference Standards
5.3. Association of Simulated Fire Debris using k-NN
5.3.1. Commercial Ignitable Liquid Standards – Chemically Diverse Data Set
5.3.2. Commercial Ignitable Liquid Standards – Refined Data Set
5.3.3. Class Reference Standards
5.4. Summary
REFERENCES
6. Conclusions
6.1. Summary
6.1.1. Objectives and Goals
6.1.2. Association of Fire Debris using PCA
6.1.3. Association of Fire Debris using HCA
6.1.4. Association of Fire Debris using k-NN
6.2. Future Work
REFERENCES

LIST OF TABLES

Table 1.1: ASTM Classification of Ignitable Liquids
Table 3.1: Composition of the gasoline class reference standard prepared in 15 mL of dichloromethane
Table 3.2: Composition of the medium and heavy petroleum distillate class reference standards prepared in 15 mL of dichloromethane
Table 3.3: Spike volumes used for simulated fire debris samples with respect to each commercial ignitable liquid and substrate
Table 4.1: Euclidean distances between fire debris scores and ignitable liquid standard scores for the chemically diverse data set
Table 4.2: Euclidean distances between fire debris scores and ignitable liquid standard scores for the refined data set
Table 4.3: Euclidean distances between fire debris scores and ignitable liquid standard scores for the refined data set containing diesel and excluding diesel
Table 4.4: Euclidean distances between fire debris scores and class reference scores for the class reference data set
Table 5.1: Percent classification of simulated fire debris containing carpet spiked with diesel to the corresponding commercial diesel standard using 1, 3, 5, 7, and 9 nearest neighbors

LIST OF FIGURES

Figure 1.1: Representative total ion chromatograms of A) commercial paint thinner and B) commercial upholstery protector to indicate differences within a given ASTM class
Figure 2.1: Diagram of a gas chromatograph
Figure 2.2: Diagram of a quadrupole mass analyzer
Figure 2.3: Diagram depicting Euclidean distance and the single-linkage process used during HCA clustering
Figure 2.4: Diagram depicting k-NN classification based on the number of nearest neighbors selected
Figure 4.1: Representative total ion chromatograms of A) commercial gasoline standard and B) gasoline class reference standard with characteristic compounds identified
Figure 4.2: Representative total ion chromatograms of A) commercial torch fuel standard and B) medium petroleum distillate class reference standard with characteristic compounds identified
Figure 4.3: Representative total ion chromatograms of A) commercial diesel standard and B) heavy petroleum distillate class reference standard with characteristic compounds identified
Figure 4.4: Representative total ion chromatograms of A) 30-second burn time of the treated red oak flooring substrate and B) 60-second burn time of the treated red oak flooring substrate with characteristic compounds identified. Compounds from the wood treatment are indicated in red.
Figure 4.5: Representative total ion chromatogram of 120-second burn time of nylon carpet with carpet padding with characteristic compounds indicated
Figure 4.6: Representative total ion chromatograms for A) treated red oak flooring spiked with 75 µL of commercial diesel with substrate interferences indicated in red and B) commercial diesel ignitable liquid standard. *C12 originates from both wood treatment and diesel.
Figure 4.7: Representative total ion chromatograms of A) nylon carpet with carpet padding spiked with 175 µL of commercial diesel with substrate interferences indicated in red and B) commercial diesel ignitable liquid standard
Figure 4.8: PCA scores plot of PC1 (32.5%) versus PC2 (24.4%) for the chemically diverse data set with commercial standards only
Figure 4.9: Loadings plots for the chemically diverse data set with A) PC1 representing 32.5% of the variance and B) PC2 representing 24.4% of the variance
Figure 4.10: PCA scores plot of PC1 (32.5%) versus PC2 (24.4%) for the chemically diverse data set with commercial standards and simulated fire debris projected
Figure 4.11: PCA scores plot of PC1 (60.2%) versus PC2 (24.4%) for the refined data set with commercial standards and simulated fire debris projected
Figure 4.12: PCA scores plot of PC1 (63.5%) versus PC2 (25.1%) for the refined data set with commercial standards (excluding diesel) and simulated fire debris projected
Figure 4.13: PCA scores plot of PC1 (72.8%) versus PC2 (27.1%) for the class reference standards data set with the projected scores of the simulated fire debris
Figure 5.1: Dendrogram of the chemically diverse data set with similarity levels indicated where appropriate
Figure 5.2: Representative total ion chromatograms of A) commercial gasoline A standard, B) commercial gasoline B standard, and C) commercial gasoline C standard with characteristic compounds identified to highlight differences among the gasoline standards
Figure 5.3: Dendrogram of the chemically diverse data set with simulated fire debris consisting of carpet spiked with diesel
Figure 5.4: Dendrogram of the refined data set with simulated fire debris consisting of carpet spiked with diesel
Figure 5.5: Dendrogram of the class reference data set with simulated fire debris consisting of carpet spiked with diesel
Figure 5.6: Representative total ion chromatograms of A) simulated fire debris consisting of carpet spiked with commercial diesel, B) commercial diesel standard, and C) commercial fuel injector standard with compounds identified

1. Introduction
1.1 Background
Arson is a crime that involves the ignition of a fire with the intent to cause damage. The
damage may be intended solely for destruction or to cover up another crime. Arson is typically
identified by the presence of an ignitable liquid in debris collected from the scene of the crime.
Perpetrators use an ignitable liquid to accelerate the spread of the fire and increase the amount
of damage caused; therefore, the ignitable liquid used is commonly an easily accessible liquid
such as gasoline. It is the job of the forensic scientist to determine whether an ignitable liquid
is present in the fire debris and to classify any ignitable liquid present according to chemical
class.
1.2 Ignitable Liquid Classification
ASTM International has classified ignitable liquids into eight different classes based on
chemical composition (1). These classes include gasoline, petroleum distillates, isoparaffinic
products, aromatic products, naphthenic paraffinic products, n-alkane products, oxygenated
solvents, and miscellaneous as seen in Table 1.1. All classes of ignitable liquid, with the
exception of gasoline, are further classified based on the distribution of normal alkanes present
(1). For example, a petroleum distillate can be a light petroleum distillate, which by definition,
contains normal alkanes ranging from four to nine carbons (denoted C4-C9). Similarly, a medium
petroleum distillate contains normal alkanes ranging from eight to 13 carbons (C8-C13) and a
heavy petroleum distillate contains normal alkanes ranging from eight to more than 20 carbons
(C8-C20+).
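The boiling-range subclasses described above can be expressed as a simple rule. The sketch below is a hypothetical simplification for illustration, not the ASTM procedure: because the nominal ranges overlap (C8 and C9 fall within all three), it assigns a subclass from the highest n-alkane carbon number observed and ignores the lower bound. The function name `boiling_range_subclass` is illustrative, and gasoline, which is not subdivided, is outside its scope.

```python
# Nominal upper n-alkane limits for the subclasses described above.
# ASTM's ranges overlap, so this sketch looks only at the largest carbon number.
SUBCLASS_UPPER_LIMITS = [
    ("light", 9),             # C4-C9
    ("medium", 13),           # C8-C13
    ("heavy", float("inf")),  # C8-C20+ (no fixed upper limit)
]


def boiling_range_subclass(max_carbon: int) -> str:
    """Hypothetical light/medium/heavy label from the largest n-alkane present."""
    if max_carbon < 4:
        raise ValueError("ranges ending below C4 are outside the ASTM subclasses")
    for label, upper in SUBCLASS_UPPER_LIMITS:
        if max_carbon <= upper:
            return label
    raise AssertionError("unreachable: the heavy subclass has no upper limit")


print(boiling_range_subclass(9))   # a C4-C9 liquid
print(boiling_range_subclass(13))  # a C8-C13 liquid
print(boiling_range_subclass(22))  # a C8-C22 liquid
```

A real assignment also weighs the overall n-alkane distribution and the other compound classes present, which a single threshold cannot capture.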


Table 1.1: ASTM Classification of Ignitable Liquids

| Class | Composition | Light (C4-C9) | Medium (C8-C13) | Heavy (C8-C20+) |
|---|---|---|---|---|
| Gasoline (all brands, including gasohol) | C3- and C4-alkylbenzenes and various aliphatic compounds | n/a | n/a | n/a |
| Petroleum Distillates | Homologous series of n-alkanes; less significant isoparaffinic, cycloparaffinic, and aromatic components | Petroleum ether, cigarette lighter fluids, camping fuels | Charcoal starters, paint thinners, dry cleaning solvents | Kerosene, diesel fuel, jet fuels, charcoal starters |
| Isoparaffinic Products | Branched-chain (isoparaffinic) alkanes; cyclic (naphthenic) alkanes and n-alkanes insignificant or absent | Aviation gas, specialty solvents | Charcoal starters, paint thinners, copier toners | Commercial specialty solvents |
| Aromatic Products | Aromatic compounds; aliphatic compounds absent or insignificant | Paint and varnish removers, automotive parts cleaners, xylenes, toluene-based products | Automotive parts cleaners, specialty cleaning solvents, insecticide vehicles, fuel additives | Insecticide vehicles, industrial cleaning solvents |
| Naphthenic Paraffinic Products | Branched-chain (isoparaffinic) and cyclic (naphthenic) alkanes; n-alkanes insignificant or absent | Cyclohexane-based solvents/products | Charcoal starters, insecticide vehicles, lamp oils | Insecticide vehicles, lamp oils, industrial cleaning solvents |
| n-Alkane Products | Only n-alkanes, typically containing 5 or less | Solvents, pentane, hexane, heptane | Candle oils, copier toners | Candle oils, carbonless forms, copier toners |
| Oxygenated Solvents | Oxygenated products including alcohols, esters, ketones; major components include toluene or xylene | Alcohols, ketones, lacquer thinners, fuel additives, surface preparation solvents | Lacquer thinners, industrial solvents, metal cleaners/gloss removers | |
| Others-Miscellaneous | Liquids that cannot otherwise be classified | Single-component products, blended products, enamel reducers | Turpentine products, blended products, specialty products | Blended products, specialty products |

Gasoline is not subdivided by n-alkane range; fresh gasoline is typically in the range C4-C12.
Classification defined by ASTM International (1).

1.3 Current Methods in Fire Debris Analysis
Fire debris evidence is commonly analyzed using a passive-headspace extraction
procedure. A passive-headspace extraction involves sealing the fire debris evidence in a clean,
unused metal paint can or a nylon bag. An activated carbon strip (ACS) is suspended over the
sample within the paint can or nylon bag and the sample is heated in an oven at a temperature
ranging from 50 °C to 80 °C for 2 to 24 hours, as recommended by ASTM International (2).
Over the elapsed time, volatile compounds from the sample enter the headspace of the container
or bag and adsorb onto the ACS. To recover the adsorbed volatile compounds, the ACS is eluted
with an organic solvent, such as dichloromethane (CH2Cl2). The extract is then analyzed using
gas chromatography-mass spectrometry (GC-MS). GC separates the compounds in a complex mixture
and MS allows for the identification of those compounds, making GC-MS a useful tool in fire
debris analysis.
Using GC-MS, total ion chromatograms (TICs), extracted ion chromatograms (EICs),
and extracted ion profiles (EIPs) are generated and used to identify any ignitable liquid present in
the fire debris extract. An EIC contains only ions of a specific mass-to-charge (m/z) ratio of
interest, whereas an EIP is a profile of multiple ions that are considered characteristic of the
compounds of interest (3). For example, ions with m/z ratios of 57, 71, 85, and 99 are indicative
of alkanes, whereas ions with m/z ratios of 91, 105, 119, and 133 are indicative of aromatic
compounds. EICs and EIPs are more selective than the TIC: they exclude background ions and,
in some cases, can also eliminate interfering ions from the substrate.
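To make the difference between a TIC, an EIC, and an EIP concrete, the sketch below assumes the GC-MS run is stored as a matrix of abundances with one row per scan (retention time) and one column per nominal m/z value. That data layout, the function name `extracted_ion_profile`, and the toy numbers are assumptions for illustration only. Keeping a single m/z column gives an EIC, summing several characteristic columns gives a summed EIP, and summing every column would give the TIC.

```python
import numpy as np


def extracted_ion_profile(abundance, mz_values, target_mz):
    """Sum the abundances of the selected m/z channels at every scan.

    abundance : 2-D array, shape (n_scans, n_mz), one row per retention-time scan
    mz_values : 1-D array giving the nominal m/z value of each column
    target_mz : iterable of m/z values to keep (a single value gives an EIC)
    """
    abundance = np.asarray(abundance, dtype=float)
    mz_values = np.asarray(mz_values)
    mask = np.isin(mz_values, list(target_mz))
    if not mask.any():
        raise ValueError("none of the requested m/z values are in the data")
    return abundance[:, mask].sum(axis=1)


ALKANE_IONS = (57, 71, 85, 99)
AROMATIC_IONS = (91, 105, 119, 133)

# Toy data: 3 scans x 5 m/z channels (values are invented for illustration).
mz = np.array([57, 71, 91, 105, 128])
data = np.array([
    [10, 5, 0, 0, 2],
    [0, 0, 8, 4, 1],
    [3, 1, 2, 0, 0],
])

eip_alkane = extracted_ion_profile(data, mz, ALKANE_IONS)      # alkane profile
eip_aromatic = extracted_ion_profile(data, mz, AROMATIC_IONS)  # aromatic profile
eic_57 = extracted_ion_profile(data, mz, [57])                 # single-ion EIC
tic = data.sum(axis=1)                                         # all ions
```

In this toy example, the m/z 128 channel contributes to the TIC but to neither profile. This is the sense in which EICs and EIPs can exclude ions that do not belong to the compound class of interest.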


Currently, fire debris analysts visually compare the TIC and EICs or EIPs of an extract to
an in-house reference collection of ignitable liquids. Major compounds indicative of specific
ignitable liquid classes are identified in order to characterize any ignitable liquid present in the
fire debris. However, the visual comparison of chromatograms from fire debris and those of
reference standards is challenging due to interferences from the fire debris substrate, evaporation
of the liquid that occurs during burning, and thermal degradation or pyrolysis of the substrate or
ignitable liquid that may potentially occur during the fire (3).
1.4 Limitations in Current Methods of Fire Debris Analysis
Interferences from the fire debris substrate make visual interpretation of TICs more
difficult, as hydrocarbons inherent to the substrate may resemble an ignitable liquid, such as a
petroleum distillate. Ultimately, this can increase the risk of a false positive identification of an
ignitable liquid in the debris. During the fire, volatile compounds in the ignitable liquid are lost
to evaporation. This changes the chemical composition of the remaining liquid, so visual
comparison of its chromatogram to unevaporated reference standards is more challenging;
however, the in-house reference collection can be expanded to include reference standards
evaporated to different levels. Thermal degradation or pyrolysis breaks compounds down, so they
may no longer be present in the fire debris, and the resulting breakdown products appear as new
peaks in the chromatogram. Each of these factors (i.e., substrate interferences, evaporation, and
thermal degradation/pyrolysis) affects the appearance of the fire debris chromatogram that is
compared to the chromatograms in the reference collection. These differences increase the risk of
false positive and false negative identifications, as well as misclassification of an ignitable
liquid. The complexity of these visual comparisons also makes them subjective, which underscores
the need for a more objective method of comparison.
To help overcome the challenges encountered in fire debris analysis, the National Center
for Forensic Science (NCFS) maintains an ignitable liquid reference collection (ILRC) database
(4). The ILRC database currently consists of 695 ignitable liquid chromatograms, most of which
are unevaporated, although chromatograms of evaporated and biologically degraded liquids are
also included. Each ignitable liquid within the database is classified according to ASTM class
and has major peaks identified (4). In addition to the ILRC database, a substrate database is also
maintained by NCFS (5). This database currently consists of approximately 60 different
substrates with chromatograms of both unburned and burned substrates available. Similar to the
ILRC database, major compounds and the dominant ion profile for each substrate are indicated.
These databases are continuously growing to assist fire debris analysts with the challenges
currently faced in fire debris analysis.
1.5 Statistical Analysis of Fire Debris
In 2009, the National Academy of Sciences (NAS) released a report entitled
Strengthening Forensic Science in the United States: A Path Forward (6). This report
emphasized the need for a more objective approach for the analysis of forensic evidence. The
report indicated that a statistical assessment of forensic evidence would reduce false positives and
false negatives and would be better suited to satisfying the Daubert standard (6).
As a result of the NAS report (6), there has been increasing interest in statistical
procedures for the analysis of forensic evidence. For the analysis of fire debris evidence
specifically, multivariate statistical procedures, such as principal component analysis (PCA),
have been investigated for associating simulated fire debris to the corresponding ignitable liquid
reference standard. PCA has been utilized to reduce the subjectivity of fire debris analysis and to
increase confidence in visual comparisons of TICs, EICs, and EIPs. Successful association of
simulated fire debris to an ignitable liquid or ignitable liquid class has been achieved using PCA
(7, 8).
PCA generates two main outputs: scores and loadings plots. The scores plot is a scatter
plot of the samples in the space defined by the principal components, and the positions of the
samples indicate their association and discrimination. Samples that are chemically similar are
positioned close together, while those that are chemically different are separated from one
another on the scores plot. A loadings plot is also generated for each principal component (PC)
and shows the variables contributing most to the variance described by that PC. Loadings plots
are therefore used to explain the positioning of the samples on the scores plot. However, there are
some limitations inherent to PCA for applications in fire debris analysis.
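The relationship between scores and loadings can be illustrated with a short NumPy sketch. It assumes the chromatograms have already been aligned and pretreated into a samples-by-variables matrix, and it only mean-centers the data; the function names `pca` and `project` are hypothetical, and the pretreatment actually used in this work may differ. The scores are the sample coordinates on the scores plot, the loadings are the variable weights for each PC, and new samples such as simulated fire debris are projected onto a model built from the standards alone.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of a samples-by-variables matrix via singular value decomposition.

    Returns scores (samples x PCs), loadings (variables x PCs), the fraction of
    total variance explained by each returned PC, and the column means used
    for centering.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                              # mean-center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T             # one column of weights per PC
    scores = Xc @ loadings                     # coordinates on the scores plot
    explained = s**2 / np.sum(s**2)            # variance fraction per PC
    return scores, loadings, explained[:n_components], mean


def project(new_samples, loadings, mean):
    """Project new samples (e.g. fire debris) onto an existing PCA model."""
    return (np.asarray(new_samples, dtype=float) - mean) @ loadings


# Usage with random stand-in data: 6 "standards" x 10 "variables".
rng = np.random.default_rng(0)
standards = rng.random((6, 10))
scores, loadings, explained, mean = pca(standards)
debris_scores = project(standards[:2] + 0.01, loadings, mean)
```

Building the model from the standards and then projecting the debris keeps the PC axes fixed, so the debris can be compared with the standards without the debris themselves influencing the model.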
First, association of simulated fire debris samples to ignitable liquid standards is based on
visual interpretation of the PCA scores plot. However, similar ignitable liquids are commonly
positioned close to one another on the scores plot. As a result, it is difficult to determine
visually with which ignitable liquid a fire debris sample is most closely associated.
To reduce the subjectivity of visual interpretation, additional metrics or statistical procedures
such as Pearson product-moment correlation (PPMC) coefficients (7-9), Euclidean distances, and
hierarchical cluster analysis (HCA) can be implemented (9). PPMC coefficients measure the
linear correlation between two samples and can be used to determine the similarity between two
chromatograms. Euclidean distance is the straight-line distance between two data points in a
multidimensional space and can be used to measure the distance between the ignitable liquid
standards and fire debris samples in the PCA scores plot. Hierarchical cluster analysis is a
complementary procedure to PCA that can be used to determine the similarity between a fire debris
sample and an ignitable liquid reference standard (8, 9).
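The three metrics named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: PPMC is computed on two chromatograms already aligned point by point in retention time, Euclidean distance is computed between PCA scores, and `single_linkage` is a naive, hypothetical implementation suitable only for a handful of samples. In practice, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"` would normally be used. The function names and the score coordinates in the usage lines are invented for illustration.

```python
import numpy as np


def ppmc(a, b):
    """Pearson product-moment correlation between two aligned chromatograms."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c, b_c = a - a.mean(), b - b.mean()
    return float(np.sum(a_c * b_c) / np.sqrt(np.sum(a_c**2) * np.sum(b_c**2)))


def euclidean(p, q):
    """Straight-line distance between two points, e.g. two PCA scores."""
    return float(np.linalg.norm(np.asarray(p, float) - np.asarray(q, float)))


def nearest_standard(debris_score, standard_scores):
    """Name of the standard whose score lies closest to the debris score."""
    return min(standard_scores,
               key=lambda name: euclidean(debris_score, standard_scores[name]))


def single_linkage(points):
    """Naive single-linkage agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples, where
    the distance between two clusters is their closest point-to-point distance.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(euclidean(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical 2-D scores for three standards and one fire debris sample.
standard_scores = {"gasoline": (2.0, 1.0), "MPD": (-1.5, 0.5), "HPD": (-1.0, -2.0)}
closest = nearest_standard((-0.9, -1.6), standard_scores)
merges = single_linkage([(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)])
```

The merge distances returned by `single_linkage` are the heights at which branches join in a dendrogram. A fire debris sample that merges early with a particular standard is therefore the most similar to it.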
The second limitation of PCA for applications in fire debris analysis is based on the data
sets used for the analysis. Typically, PCA is performed on a data set containing commercially
available ignitable liquid reference standards. However, there is a large number of commercially
available ignitable liquids on the market (the ILRC database currently contains 695 ignitable
liquids) and it is not practical to include all ignitable liquids in a given data set. Further, the
chemical composition of ignitable liquids within the same ASTM class can also vary
substantially, as shown in Figure 1.1. The commercially available paint thinner (Figure 1.1A)
and the commercially available upholstery protector (Figure 1.1B) are both classified as
isoparaffinic products containing branched alkanes and cyclic alkanes. Although both ignitable
liquids contain the same classes of compounds, the two liquids still appear substantially
different, as upholstery protector contains branched alkanes ranging from C5-C7 and paint thinner
contains branched alkanes ranging from C7-C12.
Given the number of ignitable liquids that are commercially available, as well as the
chemical variation within an ASTM class, determining which commercial ignitable liquids to
include within a data set for PCA is challenging. To correctly associate an ignitable liquid in a
fire debris sample to the corresponding standard, the appropriate reference standard must be
present in the data set. However, as the liquid present in the fire debris sample will not be
known, the corresponding ignitable liquid reference standard may not be present in the data set.
This problem could be overcome with the development of reference standards that are more
representative of each ASTM class, thus providing a more standardized approach for statistical
analysis of fire debris evidence.

Figure 1.1: Representative total ion chromatograms of A) commercial paint thinner (branched and cyclic alkanes, C7-C12) and B) commercial upholstery protector (branched and cyclic alkanes, C5-C7) to indicate differences within a given ASTM class. Axes: retention time (min) versus normalized abundance.
1.6 Literature Review
1.6.1 Limitations in Current Methods of Fire Debris Analysis
Research has been conducted with the aim of moving away from the subjective
visual comparison method toward a more objective comparison method. Lentini et al. analyzed
common household materials using ASTM procedures for passive-headspace extraction and GC-MS
analysis (10). The goal of this research was to demonstrate how similar the chromatograms
of household substrates can be to those of ignitable liquids. Common household items, such as colored
newspaper, spandex shorts, and tennis shoes, contain petroleum-based liquids that are used during
the manufacturing of the item (10). Other household items may also inherently contain
petroleum-based liquids, such as the stain used on wood flooring. During the investigation of
common household items, Lentini et al. also determined that ignitable liquids, such as toluene
and a heavy petroleum distillate, used during manufacturing, were detectable for at least five
years and up to 19 years after production in some materials (10).
Often, fire debris analysts also assess the ratios of characteristic compounds present in the
chromatogram of the fire debris sample and compare the ratios to those in the chromatogram of
the reference standard. Lentini et al. demonstrated the importance of also considering these ratios
(10). For example, even when toluene, xylenes, and C3-alkylbenzenes, which are characteristic of
gasoline, were present in a household material, the ratios of these compounds did not resemble
the ratios expected in gasoline, thereby reducing the risk of false positive identifications.
However, when ignitable liquids undergo extensive evaporation, the expected compound ratios
begin to vary, making identification more difficult (10). Knowing the extent to which many
common household items may resemble an ignitable liquid allows a trained analyst to consider
this when visually comparing the fire debris evidence to a standard reference collection.
Nevertheless, the subjectivity of visual comparison remains an issue.
The evaporation and thermal degradation/pyrolysis that occur due to the high temperatures
encountered during a fire also contribute to the difficulty in identifying any ignitable liquid. In
order to account for evaporation, Keto and Wineman created a library of neat and evaporated
liquid standards. The goal of this research was to develop a method for identifying ignitable
liquids regardless of evaporation levels and substrate interferences using target compound
chromatograms (TCCs). Ignitable liquids were evaporated to 4%, 20%, and 80% by volume (v/v)
and analyzed by GC-MS (11). A library of TCCs representative of each ignitable liquid (i.e.,
gasoline and petroleum distillates) was created. The TCCs were generated by reconstructing the
chromatogram to include only target compounds of interest based on their corresponding m/z values.
The generated TCCs differed slightly from EICs and EIPs in that the m/z values included were
more specific to the characteristic compounds in the ignitable liquids rather than m/z values of a
general class of compounds. The selected target compounds were those still present
(greater than 30% abundance relative to the base ion) after heavy evaporation and still
identifiable after extensive burning. For example, target compounds selected for gasoline
included trimethylbenzene (TMB), indane, and naphthalene, while target compounds selected for
a medium petroleum distillate (MPD) included the normal alkanes, C9-C12 (11).
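The idea of a target compound chromatogram can be illustrated with a short sketch, assuming the GC-MS run is stored as a list of scans, each a dict mapping m/z to intensity. The target ions used below (m/z 105 and 120, typical of C3-alkylbenzenes such as trimethylbenzene, and m/z 128 for naphthalene) and the ±0.5 matching window are illustrative assumptions, not the ion list used by Keto and Wineman.

```python
def target_compound_chromatogram(scans, target_ions, tol=0.5):
    """Sum the intensities at the target m/z values in every scan.

    scans: list of dicts mapping m/z -> intensity (one dict per scan)
    target_ions: m/z values characteristic of the compounds of interest
    tol: m/z matching window (hypothetical default)
    """
    tcc = []
    for spectrum in scans:
        total = 0.0
        for mz, intensity in spectrum.items():
            if any(abs(mz - t) <= tol for t in target_ions):
                total += intensity
        tcc.append(total)
    return tcc


# Three hypothetical scans; only ions near 105, 120, and 128 are kept
scans = [
    {43.0: 900.0, 57.0: 800.0},                 # alkane-like scan
    {105.0: 500.0, 120.0: 250.0, 77.0: 60.0},   # C3-alkylbenzene-like scan
    {128.0: 400.0, 102.0: 30.0},                # naphthalene-like scan
]
print(target_compound_chromatogram(scans, [105, 120, 128]))  # [0.0, 750.0, 400.0]
```

Plotting the returned values against retention time gives the reconstructed trace; ions outside the target list, such as the alkane fragments in the first scan, contribute nothing.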
To account for pyrolysis, items such as nylon carpeting, vinyl floor tile, and plywood
were heated to high temperatures in a metal can and volatile compounds were collected using a
charcoal adsorption tube (11). The extracts of the pyrolysis products were then spiked with a
0.1% (v/v) dilute ignitable liquid, and subsequently analyzed by GC-MS. The TICs and TCCs of
the pyrolysis/ignitable liquid samples were visually compared to the neat and evaporated
ignitable liquid standard library.
Using the TICs alone, Keto and Wineman determined it was not apparent whether an
ignitable liquid was present or if the compounds were a product of the substrate. However, using
the TCCs, the presence of an ignitable liquid could be determined, as well as the ASTM class
(11). For example, when looking at a TIC for fire debris containing gasoline it was not readily
apparent that gasoline was present, but when the target compounds were extracted, 13 of the 15
target compounds were present including TMB, indane, and naphthalene. The use of specific
target compounds helped to eliminate some potential for false negative identification when the
effects of pyrolysis were present; however, in sample preparation, the ignitable liquids in the
samples were not directly introduced to the heat and so did not undergo extensive evaporation
(11). More importantly, the method relied on visual comparison of TCCs, which remains
subjective, and selecting only a subset of compounds risks the loss of
discriminatory information.
1.6.2 Statistical Analysis of Fire Debris
Since the release of the NAS report, researchers have shown increasing interest in
statistical procedures for the analysis of forensic evidence. Specifically, research on statistical
procedures for the analysis of fire debris has increased as the subjectivity of visually comparing
chromatograms suggests a need for a more objective approach.
Sigman and Williams utilized covariance mapping to compare TICs of ignitable liquids
and generated fire debris samples for the purpose of automated database searching (12). Fifteen
different ignitable liquid reference standards from nine different ASTM classes and simulated
fire debris were analyzed using three different GC-MS configurations. All three configurations
used the same instrument conditions (i.e., temperature program) but different instruments and
columns of varying length. The three configurations were used to develop an automated database
searching method that could be used universally among laboratories (12).
Covariance mapping was performed whereby covariance matrices were calculated for
each standard and fire debris sample based on mass spectral data and the Manhattan distance
between two matrices was calculated. Pairwise comparisons between each of the ignitable
liquids and between the ignitable liquid standards and fire debris samples were calculated (12). A
distance of 0 indicated that the two matrices were similar, while a distance of 1 indicated
dissimilarity. Using covariance mapping, Sigman and Williams were able to discriminate
neat/lightly evaporated gasoline from heavily evaporated gasoline and were able to discriminate
among the light, medium, and heavy petroleum distillates. Additionally, association of the
simulated fire debris to the corresponding ignitable liquid was achieved (12). Covariance
mapping could be beneficial for screening ignitable liquids that are similar to the fire debris
sample; however, because it relies on pairwise comparisons, interpretation of the data would
become much more time consuming as the data set grows.
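A rough sketch of covariance mapping follows, assuming each sample is a scans × m/z intensity matrix on a shared set of m/z channels. How Sigman and Williams scaled their maps is not described here, so the normalization below (each map divided by its total absolute value, with a factor of 0.5 so that the Manhattan distance falls between 0 and 1) is an assumption chosen to reproduce the 0-to-1 range described above; all names and data are hypothetical.

```python
def covariance_map(data):
    """Covariance between every pair of m/z channels across the scans of one run.

    data: list of scans, each a list of intensities on the same m/z channels.
    Returns an m x m matrix (population covariance, divisor n).
    """
    n, m = len(data), len(data[0])
    means = [sum(scan[j] for scan in data) / n for j in range(m)]
    return [[sum((scan[j] - means[j]) * (scan[k] - means[k]) for scan in data) / n
             for k in range(m)] for j in range(m)]


def map_distance(a, b):
    """Manhattan distance between two covariance maps, scaled to lie in [0, 1].

    Each map is divided by its total absolute value (assumed normalization);
    the 0.5 factor then bounds the result: 0 = identical, 1 = maximally different.
    """
    sa = sum(abs(v) for row in a for v in row)
    sb = sum(abs(v) for row in b for v in row)
    return 0.5 * sum(abs(x / sa - y / sb)
                     for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


# Hypothetical runs: 3 scans x 3 m/z channels
liquid = [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 0.0]]
stronger = [[10 * v for v in scan] for scan in liquid]  # same pattern, 10x signal
other = [[3.0, 1.0, 0.0], [2.0, 2.0, 1.0], [1.0, 3.0, 0.0]]
print(round(map_distance(covariance_map(liquid), covariance_map(stronger)), 6))  # 0.0
print(0 < map_distance(covariance_map(liquid), covariance_map(other)) <= 1)      # True
```

Because each map is normalized, a run with the same compound pattern but a stronger overall signal maps to a distance of essentially zero.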
Turner and Goodpaster utilized PCA to investigate the microbial degradation of gasoline
over time and in two different types of soil (13). Four Molotov cocktails were constructed by
filling both wine and beer bottles to the neck with gasoline. The Molotov cocktails were then
ignited and tossed into either lawn soil or potting soil during the months of July and January. Soil
samples were collected, stored at room temperature, then passive-headspace extracted and
analyzed by GC-MS at 0, 2, 7, 11, 22, 45, and 60 days. PCA was performed on summed EIPs
based on characteristic compound classes (i.e., n-alkanes and aromatics). From the PCA scores plot, both
soil samples had similar levels of microbial degradation at day 0, but different levels of
degradation starting at day 11. Using PCA, Turner and Goodpaster determined that levels of
microbial degradation were dependent on the type of soil and the season; however, the effects of
degradation on the association to the corresponding ignitable liquid were not investigated.
Baerncopf et al. used statistical procedures to associate fire debris samples back to an
ignitable liquid standard (7). One liquid from each of six different ASTM classes was analyzed and
simulated fire debris was generated using nylon carpet spiked with 750 µL of an ignitable liquid
used as a standard. Samples were extracted using a passive-headspace extraction and analyzed by
GC-MS.
Similarities between the simulated fire debris chromatograms and corresponding
ignitable liquid chromatograms were assessed using PPMC coefficients (7). PPMC coefficients
indicated strong correlation between the simulated fire debris samples and the corresponding
ignitable liquid standard. For example, the PPMC coefficient for the comparison of fire debris
spiked with torch fuel and the torch fuel standard was 0.9609 ± 0.0102. PPMC coefficients can
be useful in determining similarities between samples; however, this method only allows for
pairwise comparisons. The larger the data set, the more comparisons need to be made, resulting
in more PPMC coefficients that have to be compared. As a result, this method can be time
consuming.
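The growth in workload can be made concrete with a simple count (an illustration, not a figure reported in the cited study). Comparing one fire debris sample against a library of N standards requires N coefficients, but comparing every member of a data set with every other member requires the number of unordered pairs. For a library the size of the ILRC database (695 liquids), this already exceeds 240,000 coefficients:

```latex
\binom{N}{2} = \frac{N(N-1)}{2},
\qquad
\binom{695}{2} = \frac{695 \times 694}{2} = 241\,165
```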
In addition, Baerncopf et al. also performed PCA on the TICs to investigate association
of the fire debris back to the ignitable liquid standard (7). A majority of the simulated fire debris
was successfully associated back to the ignitable liquid standard; however, due to the similarity
of some ignitable liquids, only association to class rather than specific liquid was possible. For
example, due to the similar chemical composition of petroleum distillates, the simulated fire
debris containing diesel could only be associated to the petroleum distillate class, but further
association could not be made because of the close clustering of ignitable liquid reference
standards within the petroleum distillate class. When this occurred, PPMC coefficients could be
used to associate the fire debris samples and ignitable liquid standards. Therefore, successful
association of the simulated fire debris and ignitable liquid standard was possible using both
PPMC coefficients and PCA. Moving towards statistical procedures such as PPMC
coefficients and PCA reduces subjectivity. However, due to the large spike
volume (750 µL) used with the simulated fire debris in this study, few substrate interference
compounds were observed, and these had little influence on the positioning of standards and samples
on the scores plot. Using a smaller spike volume would increase the relative contribution of substrate
interference compounds and would be more representative of actual fire debris evidence.
By generating scores and loadings plots for the liquid standards only and then projecting
fire debris samples onto the scores plot, PCA can be used to account for substrate interferences
within fire debris samples. In this way, the positioning of the fire debris samples on the scores
plot is based solely on the compounds present in the standards used to generate the original
scores plot. PCA therefore acts as a filter, essentially removing any contributions from the
substrate. Prather et al. used PCA in this way to associate simulated fire debris samples to the
corresponding ignitable liquid (8). Neat and evaporated gasoline and kerosene were used as the
ignitable liquid standards. Simulated fire debris was generated by spiking 20 µL of the neat and
evaporated ignitable liquid standards onto nylon carpet and burning. All extracts were obtained
using a passive-headspace extraction and analyzed by GC-MS.
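A minimal sketch of the "fit on the standards, then project the fire debris" idea follows, assuming each sample is an aligned, normalized TIC stored as a list of intensities. The power-iteration eigen-solver, function names, and toy data are illustrative assumptions rather than the procedure used by Prather et al.; in practice a linear-algebra library would be used. The toy data show the filtering effect: a variable that never varies among the standards receives zero loading, so a large substrate peak in that variable does not move the projected sample.

```python
import math
import random


def fit_pca(standards, n_components=2, iters=500, seed=0):
    """Fit PCA to the reference standards only.

    Uses power iteration with deflation on the covariance matrix (a small
    stdlib-only stand-in for an eigen-solver). Returns (mean, loadings),
    where each loading is a unit-length vector.
    """
    n, p = len(standards), len(standards[0])
    mean = [sum(row[j] for row in standards) / n for j in range(p)]
    x = [[row[j] - mean[j] for j in range(p)] for row in standards]
    cov = [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    loadings = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(c * c for c in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [c / norm for c in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
        # Deflate: remove this component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
        loadings.append(v)
    return mean, loadings


def project(sample, mean, loadings):
    """Scores of any sample on the standards' PCs (standards' mean and loadings)."""
    centered = [s - m for s, m in zip(sample, mean)]
    return [sum(c * l for c, l in zip(centered, v)) for v in loadings]


# Hypothetical toy data: 3 variables; variable 2 never varies among the standards.
standards = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
mean, loadings = fit_pca(standards)
clean = project([1.9, 0.1, 0.0], mean, loadings)
with_substrate = project([1.9, 0.1, 50.0], mean, loadings)  # large "substrate" peak
print(clean == with_substrate)  # True: variable 2 has zero loading
```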
Performing PCA on the TICs, Prather et al. generated a scores plot and loadings plot
using only the neat and evaporated ignitable liquid standards (8). The simulated fire debris was
then projected onto the scores plot in order to filter out interferences from the substrate. The
simulated fire debris samples associated to the ignitable liquid class, but the majority of the
samples were not associated back to the corresponding evaporation levels. This could be due to
the fact that the simulated fire debris was spiked with evaporated ignitable liquid and then
burned which would result in further evaporation of the liquid. However, using such a small set
of ignitable liquid standards limits the utility of this research. The two ignitable liquids used are
substantially different from one another; therefore, it would be expected for appropriate
association and differentiation to occur. As more ignitable liquids are introduced, successful
association will become more challenging. Therefore, the selection of the data set for analysis
affects the success of associating and discriminating samples properly. However, careful selection of a
data set may be interpreted as manipulating the data, which further highlights the need for a more
standardized approach.
Hierarchical cluster analysis (HCA) has been studied as an additional statistical
procedure to investigate forensic evidence and associate unknown samples to samples of known
origin (8). HCA accounts for all dimensions of the data and highlights patterns and similarities
that may not otherwise have been obvious based on visual assessment alone. In addition, the
similarity of the samples is only relative to the data set. As a result, HCA will highlight
similarities, but there will always be samples with zero similarity in the data set. Therefore, with a
limited data set, samples that appear visually similar may show little to no similarity after HCA,
depending on the other members of the data set.
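Why similarity is relative to the data set can be seen in a small agglomerative clustering sketch using Euclidean distance and complete linkage. It assumes a common convention in which the similarity of a merge is 1 - d/d_max, where d_max is the largest pairwise distance in the data set; the toy points and function names are hypothetical. Under complete linkage the final merge always occurs at d_max, so the last two clusters always join at zero similarity.

```python
import math
from itertools import combinations


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hca_complete(samples):
    """Agglomerative clustering with Euclidean distance and complete linkage.

    Returns a list of (members_a, members_b, distance, similarity) merges,
    where similarity = 1 - distance / d_max for the whole data set.
    """
    d = {(i, j): euclid(samples[i], samples[j])
         for i, j in combinations(range(len(samples)), 2)}
    d_max = max(d.values())
    clusters = [[i] for i in range(len(samples))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the largest member-to-member distance.
            link = max(d[min(i, j), max(i, j)] for i in clusters[a] for j in clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        link, a, b = best
        merges.append((clusters[a], clusters[b], link, 1 - link / d_max))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [clusters[a] + clusters[b]]
    return merges


# Hypothetical replicates of two very different liquids
pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
for members_a, members_b, dist, sim in hca_complete(pts):
    print(members_a, members_b, round(sim, 3))
```

Replicates merge first at high similarity, and the two liquid groups join last at a similarity of 0, regardless of how far apart they actually are.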
Goodpaster et al. used HCA in two different studies to associate electrical tape samples
(13,14). In the first study, 67 different rolls of black electrical tape from 34 different brands were
analyzed using scanning electron microscopy and energy dispersive X-ray spectroscopy (13). HCA was
performed based on the elemental profile of the tape adhesive using Euclidean distance and
Ward’s algorithm to determine clustering. Successful association according to tape grade (i.e.,
general, mid-range, and premium) was possible and in some cases, association to manufacturing
year was possible based on prior manufacturing knowledge of tape.
In a second study, Goodpaster et al. used HCA to associate 79 different rolls of electrical
tape from 36 different brands to the corresponding brand (14). The electrical tapes were analyzed
using attenuated total reflectance-Fourier transform infrared spectroscopy (ATR-FTIR) and HCA
was applied to the resulting spectra using the previously specified metrics for clustering.
Association of electrical tape with black adhesive according to brand was successful, but
association of clear adhesive was only partially successful. However, distances rather than
similarity levels were reported in the electrical tape studies. Distances are limiting because
their values change depending on the distance metric used.
Mat-Desa et al. utilized HCA to compare a variety of different lighter fuels in an effort to
associate the lighter fuels based on brand (16). Fifteen different lighter fuel refills from a total of
five different brands were analyzed by GC-MS. HCA was performed on both raw and pretreated
data using 51 characteristic peaks across the full data set. Pretreated data included normalized
data, normalized and square root transformed data, and normalized and fourth root transformed
data. The characteristic peaks were selected based on peaks with similar retention times and
peaks that had a relative standard deviation of less than 5% based on triplicate analyses. Based
on visual assessment of the characteristic peaks in the TICs, three brands appeared to be similar
and the remaining two brands appeared similar to one another, but different from the other
brands. HCA was performed on all pretreated data sets using complete linkage; however, the
distance metric used was not specified.
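The effect of the pretreatments described above can be sketched as follows. Normalization to unit total peak area is an assumption for illustration (the exact normalization used by Mat-Desa et al. is not described here), and the two-peak profile is hypothetical. Taking higher roots compresses the dominant peaks, so minor peaks carry more weight in subsequent distance calculations.

```python
def pretreat(peak_areas, root=4):
    """Normalize peak areas to unit total, then apply an n-th root transform.

    root=1 gives normalization only; root=2 and root=4 give the square root
    and fourth root transforms.
    """
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("peak areas must sum to a positive value")
    return [(area / total) ** (1.0 / root) for area in peak_areas]


# Hypothetical two-peak profile: one dominant peak, one minor peak (9:1)
for r in (1, 2, 4):
    major, minor = pretreat([900.0, 100.0], root=r)
    print(r, round(major / minor, 3))  # ratio shrinks: 9.0, 3.0, 1.732
```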
Mat-Desa et al. determined that using the raw, normalized, and normalized with square
root transformation data, the samples did not correctly associate by brand (16). However, using
the normalized with fourth root transformation data, both neat and evaporated lighter fuels were
correctly associated to the corresponding brands. Two brands that were visually different from
one another based on the TICs appeared to be similar based on the given dendrogram. However,
little similarity was implied between two brands with visually similar TICs.
Although the research conducted by Mat-Desa et al. accounted for varying levels of
evaporation that may occur during the burning process, substrate interference compounds were
not accounted for. Introducing simulated fire debris samples into the analysis would introduce
problems from both evaporation and interference compounds, making the analysis even more
challenging. In addition, only using 51 characteristic peaks instead of a full TIC removes
variables that could otherwise influence the association and could result in the loss of
discriminatory information.
In a second study, Mat-Desa et al. utilized HCA to associate different brands of medium
petroleum distillate (MPD) samples (17). Three different MPDs (white spirit, paint brush
cleaner, and lamp oil) from varying brands were used to give a total of eight MPD samples.
Samples were prepared and analyzed as in the previous study.
Characteristic peaks (85 in total) were selected using the previously mentioned criteria
and the data were pretreated using normalization, sixteen square root transformation, and row
scaling; however, the rationale for this choice of data pretreatment was not explained (17). HCA was
performed using Euclidean distance and complete linkage. Six of the MPDs were successfully
associated by brand regardless of evaporation level; however, for the remaining two samples, the
70, 90, and 95% evaporated samples could not be associated. Poor association of these
samples was likely due to extensive evaporation and the volatility of compounds in these samples
compared to the other samples used.
Mat-Desa et al. did demonstrate some successful association of MPDs at different
evaporation levels; however, MPDs have a relatively low volatility and do not change
substantially through evaporation, as more volatile ignitable liquids such as gasoline do (17).
Interference compounds were still not accounted for and characteristic peaks were used over full
TICs, which could result in the loss of discriminatory information. Once again, distances rather
than similarity levels were reported during the statistical analysis of lighter fuels and medium
petroleum distillates. As stated before, this can be limiting because the distance will change
based on the distance metric used.
Prather et al. also performed HCA to determine if evaporated ignitable liquids and
simulated fire debris would associate to the corresponding neat liquid (8). Neat, 10 and 90%
evaporated gasoline and neat, 10 and 70% evaporated kerosene were prepared. Simulated fire
debris samples were prepared by spiking the neat and evaporated samples onto nylon carpet with
carpet padding and burning for a predetermined burn time. All ignitable liquids and simulated
fire debris samples were passive-headspace extracted and analyzed by GC-MS.
HCA was performed using Euclidean distance and complete linkage first on the neat and
evaporated ignitable liquids alone and then, on the simulated fire debris samples and ignitable
liquids (8). In both iterations of HCA, the replicates of each ignitable liquid grouped together
first at similarity levels ranging from 0.80 to 0.99. The next clustering occurred between the neat
and 10% gasoline (0.71) and a second cluster between the neat and 10% kerosene (0.84). The
kerosene standards clustered together at a higher similarity than the gasoline, as expected due to
the higher volatility of some of the compounds in gasoline. The 70% evaporated kerosene and
90% evaporated gasoline clustered last to the corresponding neat standards due to the differences
in chemical composition as a result of evaporation. All evaporation levels of gasoline clustered
to the neat gasoline standard at lower similarity levels than the evaporated kerosene samples to
the neat kerosene standard. These lower similarity levels of evaporated gasoline are once again
due to the volatility of the components within gasoline and due to greater evaporation of gasoline
than kerosene. Finally, the gasoline and kerosene clusters joined one another at zero similarity,
as expected due to the distinctly different chemical composition of these liquids.
HCA was performed a second time including the simulated fire debris samples to
determine if association to the corresponding liquid was possible in the presence of substrate
interference compounds and evaporation (8). Successful association of the simulated fire debris
to the corresponding ignitable liquid was possible, but not to the correct evaporation level.
Association to the proper evaporation level may have been limited by the process by which the
fire debris was generated. By spiking previously evaporated ignitable liquids onto the carpet and
then burning, the ignitable liquid evaporates further making the evaporation level unknown.
Unfortunately, it is difficult to generate simulated fire debris of known evaporation levels due to
the inability to control the burning process completely. As a result, it is difficult to generate
consistent simulated fire debris samples. In addition, using a small data set containing two very
different ignitable liquids makes association simpler.
Although exploratory procedures including PCA and HCA have previously been used for the
successful association of fire debris evidence (7,8), these procedures have some
limitations, which have been previously highlighted. Another approach to statistical analysis
includes classification procedures such as soft independent modeling of class analogy
(SIMCA), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and
k-Nearest neighbors (k-NN). Classification procedures can be used to classify samples of
unknown origin to a defined class or sets of classes of known origin. SIMCA, LDA, and QDA
have previously been used to classify simulated fire debris samples to ignitable liquid standards
(18, 19), while k-NN has not yet been used for this application.
Tan et al. utilized SIMCA for classifying ignitable liquids in fire debris samples (18).
Fifty-one different ignitable liquids belonging to five different ASTM classes were analyzed
using GC-MS. Simulated fire debris samples were generated using the ignitable liquids and
either wood or carpet as the substrate. TICs of the ignitable liquids and simulated fire debris
were divided into 19 sections. These 19 sections were then summed generating 19 variables that
were used in subsequent data analysis. For model development in SIMCA, the training set
contained the ignitable liquid standards and some of the simulated fire debris samples.
Classification of the fire debris samples to the corresponding ignitable liquids was successful;
however, including simulated fire debris samples within the training set requires prior
knowledge of the samples being tested. For example, if fire debris samples containing different
substrates or ignitable liquids than those present in the training set were included, correct
classification may not occur. In a forensic laboratory, it would not be possible to know what
types of simulated fire debris samples to include in the training set and therefore including fire
debris samples in the training set would be limiting. Additionally, using only 19 variables has the
potential for the loss of discriminatory information.
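The reduction of a full TIC to 19 summed variables can be sketched as below. Contiguous, nearly equal-width windows are an assumption for illustration; the section boundaries used by Tan et al. are not given here, and the toy TIC is hypothetical.

```python
def section_sums(tic, n_sections=19):
    """Sum a TIC over contiguous windows of (nearly) equal width.

    Integer arithmetic for the window bounds guarantees every point is
    counted exactly once and every window is non-empty.
    """
    n = len(tic)
    if n < n_sections:
        raise ValueError("TIC has fewer points than sections")
    bounds = [k * n // n_sections for k in range(n_sections + 1)]
    return [sum(tic[bounds[k]:bounds[k + 1]]) for k in range(n_sections)]


# Hypothetical 1900-point TIC reduced to 19 variables
tic = [float(i % 7) for i in range(1900)]
features = section_sums(tic)
print(len(features))  # 19
```

However many points the original chromatogram contains, only 19 values reach the classifier, which illustrates the potential loss of discriminatory information noted above.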
Sigman et al. used LDA and QDA as a multistep hard classification procedure for the
purpose of identifying the presence of ignitable liquids in fire debris and classifying any ignitable
liquids present based on ASTM class (19). The ultimate goal of this research was to be able to
use LDA and QDA to classify fire debris data collected from various laboratories. Sharing a
standard database for statistical classification among laboratories can be challenging as the
retention time of compounds in chromatographic data can shift based on the instrument and
conditions applied. As a result, Sigman et al. generated total ion spectra (TIS), or an average
mass spectrum across an entire chromatogram, to avoid complications with retention time
alignment (19).
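A total ion spectrum can be sketched as the average of all mass spectra in a run, assuming each scan is stored as a dict mapping m/z to intensity; scaling the result to a base peak of 100 is an assumption for illustration, and the scans are hypothetical. Because the scans are simply averaged, their order (and therefore their retention times) has no effect on the result.

```python
def total_ion_spectrum(scans, mz_channels):
    """Average mass spectrum across all scans, scaled so the base peak is 100.

    scans: list of dicts mapping m/z -> intensity (one dict per scan)
    mz_channels: the m/z values to report
    """
    avg = [sum(scan.get(mz, 0.0) for scan in scans) / len(scans) for mz in mz_channels]
    base = max(avg)
    return [100.0 * a / base for a in avg]


scans = [{57: 100.0, 71: 50.0}, {57: 20.0, 91: 80.0}]
tis = total_ion_spectrum(scans, [57, 71, 91])
shifted = total_ion_spectrum(list(reversed(scans)), [57, 71, 91])  # different elution order
print(tis == shifted)  # True: retention time plays no role
```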
Total ion spectra were generated using data from the ILRC and substrate databases.
Statistical models were generated using 460 ignitable liquids from the ILRC database and 88
substrates from the substrate database (19). In addition, TIS were combined using software to
generate 4600 TIS samples containing an ignitable liquid and two substrates and 4400 TIS
samples containing three different substrates. These models were used to cross-validate the
multistep classification procedure and to classify simulated fire debris samples (19).
Principal component analysis was used for feature extraction in which principal
components (PCs) accounting for 50, 70, 90, and 95% of the variance were used (19). Using the
number of PCs that accounted for 95% of the variance resulted in the most successful
classification. Using LDA and QDA as a multistep classification procedure, a positive
classification rate of 70.9% was achieved for the fire debris samples, with a false positive rate of
8.9% (19).
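Choosing the number of PCs by explained variance reduces to a cumulative sum over the PCA eigenvalues, as in the sketch below. The eigenvalues are hypothetical and only illustrate how thresholds such as 50, 70, 90, and 95% translate into a number of retained components.

```python
def n_components_for(eigenvalues, threshold=0.95):
    """Smallest number of PCs whose cumulative explained variance reaches the threshold."""
    total = sum(eigenvalues)
    cumulative = 0.0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += lam
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical eigenvalues (variance captured by each PC)
eig = [50.0, 25.0, 15.0, 6.0, 3.0, 1.0]
print([n_components_for(eig, t) for t in (0.50, 0.70, 0.90, 0.95)])  # [1, 2, 3, 4]
```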
k-Nearest Neighbors is another multivariate statistical tool used for hard classification
purposes. k-NN has previously been utilized in forensic applications to classify unknown
samples to defined classes; however, the procedure has not been previously utilized for the
analysis of fire debris.
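The k-NN decision rule itself is short. The sketch below classifies a sample by majority vote among its k nearest training points using Euclidean distance; the two-dimensional coordinates (which could, for example, be PCA scores) and the class labels are hypothetical, and breaking vote ties in favor of the nearest neighbor is one design choice among several.

```python
import math
from collections import Counter


def knn_classify(sample, training, k=3):
    """Assign `sample` to the majority class of its k nearest training points.

    training: list of (feature_vector, class_label) pairs.
    Vote ties are broken in favor of the class whose member is nearest.
    """
    neighbors = sorted(training, key=lambda t: math.dist(sample, t[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    for _, label in neighbors:  # neighbors are ordered by increasing distance
        if votes[label] == top:
            return label


# Hypothetical training set; (1, 1) is an atypical point labeled as a distillate
training = [
    ((0.0, 0.0), "gasoline"), ((0.5, 0.0), "gasoline"), ((0.0, 0.5), "gasoline"),
    ((5.0, 5.0), "petroleum distillate"), ((5.5, 5.0), "petroleum distillate"),
    ((1.0, 1.0), "petroleum distillate"),
]
print(knn_classify((1.2, 1.2), training, k=1))  # petroleum distillate
print(knn_classify((1.2, 1.2), training, k=3))  # gasoline
```

With these toy values, k = 1 follows the single atypical training point, whereas k = 3 outvotes it, which illustrates why very small values of k are sensitive to outliers.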
Said et al. applied k-NN to a variety of handwriting samples in an attempt to classify
them by writer (20). A total of 40 different writers each wrote 25
documents, for a total of 1000 handwriting samples. Texture analysis was performed on the
handwriting using two different texture recognition algorithms. These algorithms were used to
identify the text features and the variables that were used for k-NN; however, the number of
nearest neighbors used was not specified. Of the 25 documents from each writer, 15 were used as
the training set and the remaining 10 were used as the test set. Using one of the texture
recognition algorithms, successful classification ranged from 77 - 86% and using the second
algorithm, successful classification ranged only from 66 - 74%.
Kumar et al. also investigated k-NN as part of a nondestructive ink analysis technique to
identify alterations made using 10 different blue ink ballpoint pens (21). Forty-five different
combinations of two intersecting pen strokes were generated. These combinations were repeated
so that the intersections occurred in the opposite order, and then all combinations were
doubled, generating 180 combinations. Two different imaging models were used to identify
minor differences in color of the ink and a texture recognition algorithm was used to identify
differences in texture among the ink. k-NN was performed using the data generated from these
algorithms using 1, 3, 5, 7, 9, and 11 nearest neighbors. The most accurate classification
occurred using 5 nearest neighbors with an accuracy range of 80.00 - 97.56% and an average
accuracy of 85.51%.
Jiang et al. performed k-NN with other statistical procedures in an attempt to identify
items according to type of drug or explosive that had been concealed using body packaging (22).
An anthropomorphic phantom consisting of a head, chest, and abdomen was used to conceal a
variety of drugs, drug precursors, and explosives (6 different samples in total). The samples were
placed inside the stomach of the anthropomorphic phantom and analyzed 40 times using energy
dispersive X-ray diffraction (EDXRD). k-NN was performed on the resulting spectra based on
features extracted using positive matrix factorization (PMF), PCA, and robust PCA using only
one nearest neighbor. When k-NN was performed following feature extraction using PMF, the
highest classification rate of 99.5% was achieved. When k-NN was used following robust PCA
and PCA for feature extraction, classification rates of 98.8% and 98.1%, respectively, were
achieved. However, using only one nearest neighbor could be problematic because a single
neighbor is susceptible to outliers, which could result in misclassification.
k-NN has had some success for correct classification and shows some potential in
forensic applications (20-22), but has not been widely used in fire debris analysis. In fire debris
analysis, ignitable liquid reference standards could be used as defined classes and simulated fire
debris as the samples to be classified. However, the large number of ignitable liquids on
the market and the chemical variability within each ASTM class make it challenging to determine
which ignitable liquids to use in the data set.
1.7 Research Objectives
For multivariate statistics to be practical and applicable in a forensic laboratory during
fire debris analysis, the methods need to be rapid, simple, and standardized for high throughput
and reproducibility. In order to obtain this outcome and remain objective, a revised approach
needs to be considered. In this research, the impact of the data set composition on successful
association of simulated fire debris samples to the corresponding reference standard was
investigated, using a variety of statistical procedures. Further, reference standards characteristic
of ASTM classes were developed and investigated as a more standardized approach for
subsequent statistical analysis.
PCA was used as the initial statistical method to compare the impact of data set selection
based on chemical diversity and to demonstrate the utility of class reference standards. However,
PCA can be limiting because visual interpretation of the scores plot remains subjective. As a
result, Euclidean distances were utilized to quantitatively assess the association of the fire debris
samples to the ignitable liquid standards. Although all dimensions of the data are accounted for
using PCA, interpretation of the data can be limiting as only two or three dimensions can be
compared simultaneously. Due to these limitations of PCA, additional statistical procedures were investigated, including HCA, which has previously been used for fire debris analysis, and k-NN, which has not yet been used for this application.
Commercial ignitable liquid standards and class reference standards were compared using each of the statistical procedures, and the advantages and disadvantages of HCA and k-NN were investigated. Ultimately, a chemically diverse data set, a refined data set, and a class reference data set were compared using several multivariate statistical procedures in order to determine a more standardized approach for reliable fire debris analysis. Developing a set of chemical class reference standards useful for a statistical approach would make fire debris analysis more reliable, would help reduce the potential for false positives and false negatives, would aid in convincing a jury, and would satisfy the Daubert standard.


REFERENCES


1. ASTM International, ASTM E 1618-06e1. Annual Book of ASTM Standards 14.02.
2. ASTM International, ASTM E 1412-07. Annual Book of ASTM Standards 14.02.
3. Baerncopf JM, McGuffin VL, Smith RW. Effect of Gas Chromatography Temperature
Program on the Association and Discrimination of Diesel Samples. Journal of Forensic
Sciences 2010; 55: 185-192.
4. National Center for Forensic Science. Ignitable liquids reference collection. Available at:
http://ilrc.ucf.edu (Accessed on 3 July 2014).
5. National Center for Forensic Science. Substrate Database. Available at: http://ilrc.ucf.edu
(Accessed on 3 July 2014).
6. Committee on Identifying the Needs of the Forensic Sciences Community, National
Research Council. Strengthening Forensic Science in the United States: A Path Forward.
Washington, D.C.: National Academies Press, 2009.
7. Baerncopf JM, McGuffin VL, Smith RW. Association of ignitable liquid residues to neat
ignitable liquids in the presence of matrix interferences using chemometric procedures.
Journal of Forensic Sciences 2011; 56: 70-81.
8. Prather KR, McGuffin VL, Smith RW. Effect of evaporation and matrix interferences on
the association of simulated ignitable liquid residues to the corresponding liquid standard.
Forensic Science International 2012; 222: 242-251.
9. Willard MAB, McGuffin VL, Smith RW. Forensic analysis of Salvia divinorum using
multivariate statistical procedures. Part I: discrimination from related Salvia species.
Analytical and Bioanalytical Chemistry 2012; 402: 833-842.
10. Lentini JJ, Dolan JA, Cherry C. The Petroleum-Laced Background. Journal of Forensic
Sciences 2000; 45: 968-989.
11. Keto RO, Wineman PL. Detection of Petroleum-Based Accelerants in Fire Debris by
Target Compound Gas Chromatography/Mass Spectrometry. Analytical Chemistry 1991;
63: 1964-1971.
12. Sigman ME, Williams MR. Covariance Mapping in the Analysis of Ignitable Liquids by
Gas Chromatography/Mass Spectrometry. Analytical Chemistry 2006; 78: 1713-1718.


13. Turner DA, Goodpaster JV. The effects of season and soil type on microbial degradation
of gasoline residues from incendiary devices. Analytical and Bioanalytical Chemistry
2013; 405: 1593-1599.
14. Goodpaster JV, Sturdevant AB, Andrews KL, Brun-Conti L. Identification and
Comparison of Electrical Tapes Using Instrumental and Statistical Techniques: I.
Microscopic Surface Texture and Elemental Composition. Journal of Forensic Sciences
2007; 52: 610-629.
15. Goodpaster JV, Sturdevant AB, Andrews KL, Briley EM, Brun-Conti L. Identification
and Comparison of Electrical Tapes Using Instrumental and Statistical Techniques: II.
Organic Composition of the Tape Backing and Adhesive. Journal of Forensic Sciences
2009; 54: 328-338.
16. Mat-Desa WNS, NicDaeid N, Ismail D, Savage K. Application of Unsupervised
Chemometric Analysis and Self-organizing Feature Map (SOFM) for the Classification
of Lighter Fuels. Analytical Chemistry 2010; 82: 6395-6400.
17. Mat-Desa WNS, Ismail D, NicDaeid N. Classification and Source Determination of
Medium Petroleum Distillates by Chemometric and Artificial Neural Networks: A Self
Organizing Feature Approach. Analytical Chemistry 2011; 83: 7745-4454.
18. Tan B, Hardy JK, Snavely RE. Accelerant classification by gas chromatography/mass
spectrometry and multivariate pattern recognition. Analytica Chimica Acta 2000; 422:
37-46.
19. Waddell EE, Song ET, Rinke CN, Williams MR, Sigman ME. Progress Toward the
Determination of Correct Classification Rates in Fire Debris Analysis. Journal of
Forensic Sciences 2013; 58: 887-896.
20. Said HES, Tan TN, Baker KD. Personal identification based on handwriting. Pattern
Recognition 2000; 33: 149-160.
21. Kumar R, Pal NR, Chanda B, Sharma JD. Forensic Detection of Fraudulent Alteration in
Ball-Point Pen Strokes. IEEE Transactions on Information Forensics and Security 2012;
7: 809-820.
22. Jiang Y, Liu P. Feature extraction for identification of drug and explosive concealed by
body packaging based on positive matrix factorization. Measurement 2014; 47: 193-199.


2. Theory
2.1. Gas Chromatography-Mass Spectrometry
Chromatography is a technique used to separate the components of a chemical mixture by means of two phases, known as the stationary phase and the mobile phase. Gas chromatography is frequently coupled with a mass spectrometer (GC-MS) to provide additional data for a more conclusive identification. For this reason, GC-MS is widely used in forensic science laboratories, and in particular in fire debris analysis.
Samples suitable for GC-MS analysis are gases and volatile liquids, and they must be thermally stable within the operating temperature range of the instrument. For analysis of fire debris evidence, the sample must first be extracted, commonly by passive-headspace extraction with an activated carbon strip (ACS). The ACS accumulates the volatile compounds and is then eluted with a volatile organic solvent, usually dichloromethane, methanol, or carbon disulfide. In GC-MS analysis, a sample is introduced into a heated inlet, carried through a column within an oven, and then exits to a detector. A basic diagram of a gas chromatograph (GC) is illustrated in Figure 2.1.
Once the sample is prepared for analysis, it is introduced into the inlet using a syringe. The inlet is heated (typically 250-280 °C) so that liquids are instantly volatilized upon introduction. A nominal flow rate of an inert gas, commonly helium, is set to carry the volatilized sample into the column. A split or splitless injection can be selected based on the type of sample. Using a splitless injection, the entire injected sample is introduced onto the column. Splitless injections are commonly used when the analytes are present in the sample at low concentration. For a split injection, only a portion of the sample is introduced onto the column and the remainder is transferred to waste. A split ratio, commonly 50:1 or 100:1, is selected to determine what portion of the sample is introduced onto the column and what portion is transferred to waste. Split injections are commonly used when a sample is highly concentrated, to prevent overloading and contamination of the GC column.

Figure 2.1: Diagram of a gas chromatograph (gas cylinder, syringe, injection port, column, oven, and detector)
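The effect of the split ratio can be expressed with simple arithmetic. The Python sketch below assumes the ratio is written as vented parts to column parts, so a 50:1 split sends 1 of every 51 parts of the vaporized injection onto the column; it ignores inlet discrimination effects, and the 1 µL injection volume in the example is hypothetical rather than a value used in this research.

```python
def on_column_fraction(split_ratio: float) -> float:
    """Fraction of the vaporized injection that enters the column.

    split_ratio is the vented:column ratio, e.g. 50.0 for a 50:1 split.
    split_ratio = 0 is used here as a stand-in for a splitless injection
    (everything goes onto the column).
    """
    if split_ratio < 0:
        raise ValueError("split ratio must be non-negative")
    return 1.0 / (split_ratio + 1.0)


def on_column_volume_nl(injection_volume_ul: float, split_ratio: float) -> float:
    """Equivalent liquid volume (nL) reaching the column for a given injection (uL)."""
    return injection_volume_ul * 1000.0 * on_column_fraction(split_ratio)


for ratio in (50.0, 100.0):
    volume = on_column_volume_nl(1.0, ratio)  # hypothetical 1 uL injection
    print(f"{ratio:.0f}:1 split -> {volume:.1f} nL on column")
```

Under these assumptions, a 1 µL injection places about 19.6 nL on the column at 50:1 and about 9.9 nL at 100:1, which is why split injections are reserved for concentrated samples.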
As previously mentioned, chromatographic separation of a sample mixture utilizes two phases, the stationary phase and the mobile phase, whose nature varies based on the type of chromatography. In GC, the stationary phase is coated inside a column made of fused silica. Modern GC utilizes capillary columns that are narrow in diameter, with the stationary phase coated on the inner walls so that the mobile phase can flow through the center of the column with little obstruction. The mobile phase, commonly known as the carrier gas, is typically the previously mentioned helium, and its purpose is to carry the sample through the column.
The chemical composition of the stationary phase varies based on the type of separation desired. For example, a 100% dimethyl polysiloxane stationary phase is nonpolar and is beneficial for the separation of nonpolar compounds such as hydrocarbons. Other stationary phases, such as a 5% diphenyl, 95% dimethyl polysiloxane phase, contain phenyl groups; this type of stationary phase is still relatively nonpolar but can be beneficial for separating hydrocarbon mixtures that also contain aromatic compounds. Additionally, a more polar stationary phase such as a 50% cyanopropyl, 50% phenylmethyl polysiloxane can be used to separate a mixture of polar compounds.
As the sample mixture travels through the column, the analytes within the sample are separated by partitioning into the stationary phase, according to their affinity for that phase and their boiling points. As the analytes leave the column, they reach the detector and are recorded as a series of peaks in a chromatogram according to the amount of time spent in the column, known as the retention time. An analyte with a high affinity for the stationary phase interacts with it longer, resulting in a longer retention time, while analytes with little to no affinity for the stationary phase are carried through the column by the flow of helium and have shorter retention times. Similarly, an analyte with a low boiling point elutes from the column faster, and therefore has a shorter retention time, than an analyte with a higher boiling point. If an analyte has a low boiling point but a high affinity for the stationary phase, it moves through the column quickly, but its interaction with the stationary phase causes a slight increase in retention time.
The column used for separation is housed inside a temperature-controlled oven. The
temperature of the oven can be selected based on the analytes of interest. An isothermal
temperature program can be used, where the temperature remains constant during the entire
analysis. Isothermal temperature programs can be useful when the boiling point range of the
analytes in a sample is known and is within a small range. However, if the sample contains
analytes with a large range of boiling points, temperature programming the oven may be
necessary. Using a temperature program, an initial oven temperature can be set, followed by an increase in temperature at a specified rate.
An initial oven temperature (40-50 °C) is selected based on the solvent in which the samples are prepared. An ideal initial oven temperature is typically 10-20 °C lower than the boiling point of the solvent, so that the solvent condenses and focuses at the head of the column. As the temperature increases, the focused solvent volatilizes and begins to move through the column; because the analytes share this focused starting point, the resolution of the separation is increased. Additionally, the initial oven temperature is typically held for several minutes (1-3 minutes) in order to achieve better resolution and avoid detecting the solvent in which the sample was prepared. A ramp rate (5-10 °C/min) can be programmed so that analytes with a broad range of boiling points can be analyzed in a single run. A slow ramp rate can improve the separation of closely eluting analytes, but it also results in a longer analysis time and broader peaks, because the longer the analytes spend in the column, the greater the diffusion and mass transfer effects. To decrease analysis time, a faster ramp rate can be used, but it results in poorer resolution, as there is less time for interactions with the stationary phase to occur. Temperature programming is beneficial for separating more compounds in a single run; however, a compromise between good resolution and fast analysis time must be made.
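The single-ramp temperature program described above can be written as a piecewise function of time. In the Python sketch below, the `OvenProgram` class and its settings are hypothetical: the initial temperature, hold time, and ramp rates fall within the ranges given in the text, while the final temperature (280 °C) and final hold (2 min) are illustrative assumptions. Comparing the two ramp rates shows the analysis-time cost of a slower ramp.

```python
from dataclasses import dataclass


@dataclass
class OvenProgram:
    """Single-ramp oven program: initial hold, linear ramp, final hold (°C and min)."""
    initial_c: float
    initial_hold_min: float
    ramp_c_per_min: float
    final_c: float
    final_hold_min: float = 0.0

    def ramp_duration_min(self) -> float:
        return (self.final_c - self.initial_c) / self.ramp_c_per_min

    def total_run_min(self) -> float:
        return self.initial_hold_min + self.ramp_duration_min() + self.final_hold_min

    def temperature_at(self, t_min: float) -> float:
        """Oven set-point temperature t_min minutes after injection."""
        if t_min <= self.initial_hold_min:
            return self.initial_c          # solvent-focusing hold
        t_ramp = t_min - self.initial_hold_min
        if t_ramp <= self.ramp_duration_min():
            return self.initial_c + self.ramp_c_per_min * t_ramp
        return self.final_c                # final hold


# Hypothetical programs: same start and end temperatures, two ramp rates.
fast = OvenProgram(initial_c=50, initial_hold_min=3, ramp_c_per_min=10, final_c=280, final_hold_min=2)
slow = OvenProgram(initial_c=50, initial_hold_min=3, ramp_c_per_min=5, final_c=280, final_hold_min=2)
print(fast.total_run_min(), slow.total_run_min())
```

Halving the ramp rate from 10 to 5 °C/min lengthens this hypothetical run from 28 to 51 minutes, which is the time side of the resolution-versus-speed compromise.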
Following the separation, mass analysis and detection of the individual analytes are conducted using a detector. A variety of detectors can be used based on the type of analysis being performed; however, a mass spectrometer is a common detector in the benchtop instruments used in forensic laboratories. Mass spectrometers are especially useful in forensic science because they can be utilized for definitive identification of unknown samples. The column containing the separated analytes enters the mass spectrometer via the transfer line, which is held at a high temperature (280-300 °C) to ensure that all analytes remain in the gas phase. The column runs through the transfer line, and its tip ends at the ion source of the mass spectrometer, delivering the separated analytes directly into the ion source. The ion source ionizes the analytes as they enter the mass spectrometer; the ions are then separated by a mass analyzer and detected.

The analysis must be carried out under vacuum to prevent ions from colliding with other molecules before detection and to pump away any molecules that were not ionized. Upon introduction to the ion source, the analytes undergo ionization; while numerous ionization methods are available, electron ionization is the most common for GC-MS analysis. During electron ionization, a heated filament emits electrons, which are accelerated to a high energy (typically 70 eV) to generate a beam of high-energy electrons. These electrons interact with the analytes, causing the loss of an electron from the analyte and resulting in a positively charged ion.
Electron ionization is known as a hard ionization method because fragmentation commonly occurs during the ionization process. Typical bond energies within organic molecules are substantially less than 70 eV and, as a result, there is sufficient excess energy for fragmentation to occur. Fragmentation can be beneficial for definitive identification of a molecule, as it provides structural information based on fragmentation patterns unique to that compound under those ionization conditions. Commonly, compounds are identified based on molecular mass and unique fragmentation patterns; however, excessive fragmentation can result in the loss of the molecular ion, making identification more difficult.
Once the ions are produced, they are directed into a mass analyzer. In GC-MS, the most common type of mass analyzer is a single quadrupole mass analyzer. The ions are directed into the quadrupole by a repeller plate (positively charged) and an ion focusing plate (negatively charged) in the ion source. The positively charged ions are focused towards the negatively charged plate and into the mass analyzer, where they are separated based on their individual mass-to-charge (m/z) ratios.

A quadrupole mass analyzer consists of two pairs of parallel rods arranged in a square, as shown in Figure 2.2. A combined direct current (DC) and radio frequency (RF) potential is applied to each pair of opposite rods, with the two pairs receiving DC potentials of opposite sign and RF potentials that are 180° out of phase with each other. Therefore, the two rods opposite each other always have the same charge, and adjacent rods always have equal but opposite charges. These charges alternate as a function of time, causing ions that enter the quadrupole to follow a wave-like trajectory.
At a given DC/RF ratio, only ions within a narrow m/z range have a stable trajectory, and all other ions are unstable. The unstable ions hit the quadrupole rods, are neutralized, and are pumped away by the vacuum system. Ions with a stable trajectory pass through the quadrupole and reach the detector. In order to analyze more than one m/z ratio, the DC and RF potentials are scanned over time while the DC/RF ratio remains constant.
As the ions with stable trajectories exit the quadrupole mass analyzer, they are detected using an electron multiplier, a common detector in GC-MS analysis. Ions enter the horn-shaped detector and strike the walls of the multiplier tube, which consists of glass doped with lead, making the tube slightly conductive. A voltage (1.8-2 kV) is applied across the multiplier tube, generating a voltage gradient (1). When the ions strike the surface, secondary electrons are generated and move towards the higher potentials further into the tube. As these electrons move further into the detector, they continue to strike the walls of the multiplier tube, generating a cascade of electrons. As a result, a signal amplification of approximately 10⁵ to 10⁸ occurs. The amplified signal is then digitized using an analog-to-digital converter and the data are processed.
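The cascade can be illustrated with an idealized model in which every collision with the tube wall multiplies the number of electrons by a fixed secondary-emission yield. This is a simplification (a continuous-dynode multiplier has no discrete stages), and the yield of 2 electrons per collision used in this Python sketch is an assumed, illustrative value rather than a property of any particular detector.

```python
import math


def cascade_gain(secondary_yield: float, n_collisions: int) -> float:
    """Idealized gain if every wall collision multiplies the electron count by secondary_yield."""
    return secondary_yield ** n_collisions


def collisions_needed(secondary_yield: float, target_gain: float) -> int:
    """Smallest number of collisions for which cascade_gain reaches target_gain."""
    return math.ceil(math.log(target_gain) / math.log(secondary_yield))


# Assumed, illustrative yield of 2 secondary electrons per collision.
for gain in (1e5, 1e8):
    n = collisions_needed(2.0, gain)
    print(f"gain {gain:.0e}: {n} collisions -> {cascade_gain(2.0, n):.3g}")
```

With this assumed yield, 17 collisions are enough for a gain of 10⁵ and 27 for 10⁸, showing how a modest per-collision multiplication compounds into the amplification quoted above.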

Figure 2.2: Diagram of a quadrupole mass analyzer (ions pass between the quadrupole rods; stable ions reach the detector, unstable ions are not detected)

The data generated from GC-MS analysis are a total ion chromatogram (TIC) of the sample mixture and a mass spectrum for each separated analyte in the sample mixture. The TIC consists of peaks at a range of retention times. As previously mentioned, the analytes that elute from the column first (e.g., analytes with low boiling points and low affinity for the stationary phase) are detected first and therefore have shorter retention times. Analytes with high boiling points and high affinity for the stationary phase interact with the column longer, resulting in longer retention times. In forensic laboratories, retention time is used to help identify unknowns by comparing the retention time of the unknown to the retention times of reference standards. Because retention time varies from instrument to instrument due to differences such as column integrity, column length, and oven temperature program, it is important that reference standards are analyzed on the same instrument and under the same conditions. However, retention times are not unique to specific analytes, and retention time alone cannot be used for definitive identification of an unknown.
With GC-MS analysis, each peak in the chromatogram has its own mass spectrum, which contains peaks at the m/z ratios present. In addition to retention times, the molecular mass and fragmentation pattern of an analyte can be used to aid in identification. If the molecular ion is present, it is used to determine the molecular mass of the analyte, and the fragmentation pattern is used to determine the structure of the analyte. The mass spectrum of an analyte is also compared to mass spectra of reference standards and, as the fragmentation pattern of the analyte is unique under specific instrument conditions, definitive identification is possible. Using retention time alone, definitive identification is not possible; however, using both retention time from the chromatographic data and m/z ratios from the mass spectral data, definitive identification in forensic analysis is possible. Additionally, these data can be subjected to multivariate statistical procedures for further analysis.
2.2. Data Pretreatment
Data pretreatment procedures are applied to chromatographic data to reduce instrumental variation between samples while preserving chemical differences. Background subtraction is used to eliminate compounds that are not characteristic of a given sample but are instead introduced during the analysis process. For example, caprolactam, a compound originating from the nylon bags used during the passive-headspace extraction process, is eliminated because it does not originate from the simulated fire debris samples. Smoothing is used to increase the signal-to-noise (S/N) ratio by reducing instrumental background noise while preserving the desired signal. As a result, peaks not attributable to the sample are eliminated, and the remaining peaks are smoother, with less noise and an improved S/N ratio.
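The smoothing algorithm used in this research is not specified here, so the Python sketch below uses a centered moving average as one common, hypothetical choice (Savitzky-Golay filtering is a frequent alternative). It is applied to a synthetic chromatogram with a single peak and shows that averaging neighboring points lowers the baseline noise while changing the peak only slightly.

```python
import numpy as np


def moving_average(signal: np.ndarray, window: int = 5) -> np.ndarray:
    """Centered moving-average smoothing of a 1-D chromatogram.

    The ends are padded by repeating the first/last value so that the
    smoothed trace has the same number of points as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = np.pad(np.asarray(signal, dtype=float), half, mode="edge")
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")


# Synthetic TIC: one Gaussian peak on a noisy baseline (illustrative data only).
rng = np.random.default_rng(0)
t = np.arange(600)
clean = 1000.0 * np.exp(-0.5 * ((t - 300) / 12.0) ** 2)
noisy = clean + rng.normal(0.0, 20.0, t.size)
smoothed = moving_average(noisy, window=7)

baseline = slice(0, 200)  # region with no peak, used to estimate the noise
print("baseline noise SD before/after:", noisy[baseline].std(), smoothed[baseline].std())
print("peak height before/after:", noisy.max(), smoothed.max())
```

A wider window removes more noise but flattens and broadens narrow peaks, so the window should stay small relative to the peak width.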
Retention time alignment is applied to TICs as retention times of the same analyte in
different chromatograms may drift when analyzed over a long period. Over time, the retention
time at which a peak elutes may vary due to minor differences such as GC temperatures,
differences in the chemical composition of the stationary phase because of aging, and differences
in the flow rate of the carrier gas. As a result, differences in retention time could lead to improper
identification of an unknown. Although many alignment algorithms are available, the correlation
optimized warping (COW) algorithm was used in this research. The COW algorithm was used to
align the simulated fire debris TICs to the target TICs of each set of standards (commercial and
class). Ideal target TICs contain a majority of the peaks within the samples being aligned. For this research, an average TIC was used for alignment. The average TIC was generated using one sample TIC replicate (randomly selected) from each ignitable liquid standard. The abundance of all selected TICs was averaged at each retention time.
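The construction of the average target TIC can be sketched as follows. The function name and the dictionary layout (standard name mapped to a list of equal-length replicate TICs) are hypothetical; the logic follows the text: one randomly selected replicate per ignitable liquid standard, averaged point by point.

```python
import numpy as np


def average_target_tic(replicates_by_standard, seed=None):
    """Average one randomly chosen replicate TIC per standard, point by point.

    replicates_by_standard maps a standard name to a list of equal-length
    1-D arrays (one array per replicate TIC).
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for name in sorted(replicates_by_standard):  # sorted for reproducibility
        replicates = replicates_by_standard[name]
        chosen.append(np.asarray(replicates[rng.integers(len(replicates))], dtype=float))
    return np.vstack(chosen).mean(axis=0)  # mean abundance at each retention time


# Hypothetical example: two standards, three replicates each.
tics = {
    "gasoline": [np.array([0.0, 5.0, 1.0]), np.array([0.0, 6.0, 1.0]), np.array([0.0, 4.0, 1.0])],
    "kerosene": [np.array([2.0, 1.0, 7.0]), np.array([2.0, 1.0, 8.0]), np.array([2.0, 1.0, 6.0])],
}
target = average_target_tic(tics, seed=42)
print(target)
```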
To perform a COW alignment, two parameters are selected: the segment size and the slack, or warp. The segment size determines the number of segments into which the TIC will be divided. The warp is the number of data points that can be added to or removed from each segment of the TIC. The algorithm starts at the end of the chromatogram, and interpolation is
used to stretch or compress the segment so that the peaks within the segment of the sample
chromatogram align with the peaks in the same segment of the target chromatogram. Pearson
product-moment correlation (PPMC) coefficients are then used to calculate the correlation
between the sample and target segments. For example, if a warp of two is selected, two data
points can be added or subtracted from the segment; however, it is also possible that one or zero
data points can be added or subtracted from the segment. PPMC coefficients are then calculated
between the sample and target segments using each of the possibilities of a given warp setting.
The optimal warp and segment size are then determined based on which combination gives the
highest PPMC coefficient indicating the highest similarity. The algorithm then moves on to the
next segment and the process is repeated.
Once the TICs are aligned, they are typically normalized to reduce variation in peak abundance caused by minor differences in the volume of sample injected during analysis. There are a variety of methods for normalization of chromatographic data; in this research, constant-sum normalization was utilized. To perform constant-sum normalization, the abundance of each variable (retention time point) within a TIC is summed to obtain the total abundance, or total area, of the TIC. Each individual variable is then divided by the total area of the corresponding TIC. As a result, replicates of the same sample have peak abundances that are more similar to one another, as expected for replicate samples.
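Constant-sum normalization amounts to dividing each TIC by its own total. In this minimal Python sketch (hypothetical function name and data), the second replicate has half the abundance of the first, as if less sample had been injected; after normalization the two rows are identical and each sums to one.

```python
import numpy as np


def constant_sum_normalize(tics):
    """Divide every TIC (one per row) by its total abundance so each row sums to 1."""
    tics = np.asarray(tics, dtype=float)
    totals = tics.sum(axis=1, keepdims=True)  # total area of each TIC
    return tics / totals


# Hypothetical replicates: the second was injected at roughly half the volume.
replicates = np.array([[200.0, 400.0, 400.0],
                       [100.0, 200.0, 200.0]])
print(constant_sum_normalize(replicates))
```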
2.3. Data Analysis
2.3.1. Principal Component Analysis
Principal component analysis (PCA) is an exploratory multivariate statistical procedure
commonly used to associate and discriminate samples within a data set. In this research, PCA is
used to associate simulated fire debris samples back to the corresponding ignitable liquid. PCA
reduces the dimensionality of the data set, which is especially useful in data sets that contain a
large number of variables, as is the case with chromatographic data. Reducing the dimensionality
of the data is beneficial as it allows for the identification of patterns within the data that may not
have originally been apparent due to the large number of variables.
To perform PCA using the pretreated TICs, the covariance matrix of the data is first calculated; the size of the matrix is determined by the number of dimensions (variables) in the data set. During the calculation of the covariance, the data are also mean centered. The covariance measures how two dimensions vary together; when multiple dimensions are present, the covariance between each pair of variables is calculated and arranged in a covariance matrix. This allows the variance across multiple dimensions to be measured.
Eigenanalysis of the covariance matrix is performed to calculate eigenvectors and eigenvalues for the data set. An eigenvector is a unit vector that, when multiplied by the covariance matrix, yields a scalar multiple of itself; the corresponding eigenvalue is that scalar. The maximum number of eigenvectors that can be calculated is equal to the number of dimensions in a given data set. Each eigenvalue represents the variance that its eigenvector describes.
In order to calculate the eigenvectors and eigenvalues, the covariance matrix must be square. Principal components (PCs) are derived from the eigenvectors and eigenvalues: the eigenvector with the largest eigenvalue describes the most variance and is known as the first principal component (PC1). The second principal component (PC2) describes the next greatest variance and is positioned orthogonally to PC1, and so on.
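The eigenanalysis step can be sketched with NumPy. This is an illustrative Python example on random data (12 hypothetical TICs with 6 retention-time points), not the software used in this research; it checks the defining property that multiplying an eigenvector by the covariance matrix returns the same vector scaled by its eigenvalue, and reports the fraction of variance described by each PC.

```python
import numpy as np


def pca_eigen(data):
    """Eigenanalysis of the covariance matrix of `data` (rows = samples, columns = variables).

    Returns eigenvalues in descending order and the matching unit eigenvectors as columns.
    """
    centered = data - data.mean(axis=0)        # mean centering
    cov = np.cov(centered, rowvar=False)       # square: n_variables x n_variables
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]          # PC1 = largest eigenvalue
    return eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(1)
data = rng.normal(size=(12, 6))  # 12 hypothetical TICs, 6 retention-time points
eigvals, eigvecs = pca_eigen(data)
cov = np.cov(data, rowvar=False)
print(np.allclose(cov @ eigvecs[:, 0], eigvals[0] * eigvecs[:, 0]))  # C v = lambda v
print(eigvals / eigvals.sum())  # fraction of variance described by each PC
```

For real chromatographic data with thousands of retention-time points, forming the full covariance matrix is expensive, and PCA is usually computed from a singular value decomposition of the mean-centered data instead, which gives the same PCs. With n samples, only the first n - 1 eigenvalues of the mean-centered data can be nonzero.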
PCA generates two main outputs: loadings and scores plots. For chromatographic data,
the loadings plot can be generated by plotting the eigenvector for a given PC versus retention
time. The loadings plot describes the variables (compounds) contributing to the variance
described by the PC and the retention times can be used to identify the variables. The scores plot
is a scatter plot that represents the association and discrimination of the samples. The score for a sample on a given PC is the sum of the products of the mean-centered data for the sample and the corresponding elements of the relevant eigenvector. Samples that are chemically similar will be positioned closely on the scores plot, and those that are chemically different will be separated from one another. Further, the loadings plots can be used to explain the positioning of the samples on the scores plot.
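Scores and loadings follow directly from the eigenvectors. In this Python sketch (hypothetical function name, random data), the loadings are the eigenvectors of the first two PCs and the scores are the mean-centered data multiplied by them, which is the sum of products described above; the final check computes one score explicitly.

```python
import numpy as np


def pca_scores_loadings(data, n_pcs):
    """Scores (samples x PCs) and loadings (variables x PCs) from covariance eigenanalysis."""
    centered = data - data.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    top = np.argsort(eigvals)[::-1][:n_pcs]
    loadings = eigvecs[:, top]       # plot each column against retention time
    scores = centered @ loadings     # sum over variables of (centered value x eigenvector element)
    return scores, loadings


rng = np.random.default_rng(2)
data = rng.normal(size=(10, 8))  # 10 hypothetical TICs, 8 retention-time points
scores, loadings = pca_scores_loadings(data, n_pcs=2)
centered = data - data.mean(axis=0)
# The score of sample 0 on PC1, computed explicitly as a sum of products:
print(np.isclose(scores[0, 0], np.sum(centered[0] * loadings[:, 0])))
```

The sign of each eigenvector is arbitrary, so different software can produce mirror-image scores and loadings plots; relative sample positions, and therefore associations, are unaffected.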
2.3.2. Euclidean Distance
Euclidean distance is the distance between two given data points in a multidimensional
space. In this research, Euclidean distance was utilized to measure the distance between the
scores of sample pairs based on multiple PCs. Euclidean distance (d) is calculated using Equation 2.1:

$$ d = \sqrt{\sum_{i=1}^{n} \left( \bar{x}_i - \bar{y}_i \right)^2} \qquad (2.1) $$

where $\bar{x}_i$ represents the average score of sample x on a given dimension i, $\bar{y}_i$ represents the average score of sample y on dimension i, and n represents the total number of dimensions. In this research, x represents the average score of the simulated fire debris and y represents the average score of an ignitable liquid standard. The subscript i indicates which PC is being used; additional PCs can be accounted for as desired by increasing n. In this research, the number of PCs used was based on the number of PCs that accounted for at least 95% of the variance in the data set.
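Equation 2.1 and the 95% variance rule can be combined in a short Python sketch. The eigenvalues and scores below are hypothetical and the function names are illustrative. The eigenvalues are assumed to be sorted in descending order, and replicate scores for each sample are averaged before the distance is computed, as in the text.

```python
import numpy as np


def n_pcs_for_variance(eigvals, threshold=0.95):
    """Smallest number of PCs whose cumulative variance reaches `threshold` (eigvals descending)."""
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    return int(np.searchsorted(cumulative, threshold) + 1)


def euclidean_distance(scores_x, scores_y, n_pcs):
    """Equation 2.1 applied to the average scores of two samples (rows = replicates)."""
    x_bar = np.asarray(scores_x, dtype=float)[:, :n_pcs].mean(axis=0)
    y_bar = np.asarray(scores_y, dtype=float)[:, :n_pcs].mean(axis=0)
    return float(np.sqrt(np.sum((x_bar - y_bar) ** 2)))


eigvals = np.array([6.0, 3.0, 0.6, 0.4])        # hypothetical eigenvalues
n = n_pcs_for_variance(eigvals)                 # cumulative 60%, 90%, 96% -> 3 PCs
debris = np.array([[1.0, 2.0, 0.0, 9.0],        # replicate scores of a fire debris sample
                   [3.0, 2.0, 0.0, -9.0]])
standard = np.array([[5.0, 6.0, 0.0, 1.0]])     # score of an ignitable liquid standard
print(n, euclidean_distance(debris, standard, n))
```

With these values, the first three PCs account for 96% of the variance, and the averaged debris scores (2, 2, 0) lie 5.0 units from the standard (5, 6, 0); the fourth PC, which mainly separates the two debris replicates, is excluded.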
2.3.3. Hierarchical Cluster Analysis
Hierarchical cluster analysis (HCA) is an exploratory multivariate statistical procedure that generates a hierarchy of clusters based on the similarity of samples. HCA can be performed on a complex data set in order to observe patterns of similarity within the data. In this research, agglomerative HCA was used to cluster samples. Using this type of clustering, each sample starts as its own individual cluster. At each iteration, the two most similar clusters are merged; in the first iteration, this results in a cluster of two samples. Clustering continues until all samples are grouped into one cluster.
To cluster samples, a distance metric is used to measure the distance between each pair of samples; in this research, Euclidean distance was used. In the first iteration of HCA, Euclidean distances are calculated between all individual samples in multidimensional space, resulting in a distance matrix. The two samples with the shortest distance, which indicates the greatest similarity, are clustered together. In the next iteration, Euclidean distances are again calculated, but now the distance between groups containing more than one sample must also be calculated.

The method by which this distance is calculated varies depending on the linkage method used. In this research, the single linkage method was used, in which the Euclidean distance between two groups is calculated between their two nearest neighbors (the closest pair of members, one from each group). This is illustrated in Figure 2.3, which depicts a sample (red square) that can be clustered to class A (blue circles) or class B (green circles). Euclidean distances (d_A and d_B) are calculated between the sample and the nearest member of each class in multidimensional space. The sample is clustered to the class containing its nearest neighbor, that is, the class with the shortest distance; therefore, the sample is clustered to class A. The process repeats until all samples are members of a single cluster.
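The agglomerative single-linkage procedure can be sketched in a few lines of Python. This naive version recomputes the nearest-member distance between every pair of clusters at each iteration, which is adequate for small data sets; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"` would normally be used. The function name and the sample coordinates (hypothetical two-PC scores) are illustrative.

```python
import numpy as np


def single_linkage(points):
    """Naive agglomerative HCA with single linkage and Euclidean distance.

    Returns the merge history as (members_a, members_b, linkage_distance) tuples.
    """
    points = np.asarray(points, dtype=float)
    # Distance matrix between all individual samples
    dist = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
    clusters = [{i} for i in range(len(points))]  # every sample starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two nearest members of the groups
                d = min(dist[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), float(d)))
        clusters[a] = clusters[a] | clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return merges


samples = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 2.0]]  # hypothetical 2-PC scores
for members_a, members_b, d in single_linkage(samples):
    print(members_a, "+", members_b, "at distance", round(d, 3))
```

The two tight pairs merge first (at distances 1 and 2), and the final merge joins the pairs at their nearest members, 5 units apart; these merge distances are what the dendrogram records.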
The resulting output is a dendrogram, which includes a similarity level, or percent similarity. The similarity level is calculated by dividing the Euclidean distance at which a sample (or cluster) is joined by the maximum Euclidean distance in the data set and subtracting the result from one; it is expressed as a percentage. The greater the similarity level, the more similar the samples are.
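The similarity level can be computed directly from the linkage distances. This Python sketch assumes that the distance used is the linkage distance at which two clusters join and that d_max is the largest pairwise Euclidean distance between samples; both details are assumptions, and the function names and data are hypothetical.

```python
import numpy as np


def max_pairwise_distance(points):
    """Largest Euclidean distance between any two samples in the data set."""
    p = np.asarray(points, dtype=float)
    return float(np.sqrt(((p[:, None, :] - p[None, :, :]) ** 2).sum(axis=-1)).max())


def similarity_level(join_distance, d_max):
    """Percent similarity at which two clusters join: 100 * (1 - d / d_max)."""
    return 100.0 * (1.0 - join_distance / d_max)


samples = [[0.0, 0.0], [3.0, 4.0], [0.0, 1.0]]  # hypothetical 2-PC scores
d_max = max_pairwise_distance(samples)          # 5.0, between the first two samples
print(similarity_level(1.0, d_max))             # samples 0 and 2 join at d = 1.0
```

Identical samples would join at 100% similarity, and the most distant pair in the data set defines 0%.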
2.3.4. k-Nearest Neighbors
k-Nearest neighbors (k-NN) is an example of a hard classification method. Using this method, samples are placed into a defined class based on their measured similarity to that class. Hard classification methods place a sample into one class and one class only. As a result, classification is forced: even if no class is representative of the sample, the sample will still be assigned to a class.
In k-NN, defined classes consist of samples of known origin that form the training set. In this research, a set of standards is used as the training set, and the standards are placed into defined classes. A sample is then projected into multidimensional space, and Euclidean distance is used to determine the distance between the sample and each of the standards in the defined classes in multidimensional space.

Figure 2.3: Diagram depicting Euclidean distance and the single-linkage process used during HCA clustering (a sample to be clustered, class A, class B, and the distances d_A and d_B)

The number of nearest neighbors (k) is selected to determine the class into which the sample will be placed. In k-NN, k is a user-defined value; typically, an odd number of nearest neighbors is selected to avoid classification ties. k-NN can be performed using a range of k values, and the total number of misclassifications at each k value can be determined to assist with selecting an optimal value for k. The sample is placed into the class to which the majority of its k nearest standards belong. Classification may therefore differ based on the number of nearest neighbors selected. For example, Figure 2.4 depicts a sample to be classified (red square) to class A (blue circles) or class B (green circles). If three nearest neighbors (k=3) are selected, illustrated by the inner circle, the sample would be classified into class A. If five nearest neighbors (k=5) are selected, illustrated by the outer circle, the sample would be classified into class B.
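The majority vote and the scan over k values can be sketched in Python. The standards below are hypothetical points placed to mimic Figure 2.4, with the sample at the origin, so k=3 and k=5 give different classes. How misclassifications were counted for each k is not stated here, so `loo_misclassifications` uses leave-one-out cross-validation on the standards as an assumed, illustrative approach; both function names are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_classify(train_x, train_labels, sample, k):
    """Assign `sample` to the majority class among its k nearest standards (Euclidean)."""
    train_x = np.asarray(train_x, dtype=float)
    d = np.sqrt(((train_x - np.asarray(sample, dtype=float)) ** 2).sum(axis=1))
    nearest = np.argsort(d, kind="stable")[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]  # with an odd k and two classes, a tie cannot occur


def loo_misclassifications(train_x, train_labels, k):
    """Leave-one-out count of misclassified standards for a given k."""
    train_x = np.asarray(train_x, dtype=float)
    errors = 0
    for i in range(len(train_x)):
        keep = [j for j in range(len(train_x)) if j != i]
        predicted = knn_classify(train_x[keep], [train_labels[j] for j in keep], train_x[i], k)
        errors += int(predicted != train_labels[i])
    return errors


# Hypothetical layout mimicking Figure 2.4, with the sample at the origin.
standards = [[1.0, 0.0], [0.0, 1.5], [1.2, 0.0], [2.0, 0.0], [0.0, 2.5], [-2.2, 0.0]]
labels = ["A", "A", "B", "B", "B", "B"]
print(knn_classify(standards, labels, [0.0, 0.0], k=3))
print(knn_classify(standards, labels, [0.0, 0.0], k=5))
```

Running `loo_misclassifications` for k = 1, 3, 5, ... and picking the k with the fewest errors is one way to choose k; scikit-learn's `KNeighborsClassifier` implements the same majority-vote rule.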
Although classification of a sample is forced, the classification fit can be analyzed to determine how similar the sample is to its assigned class. Class fit can be calculated for each sample (class member or projected sample) by subtracting the smallest distance in each class from an individual distance and dividing by the standard deviation. Each sample has its own calculated class fit, which can be compared to a given threshold; if the class fit falls within the threshold, the sample is considered a good fit. This can be used to determine whether the standards in a given class contain any outliers or whether the projected samples fit well under the given classification.
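The class-fit calculation is described only briefly, so the Python sketch below implements one hypothetical reading of it: d is the distance from a projected sample to its nearest member of the class, d_min is the smallest nearest-neighbor distance among the class members, and s is the standard deviation of those within-class nearest-neighbor distances. The function names, the data, and this interpretation are assumptions for illustration.

```python
import numpy as np


def _nearest_distances_within(class_x):
    """Distance from each class member to its nearest other member of the same class."""
    x = np.asarray(class_x, dtype=float)
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)  # ignore each member's zero distance to itself
    return d.min(axis=1)


def class_fit(sample, class_x):
    """Hypothetical class-fit statistic: (d - d_min) / s.

    d     : distance from the sample to its nearest member of the class
    d_min : smallest nearest-neighbor distance observed within the class
    s     : standard deviation of the within-class nearest-neighbor distances
    """
    within = _nearest_distances_within(class_x)
    x = np.asarray(class_x, dtype=float)
    d = np.sqrt(((x - np.asarray(sample, dtype=float)) ** 2).sum(axis=1)).min()
    return float((d - within.min()) / within.std(ddof=1))


standards_in_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]  # hypothetical scores
print(class_fit([1.5, 0.0], standards_in_class))   # sample near the class
print(class_fit([10.0, 0.0], standards_in_class))  # sample far from the class
```

A sample close to the class gives a small (here even negative) value, while a distant sample gives a large one; comparing the value with a chosen threshold flags poor fits. For a standard that belongs to the class, its own zero distance would have to be excluded before applying the same formula.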

Figure 2.4: Diagram depicting k-NN classification based on the number of nearest neighbors selected (sample to be classified, class A, and class B; inner circle k=3, outer circle k=5)

REFERENCES

1. Skoog DA, Holler FJ, Crouch SR. Principles of instrumental analysis. 6th edition. Belmont, CA: Thomson, 2007.

3. Materials and Methods
3.1. Commercial Ignitable Liquid Standards
The commercial ignitable liquids used were available in the laboratory and had been obtained from gas stations and local stores. These standards comprised ignitable liquids from the gasoline, isoparaffinic, and petroleum distillate ASTM classes. Ignitable liquids from the gasoline class included three different gasoline samples collected in the East Lansing area (Meijer, British Petroleum, and Marathon) during late 2009 and early 2010. The isoparaffinic products consisted of odorless paint thinner (Sunnyside Corp., Wheeling, IL) and upholstery protector (Scotchgard™, 3M Protective Materials and Consumer Health Care Division, St. Paul, MN), and the petroleum distillate class included diesel (Mobil), kerosene (Meijer), fuel injector (STP® Products Co., Oakland, CA), charcoal lighter (ACE® Hardware Corp., Oak Brook, IL), and torch fuel (Tiki®, Menomonee Falls, WI). The neat commercial ignitable liquids were diluted 1:10 (v/v) in dichloromethane (Honeywell International Inc., Morristown, NJ) prior to passive-headspace extraction and GC-MS analysis.
3.2. Class Reference Standards
The class reference standards included a gasoline standard, a medium petroleum distillate standard, and a heavy petroleum distillate standard. Each standard consisted of compounds characteristic of the respective chemical class, and all compounds were available in the laboratory. The gasoline standard included toluene (MCB Manufacturing Chemists, Inc., Cincinnati, OH), ethylbenzene, m-xylene, o-xylene, propylbenzene, and 1,2,4-trimethylbenzene (all from Aldrich Chemical Company, Inc., Milwaukee, WI). The medium petroleum distillate standard included the normal alkanes octane, dodecane, and tridecane (all from Aldrich Chemical Company, Inc., Milwaukee, WI), nonane and decane (Alfa Aesar, Ward Hill, MA), and undecane (ACROS, NJ). The heavy petroleum distillate standard included the normal alkanes octane, dodecane, tridecane, and tetradecane (all from Aldrich Chemical Company, Inc., Milwaukee, WI), nonane, decane, pentadecane, hexadecane, heptadecane, nonadecane, and eicosane (all from Alfa Aesar, Ward Hill, MA), undecane (ACROS, NJ), and octadecane (Sigma Chemical Co., St. Louis, MO).
The class reference standards were prepared in dichloromethane to a final volume of 15 mL. Compound volumes for each standard were chosen so that the ratios and abundances of the compounds were similar to those in the respective commercial ignitable liquid standards (Table 3.1 and Table 3.2). Specifically, for the gasoline standard, a 1:3:2 ratio for the C2-alkylbenzenes and a 1:3 ratio for the C3-alkylbenzenes (propylbenzene and 1,2,4-trimethylbenzene) were desired. For the petroleum distillate class reference standards, a distribution of normal alkanes similar to that in petroleum distillates was desired. Once prepared, the class reference standards were passive-headspace extracted in triplicate and analyzed by GC-MS in triplicate using the same procedures as for the commercial ignitable liquid standards.
3.3. Preparation of Simulated Fire Debris
Simulated fire debris samples were prepared on two substrates: red oak flooring treated with a golden oak finish (WATCO™ Danish oil, Rust-Oleum® Corporation, Vernon Hills, IL) and nylon carpet with carpet padding (source unknown). Before the simulated fire debris samples were prepared, the appropriate burn time for each substrate and the volume of ignitable liquid to spike onto each sample were determined.

Table 3.1: Composition of the gasoline class reference standard prepared in 15 mL of dichloromethane

Compound                   Volume (µL)
Toluene                    25
Ethylbenzene               10
m-xylene                   40
o-xylene                   35
Propylbenzene              15
1,2,4-trimethylbenzene     60

Table 3.2: Composition of the medium and heavy petroleum distillate class reference standards prepared in 15 mL of dichloromethane

               Volume/Mass
Compound       Medium Petroleum Distillate   Heavy Petroleum Distillate
Octane         10 µL                         3.3 µL
Nonane         15 µL                         5 µL
Decane         20 µL                         6.6 µL
Undecane       20 µL                         6.6 µL
Dodecane       15 µL                         8.3 µL
Tridecane      10 µL                         10 µL
Tetradecane    N/A                           11.6 µL
Pentadecane    N/A                           10 µL
Hexadecane     N/A                           8.3 µL
Heptadecane    N/A                           6.6 µL
Octadecane     N/A                           0.0040 g
Nonadecane     N/A                           0.0025 g
Eicosane       N/A                           0.0018 g

3.3.1. Burn Study
The burn study was carried out by burning treated wood samples using a propane torch
for 30 and 60 seconds with only direct flame to char the surface. Nylon carpet with carpet
padding samples were burned for 30 seconds with direct flame and allowed to burn for an
additional 90 seconds before being extinguished using an overturned beaker. Each sample was
then passive-headspace extracted in triplicate and analyzed by GC-MS in triplicate.
An appropriate burn time for the wood samples was selected based on the presence and
abundance of interference compounds from both the wood and the treatment. A high abundance
of interference compounds was desired so that the simulated fire debris would closely resemble
typical fire debris evidence; as a result, a 30-second burn time for the wood samples was
selected. The burn time for the carpet samples was taken from a previous burn study and verified here; because characteristic compounds from the carpet were present, this burn time was used for the remainder of the project.
3.3.2. Spike Volume Study
The spike volume study was carried out by spiking a range of volumes of each of the four
commercial ignitable liquids (gasoline, paint thinner, diesel, and torch fuel) on the two
substrates. All ignitable liquids were diluted 1:10 (v/v) in dichloromethane prior to spiking onto
each substrate.
For gasoline (Meijer), 100 and 125 µL were spiked onto the carpet samples and 100, 125,
150, 200, 225, and 250 µL were spiked onto the treated wood samples. For paint thinner, 125 µL
was spiked onto the carpet samples and 75, 125, and 150 µL were spiked onto the treated wood
samples. For diesel, 125 and 175 µL were spiked onto the carpet samples and 50 and 75 µL were
spiked onto the treated wood samples. For torch fuel, 50, 75, 100 and 115 µL were spiked onto
the carpet samples and 50, 75, 100, and 125 µL onto the treated wood samples. Each sample was
then passive-headspace extracted in triplicate and analyzed by GC-MS in triplicate.
Spike volumes were selected based on the abundance of the interference compounds relative to the ignitable liquid compounds: the compounds of the commercial ignitable liquid had to be present, but at a lower abundance than the interference compounds.
3.3.3. Simulated Fire Debris Samples
The final ignitable liquid spike volumes used for each substrate are listed in Table 3.3. Simulated fire debris samples were prepared by spiking each substrate piece (4 cm × 4 cm) with the appropriate spike volume, burning it for the appropriate time, and placing it into a separate nylon bag for passive-headspace extraction followed by GC-MS analysis.
3.4. Passive-Headspace Extraction
For passive-headspace extraction of all sets of standards, a 20 µL aliquot was spiked onto individual 4 cm × 4 cm Kimwipes™ (Kimberly-Clark Global Sales, LLC, Roswell, GA), which
were subsequently placed into a nylon bag (Grand River Products, LLC, Grosse Pointe Farms,
MI) with a suspended activated carbon strip (Albrayco Technologies, Inc., Cromwell, CT) and
sealed using masking tape. For passive-headspace extraction of the simulated fire debris samples,
the samples were placed directly into a nylon bag with a suspended activated carbon strip and
sealed. The passive-headspace extraction was performed at 80 °C for 4 h. Following extraction,
the carbon strips were eluted with 200 µL of dichloromethane and analyzed by GC-MS. All
ignitable liquid standards and simulated fire debris samples were extracted in triplicate.

Table 3.3: Spike volumes used for simulated fire debris samples with respect to each commercial ignitable liquid and substrate

Ignitable Liquid   Volume Spiked onto Carpet Substrate (µL)   Volume Spiked onto Treated Wood Substrate (µL)
Gasoline           100                                        N/A
Paint thinner      125                                        125
Diesel             175                                        75
Torch fuel         100                                        50

N/A: An appropriate spike volume was not determined and further analysis was not completed due to contamination of the instrument from the wood treatment.

3.5. GC-MS Analysis
An Agilent 6890N gas chromatograph coupled to an Agilent 5975 mass spectrometer (Agilent Technologies, Santa Clara, CA) with an autosampler fitted with a 10 µL Hamilton syringe was used. The column was a 30.0 m × 0.25 mm × 0.25 µm Agilent HP-5 capillary column (5% phenyl methyl siloxane). The inlet temperature was set to 250 °C with a pulsed splitless injection (15.0 psi for 0.25 minutes). Helium carrier gas at a nominal flow rate of 1.0 mL/min carried 1 µL of injected sample onto the column. The oven temperature program was 40 °C for 3 minutes, followed by a ramp of 10 °C/min to 280 °C with a 4-minute hold. The transfer line was set to 280 °C, and electron ionization (70 eV) was used. The mass spectrometer acquired 2.91 scans/s over a mass range of 50-500 u.
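The oven program above implies a simple piecewise temperature profile and a total run time of 3 + (280 - 40)/10 + 4 = 31 minutes. The sketch below computes the set-point temperature at a given time; it models the nominal program only (the real oven lags the set point), and the function and parameter names are illustrative, not taken from ChemStation.

```python
def oven_temperature(t_min, start=40.0, hold_start=3.0, rate=10.0, end=280.0, hold_end=4.0):
    """Nominal oven set point (°C) at time t_min for a single-ramp program."""
    ramp_time = (end - start) / rate  # minutes spent ramping
    run_time = hold_start + ramp_time + hold_end
    if not 0.0 <= t_min <= run_time:
        raise ValueError(f"time must be between 0 and {run_time} min")
    if t_min <= hold_start:
        return start  # initial isothermal hold
    if t_min <= hold_start + ramp_time:
        return start + rate * (t_min - hold_start)  # linear ramp
    return end  # final isothermal hold

def total_run_time(start=40.0, hold_start=3.0, rate=10.0, end=280.0, hold_end=4.0):
    return hold_start + (end - start) / rate + hold_end

print(total_run_time())        # 31.0
print(oven_temperature(10.0))  # 110.0
```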
3.6. Data Pretreatment
All total ion chromatograms (TICs) were background subtracted to remove caprolactam and were smoothed using functions available in Agilent ChemStation software (version E01.02.16). Simulated fire debris samples were retention time aligned to the appropriate reference standards (commercial or class) using a correlation optimized warping (COW) algorithm available in the data analysis software (The Unscrambler® X, version 10.2, Camo Software Inc., Woodbridge, NJ). The alignment target for the commercial standards was the average TIC of all commercial ignitable liquid standards, and the alignment target for the class reference standards was the average TIC of all class reference standards. Both alignments used a segment size of 125 data points and a warp of 11 data points. All chromatograms were then constant-sum normalized in Microsoft Excel (Microsoft Office Professional Plus 2010 version 14.0.7116.5000, Microsoft Corp., Redmond, WA) before data analysis.
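Constant-sum normalization divides every point of a chromatogram by the sum of all its points, so that chromatograms with different overall intensities become comparable. The constant used is not stated above; this minimal sketch assumes a sum of 1, whereas the thesis performed this step in Excel.

```python
def constant_sum_normalize(tic, total=1.0):
    """Scale a chromatogram so that its intensities sum to `total` (assumed 1)."""
    current = sum(tic)
    if current == 0:
        raise ValueError("cannot normalize a chromatogram whose intensities sum to zero")
    return [value * total / current for value in tic]

raw = [1.0, 3.0, 4.0]
print(constant_sum_normalize(raw))  # [0.125, 0.375, 0.5]
```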

3.7. Data Analysis
3.7.1. Principal Components Analysis
Principal components analysis was performed on the TICs of three different sets of standards using The Unscrambler® X. The first data set, referred to as the chemically diverse data set, consisted of the three commercial gasoline standards, six commercial petroleum distillates, and two isoparaffinic products. The second data set, referred to as the refined data set, consisted of the three gasoline standards and four petroleum distillate standards. The final data set, referred to as the class reference data set, consisted of the three class reference standards. Scores for each standard were generated and plotted as scatter plots using Microsoft Excel. Loadings for each significant principal component were also plotted using Microsoft Excel. Scores for the simulated fire debris samples were generated by multiplying the mean-centered data for each debris sample by the eigenvector (loadings) for the first principal component (PC1) and summing the products. Scores for additional principal components were calculated similarly using the respective eigenvectors. The scores of the simulated fire debris samples were then projected onto the scores plots generated for the commercial ignitable liquid standards and the class reference standards.
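The projection step described above can be sketched in pure Python. This is not The Unscrambler's algorithm: PCA is computed here by power iteration with deflation on the covariance matrix of a small toy data set, which is practical only for a handful of variables (real TICs with thousands of points would normally use an SVD-based routine). It also assumes that the debris sample is mean-centered with the mean of the standards, since the text says only "mean-centered". The data values and function names are hypothetical.

```python
import math
import random

def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def covariance(rows, mu):
    n, p = len(rows), len(mu)
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def leading_eigenvector(cov, iterations=500, seed=0):
    """Power iteration: repeatedly multiply by the covariance matrix and renormalize."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(vi * sum(c * vj for c, vj in zip(row, v)) for vi, row in zip(v, cov))
    return eigenvalue, v

def fit_pca(standards, n_components):
    """Return the standards' mean and the first n_components eigenvectors (loadings)."""
    mu = column_means(standards)
    cov = covariance(standards, mu)
    loadings = []
    for _ in range(n_components):
        lam, v = leading_eigenvector(cov)
        loadings.append(v)
        # Deflation: remove the component just found so the next one can be extracted
        cov = [[cij - lam * vi * vj for cij, vj in zip(row, v)] for row, vi in zip(cov, v)]
    return mu, loadings

def project(sample, mu, loadings):
    """Score = sum over variables of (mean-centered value x eigenvector element)."""
    centered = [x - m for x, m in zip(sample, mu)]
    return [sum(c * l for c, l in zip(centered, pc)) for pc in loadings]

# Hypothetical three-variable "standards" and one "debris" sample
standards = [[-4.0, 0.0, 0.0], [4.0, 0.0, 0.0], [0.0, -2.0, 0.0],
             [0.0, 2.0, 0.0], [0.0, 0.0, -1.0], [0.0, 0.0, 1.0]]
mu, loadings = fit_pca(standards, n_components=2)
debris = [2.0, 1.0, 5.0]
print(project(debris, mu, loadings))
```

Because the components are defined by the standards alone, the third variable (small variance among the standards) does not enter the first two scores even though the debris sample has a large value there. The sign of each eigenvector is arbitrary, so scores may be mirrored relative to another program.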
3.7.2. Euclidean Distance
Euclidean distances were calculated between the scores of the simulated fire debris samples and the scores of the commercial standards and class reference standards to evaluate association. Distances were calculated in Microsoft Excel using Equation 2.1. The number of principal components used to calculate the Euclidean distance was the number needed to account for at least 95% of the variance in the data set.
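The two steps above, choosing how many principal components reach 95% cumulative variance and then computing the distance over only those components, can be sketched as follows. Equation 2.1 is defined earlier in the thesis and is assumed here to be the standard Euclidean distance; the explained-variance fractions and score vectors are hypothetical.

```python
import math

def components_for_variance(explained_ratios, threshold=0.95):
    """Smallest number of leading PCs whose cumulative explained variance reaches threshold."""
    cumulative = 0.0
    for n, ratio in enumerate(explained_ratios, start=1):
        cumulative += ratio
        if cumulative >= threshold:
            return n
    return len(explained_ratios)  # threshold never reached: use every PC supplied

def score_distance(scores_a, scores_b, n_pcs):
    """Euclidean distance between two score vectors using only the first n_pcs PCs."""
    pairs = zip(scores_a[:n_pcs], scores_b[:n_pcs])
    return math.sqrt(sum((a - b) ** 2 for a, b in pairs))

# Hypothetical explained-variance fractions, for illustration only
explained = [0.50, 0.30, 0.10, 0.06, 0.04]
n_pcs = components_for_variance(explained)
print(n_pcs)  # 4
print(score_distance([3.0, 4.0, 0.0, 0.0, 9.0], [0.0, 0.0, 0.0, 0.0, -9.0], n_pcs))  # 5.0
```

The fifth component is excluded because the first four already account for at least 95% of the variance, so the large difference in that component does not affect the distance.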

3.7.3. Hierarchical Cluster Analysis
Agglomerative hierarchical cluster analysis (HCA) was performed on each of the three data sets and the simulated fire debris samples in Pirouette® (version 4.0, Infometrix Software, Inc., Bothell, WA) using Euclidean distance and single linkage. The resulting dendrograms were assessed for the degree of similarity between the fire debris samples and the commercial standards and between the fire debris samples and the class reference standards.
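Single-linkage agglomerative clustering repeatedly merges the two clusters whose closest members are nearest to each other; the sequence of merges and their distances is what a dendrogram draws. The sketch below is a minimal, unoptimized illustration on hypothetical two-dimensional points, not Pirouette's implementation.

```python
import math
from itertools import combinations

def single_linkage(points):
    """Return merges as (members_a, members_b, linkage_distance), in merge order."""
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Single linkage: cluster distance = shortest member-to-member distance
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)  # merge b into a
    return merges

# Hypothetical points: two tight pairs far apart
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 2.0)]
for merge in single_linkage(points):
    print(merge)
```

The two close pairs merge first (at distances 1.0 and 2.0), and the pairs join last at 5.0, the shortest distance between any member of one pair and any member of the other.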
3.7.4. k-Nearest Neighbors
k-Nearest Neighbors (k-NN) was performed on each of the three standard data sets and the simulated fire debris in Pirouette®. HCA was first performed as described above to assign the known ignitable liquid classes. Once the classes were defined, the k-NN algorithm was applied to the ignitable liquid standards, and predictions were then made for the simulated fire debris to classify each sample to an ignitable liquid class. Predictions were made using 1, 3, 5, 7, and 9 nearest neighbors.
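Comparing several k values is often done by counting misclassifications of the standards themselves. The text does not state how Pirouette counts them; this sketch assumes leave-one-out (each standard is classified using all the others), and the coordinates and class labels are hypothetical.

```python
import math
from collections import Counter

def knn_predict(sample, training, k):
    ranked = sorted(training, key=lambda item: math.dist(sample, item[0]))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def loo_misclassifications(standards, k_values=(1, 3, 5, 7, 9)):
    """Leave-one-out misclassification count for each k (assumed procedure)."""
    errors = {}
    for k in k_values:
        count = 0
        for i, (coords, label) in enumerate(standards):
            training = standards[:i] + standards[i + 1:]  # hold out standard i
            if knn_predict(coords, training, k) != label:
                count += 1
        errors[k] = count
    return errors

# Hypothetical standards: three per class
standards = [((0.0, 0.0), "GAS"), ((0.0, 1.0), "GAS"), ((1.0, 0.0), "GAS"),
             ((10.0, 10.0), "PD"), ((10.0, 11.0), "PD"), ((11.0, 10.0), "PD")]
print(loo_misclassifications(standards))
```

With only three standards per class, every standard is misclassified once k reaches 5, because the held-out standard has only two classmates left to vote against three standards from the other class. This illustrates why the useful range of k is limited by the number of standards in each class.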


4. Investigation of Class Reference Standards for Association of Fire Debris to ASTM Class
using Principal Components Analysis
4.1. Introduction
Principal components analysis (PCA) has previously been utilized to associate simulated
fire debris samples back to the commercial ignitable liquid used as an accelerant. While PCA has
shown potential for this association, it can be limited due to the variety of commercial ignitable
liquids on the market. It would not be feasible for a forensic laboratory to analyze every ignitable
liquid on the market for inclusion in a data set. PCA is also limited because visual interpretation of the scores plot is subjective. Furthermore, the chemical diversity within a data set can influence the success of association.
This chapter demonstrates the limitations of associating simulated fire debris back to the
corresponding commercial ignitable liquid using PCA and highlights the need for a more
standardized approach. As an alternative, the utility of class reference standards for association was investigated. Because class reference standards are representative of an ASTM class as a whole, they may substantially reduce the number of standards required for an in-house data set in a forensic laboratory. In addition, standards that are more representative of each class could potentially account for the high variability in chemical composition observed within an ASTM class.
In this chapter, the subjectivity of visual interpretation of the scores plots as well as
limitations of diversity within a commercial ignitable liquid data set were also investigated. To
address the limitations of subjectivity, Euclidean distances were used to quantitatively assess the
extent of association between the ignitable liquid standards and the simulated fire debris in the
PCA scores plot. To examine the limitations of diversity and the impact of data set selection, two
commercial ignitable liquid data sets were compared. One data set was more chemically diverse
in nature and consisted of gasoline, petroleum distillate, and isoparaffinic standards and the
second data set was a refined data set and consisted of only gasoline and petroleum distillate
standards.
To investigate association to an ASTM class, rather than to a specific commercial ignitable liquid, using multivariate statistical procedures, class reference standards for two different ASTM classes were generated and used together with simulated fire debris samples. Common household
items such as treated red oak flooring and nylon carpet with carpet padding were used as
substrates. Appropriate burn times for each substrate and appropriate spike volumes for each
commercial ignitable liquid were determined. Burn times and spike volumes were selected so
that the fire debris was not readily identifiable as the ignitable liquid used and so that sufficient
interference compounds from each substrate were observed. Common commercial ignitable
liquids such as gasoline, diesel, paint thinner, and torch fuel were used to generate the simulated
fire debris samples.
4.2. Commercially Available Standards and Corresponding Class Reference Standards
4.2.1. Gasoline
Total ion chromatograms (TICs) of a representative commercial gasoline and the gasoline class reference standard are shown in Figure 4.1. The commercial gasoline (Figure 4.1A) contains toluene, C2-alkylbenzenes (e.g., ethylbenzene, m-xylene, o-xylene), C3-alkylbenzenes (e.g., propylbenzene and 1,2,4-trimethylbenzene), C4-alkylbenzenes (e.g., 1,2,4,5-tetramethylbenzene), and some naphthalene compounds. Compounds for the class reference standard were chosen based on the characteristic compounds of gasoline and were prepared in abundances and ratios similar to those of the commercial gasoline. The gasoline class reference standard contains toluene, ethylbenzene, m-xylene, o-xylene, propylbenzene, and 1,2,4-trimethylbenzene; a representative TIC is shown in Figure 4.1B.

Figure 4.1: Representative total ion chromatograms (normalized abundance versus retention time in minutes) of A) commercial gasoline standard and B) gasoline class reference standard with characteristic compounds identified

4.2.2. Medium Petroleum Distillate
Total ion chromatograms of a representative commercial torch fuel and the medium
petroleum distillate (MPD) class reference standard are shown in Figure 4.2. The commercial
torch fuel (Figure 4.2A) is classified as a medium petroleum distillate due to the presence of normal alkanes (C11-C14) with some isoparaffinic, cycloparaffinic, and aromatic compounds. Compounds for the class reference standard were selected from the characteristic compounds that dominated the TIC and were prepared in a distribution similar to that of most petroleum distillates. The MPD class reference standard contains the C8-C13 normal alkanes, and a representative TIC is shown in Figure 4.2B.
4.2.3. Heavy Petroleum Distillate
Total ion chromatograms of a representative commercial diesel and the heavy petroleum distillate (HPD) class reference standard are shown in Figure 4.3. The commercial diesel (Figure 4.3A) is classified as a heavy petroleum distillate and contains normal alkanes (C12-C20) with some isoparaffinic, cycloparaffinic, and aromatic compounds. Compounds for the class reference standard were selected from the characteristic compounds that dominated the TIC and were prepared in a similar distribution. The HPD class reference standard contains the C8-C20 normal alkanes, and a representative TIC is shown in Figure 4.3B.

Figure 4.2: Representative total ion chromatograms (normalized abundance versus retention time in minutes) of A) commercial torch fuel standard and B) medium petroleum distillate class reference standard with characteristic compounds identified

Figure 4.3: Representative total ion chromatograms (normalized abundance versus retention time in minutes) of A) commercial diesel standard and B) heavy petroleum distillate class reference standard with characteristic compounds identified

4.3. Determination of Substrate Burn Times
Burn times were determined for each of the two substrates to maximize the abundance of interference compounds. Representative TICs of the treated red oak flooring substrate are shown in Figure 4.4. The 30-second burn (Figure 4.4A) produced a high abundance of interference compounds from the wood treatment, specifically C9-C12 alkanes corresponding to a medium petroleum distillate. The 60-second burn (Figure 4.4B) produced a lower abundance of interference compounds from the wood treatment but included compounds from the wood such as benzaldehyde and benzophenone. Because the treatment resembled a petroleum distillate, a burn time of 30 seconds was used to make association more challenging, similar to typical fire debris evidence an analyst may receive.
A representative total ion chromatogram of the nylon carpet burned for 120 seconds is shown in Figure 4.5. The 120-second burn time was optimized in a previous study in the laboratory. Compounds characteristic of the burned carpet included styrene (retention time (tR): 7.167 minutes), 1,2,3-trichloropropane (tR: 7.705 minutes), and biphenyl (tR: 15.097 minutes). These compounds originate from the carpet backing, padding, and adhesive used in carpet production. Interference compounds from the carpet and carpet padding substrate have retention times similar to those of some ignitable liquid compounds, which can result in coelution and make interpretation more challenging. For example, o-xylene from gasoline has a retention time of 7.241 minutes and tends to coelute with styrene from the substrate.

Figure 4.4: Representative total ion chromatograms (abundance versus retention time in minutes) of A) 30-second burn time of the treated red oak flooring substrate and B) 60-second burn time of the treated red oak flooring substrate with characteristic compounds identified. Compounds from the wood treatment are indicated in red.

Figure 4.5: Representative total ion chromatogram (abundance versus retention time in minutes) of 120-second burn time of nylon carpet with carpet padding with characteristic compounds indicated

4.4. Simulated Fire Debris Samples
The simulated fire debris containing treated red oak flooring spiked with commercial
diesel (Figure 4.6A) has characteristics similar to a petroleum distillate, but due to the
interference compounds from the surface treatment, the sample is not readily identified as diesel
(Figure 4.6B). For example, the diesel used in this research has a unimodal distribution of normal
alkanes with C15 as the maximum peak; however, the simulated fire debris containing diesel has
a bimodal distribution with C11 and C13 as the maximum peaks. In the unimodal distribution, a
rise in the baseline is observed between approximately tR 12.00 and 21.00 minutes, while two
rises in the baseline of the bimodal distribution are observed from approximately tR 8.00 to 12.00
minutes and tR 12.00 to 21.00 minutes. These changes in abundance and distribution result from the interference compounds contributed by the wood treatment, and they make identification based on visual assessment of the chromatograms more challenging.
An example of a simulated fire debris TIC for nylon carpet containing commercial diesel
is shown in Figure 4.7. The simulated fire debris containing nylon carpet with carpet padding
spiked with commercial diesel (Figure 4.7A) contains the typical distributions of normal alkanes
in a petroleum distillate such as diesel (Figure 4.7B). However, the rise in the baseline between approximately tR 12.00 and 21.00 minutes is markedly reduced because of the low abundance of ignitable liquid and the much higher abundance of interference compounds from the carpet, indicating that the substrate dominates the sample.

Figure 4.6: Representative total ion chromatograms (normalized abundance versus retention time in minutes) for A) treated red oak flooring spiked with 75 µL of commercial diesel with substrate interferences indicated in red and B) commercial diesel ignitable liquid standard. *C12 originates from both wood treatment and diesel

Figure 4.7: Representative total ion chromatogram (normalized abundance versus retention time in minutes) of A) nylon carpet with carpet padding spiked with 175 µL of commercial diesel with substrate interferences indicated in red and B) commercial diesel ignitable liquid standard

4.5. Association and Discrimination of Simulated Fire Debris using PCA
Principal components analysis was performed using three data sets. The first data set was
a chemically diverse data set and contained all eleven commercial ignitable liquid standards
consisting of three commercial gasoline standards, six petroleum distillates, two isoparaffinic
products, and simulated fire debris samples. The second data set was a refined data set and
contained seven commercial ignitable liquid standards consisting of the three commercial
gasoline standards, four petroleum distillate standards, and the simulated fire debris. The
chemically diverse and refined data sets were used to demonstrate the limitations of PCA with
commercial ignitable liquids, investigate the effects of chemical diversity within a data set, and
demonstrate the impact of data set selection for successful association and discrimination. The
third data set was the class reference data set and contained the three class reference standards
and the simulated fire debris. The class reference data set was used to investigate the utility of
class reference standards as an alternative to the commercial ignitable liquids for PCA
association and discrimination.
For each data set, PCA was first performed on the standards alone, and scores for the simulated fire debris samples were then calculated from the resulting eigenvectors and projected onto the scores plot. The simulated fire debris samples for each substrate were projected separately to realistically represent how PCA may be applied in the analysis of fire debris in a forensic laboratory. Projecting the fire debris scores reduced the influence of interference compounds on the positions of the samples on the scores plot: because the principal components were defined by the standards alone, variables that reflect only the interference compounds vary little among the standards and therefore carry little weight in the eigenvectors.

4.5.1. Commercial Ignitable Liquid Standards – Chemically Diverse Data Set
Principal components analysis was first performed on the chemically diverse data set
with only the gasoline, petroleum distillate, and isoparaffinic commercial standards. The scores
plot is shown in Figure 4.8. The scores plot illustrates principal component one (PC1) versus
principal component two (PC2) where PC1 accounts for 32.5% of the variance within the data
set and PC2 accounts for 24.4% of the variance. The gasoline standards, positioned positively, and the upholstery protector, positioned negatively, are distinguished along PC1, and all other standards are positioned at approximately zero. Along PC2, the gasoline standards and the upholstery protector are positioned negatively and are distinguished from all other standards, which are positioned slightly positively.
The positioning of the standards on the scores plot can be explained using the loadings plots for PC1 and PC2 (Figure 4.9). Compounds characteristic of gasoline are weighted positively on the PC1 loadings plot (Figure 4.9A), and compounds characteristic of the upholstery protector are weighted negatively. As a result, all gasoline standards are positioned positively on PC1 in the scores plot and are distinguished from the upholstery protector standard, which is positioned negatively. All petroleum distillate standards are positioned close to zero on PC1 because their characteristic compounds, such as the heavier alkanes (C15-C20), are less volatile and less variable and, as a result, are not described by PC1. The majority of the variance along PC1 originates from the chemical differences between the compounds present in upholstery protector (tR: 3.00 to 8.00 minutes) and those of the other standards, which do not begin eluting until much later (8.00 to 20.00 minutes).

Figure 4.8: PCA scores plot of PC1 (32.5%) versus PC2 (24.4%) for the chemically diverse data set with commercial standards only

Figure 4.9: Loadings plots (normalized abundance versus retention time in minutes) for the chemically diverse data set with A) PC1 representing 32.5% of the variance and B) PC2 representing 24.4% of the variance

Compounds characteristic of paint thinner (branched C7-C12) and the petroleum distillate standards (C11-C16) are weighted positively on the PC2 loadings plot (Figure 4.9B), while compounds characteristic of upholstery protector (branched C5-C7) and the gasoline standards (C2-, C3-, and C4-alkylbenzenes) are weighted negatively. Therefore, the paint thinner and petroleum distillate standards are positioned positively on PC2 in the scores plot and are distinguished from the upholstery protector and gasoline standards, which are positioned negatively on PC2.
In addition, paint thinner and upholstery protector are both classified as isoparaffinic products due to the presence of branched and cyclic alkanes. However, the two products differ considerably: upholstery protector contains branched alkanes ranging from C5-C12, whereas paint thinner contains branched alkanes ranging from C7-C12. These chemical differences within the isoparaffinic product class result in paint thinner associating closely with the petroleum distillates, while upholstery protector is distinguished from all other standards.
To differentiate the standards positioned close to zero, additional principal components would need to be investigated; however, distinguishing standards within a class was not the purpose of this research, and additional principal components were not investigated. The gasoline standards are not distinguished from one another because there is little chemical variation among the three gasoline standards; however, there is more variability among the gasoline standard replicates due to the volatility of compounds present in gasoline such as toluene, ethylbenzene, and m-xylene. The remaining scores plots in this chapter can be interpreted similarly using the corresponding loadings plots.
Next, one set of simulated fire debris samples consisting of carpet spiked with diesel was projected onto the chemically diverse data set scores plot, as shown in Figure 4.10. From visual assessment of the scores plot, the fire debris samples are positioned at approximately zero and close to the petroleum distillate standards, specifically the diesel, kerosene, fuel injector, and charcoal lighter standards, all of which are classified as heavy petroleum distillates with the exception of charcoal lighter.
Euclidean distances were calculated between the simulated fire debris scores and each of the ignitable liquid standard scores to quantitatively assess association on the scores plot. A shorter Euclidean distance indicates that two samples are more similar, while a longer Euclidean distance indicates that they are more dissimilar. The Euclidean distances between the simulated fire debris scores and the ignitable liquid scores for the chemically diverse data set, based on seven PCs, are shown in Table 4.1.
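A minimal Python sketch of the distance calculation follows: the square root of the summed squared differences between the debris scores and a standard's scores over the first k PCs. It assumes that each standard and the fire debris are summarized by a single score vector (for example, the mean of the replicate scores), which is an assumption about how replicates are handled; the score values are invented for illustration and are not the scores behind Table 4.1.

```python
import numpy as np

def euclidean_distances(debris_score, standard_scores, n_pcs):
    """Distance from the debris score vector to each standard, using the first n_pcs PCs."""
    d = np.asarray(debris_score, dtype=float)[:n_pcs]
    return {
        name: float(np.sqrt(np.sum((np.asarray(s, dtype=float)[:n_pcs] - d) ** 2)))
        for name, s in standard_scores.items()
    }

def closest_standard(distances):
    """The standard with the shortest distance is taken as the association."""
    return min(distances, key=distances.get)

# Hypothetical mean scores on PC1-PC3 (not data from this research).
standard_scores = {
    "Diesel": [0.030, -0.010, 0.000],
    "Kerosene": [0.010, 0.000, 0.010],
    "Gasoline A": [-0.080, 0.020, 0.000],
}
debris_score = [0.012, 0.001, 0.008]

distances = euclidean_distances(debris_score, standard_scores, n_pcs=3)
for name, dist in sorted(distances.items(), key=lambda item: item[1]):
    print(f"{name:12s} {dist:.5f}")
print("Closest standard:", closest_standard(distances))
```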
The shortest Euclidean distance was calculated between the fire debris and the kerosene standard (0.01572), rather than the corresponding diesel standard, which was the next closest standard with a distance of 0.02554.
Visually, the simulated fire debris associates to all three heavy petroleum distillate standards, but quantitatively the fire debris is most closely associated to the kerosene standard. After the burning process and the introduction of interference compounds, the fire debris samples contain a normal alkane range more similar to kerosene than to diesel. For example, the commercial diesel standard contains normal alkanes in the range C12-C20, while the commercial kerosene standard and the simulated fire debris contain alkanes in the range C11-C16. Overall, a more chemically diverse data set can both make visual association more challenging and influence which standard the fire debris is associated to.

[Figure 4.10: PCA scores plot of PC1 (32.5%) versus PC2 (24.4%) for the chemically diverse data set with commercial standards and simulated fire debris projected. Both axes span -0.15 to 0.15. Legend: Gas A, Gas B, Gas C, Diesel, Kerosene, Fuel Injector, Charcoal Lighter, Lamp Oil, Torch Fuel, Paint Thinner, Upholstery Protector, Fire Debris.]

Table 4.1: Euclidean distances between fire debris scores and ignitable liquid standard scores for the chemically diverse data set

Ignitable Liquid Standard     Chemically Diverse Data Set
Gasoline A                    0.08718
Gasoline B                    0.07551
Gasoline C                    0.08032
Diesel                        0.02554
Fuel Injector                 0.02871
Torch Fuel                    0.05890
Charcoal Lighter              0.06414
Kerosene                      0.01572
Lamp Oil                      0.08662
Paint Thinner                 0.1194
Upholstery Protector          0.1455

Calculated Euclidean distances can help determine which standard the fire debris is most closely associated to when visual association becomes challenging. Euclidean distances are also beneficial because they can incorporate many dimensions (PCs) at once, whereas a scores plot displays only two or three dimensions at a time. For example, the simulated fire debris appears to be positioned a similar distance from lamp oil, a medium petroleum distillate, as from paint thinner, an isoparaffinic product. However, the calculated Euclidean distances (based on seven PCs) indicate that the fire debris samples are positioned more closely to lamp oil (0.08662) than to paint thinner (0.1194), which may not be initially apparent.
Euclidean distances nevertheless have limitations, as selecting how many PCs to use could be considered subjective. Selecting a different number of PCs could result in association to a different standard; including additional PCs adds discriminatory information and can increase the ability to discriminate from other standards. Moreover, distances do not typically differ by orders of magnitude unless the samples are substantially different. As there is no defined distinction between a ‘short’ and a ‘long’ distance, the calculated distances could be interpreted inconsistently.
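The dependence on the number of PCs can be made concrete with a small sketch. Because each additional PC adds a non-negative squared term, a distance computed on the first k PCs can only stay the same or grow as k increases, but it grows by different amounts for different standards, so the ranking of standards can change. The two standards below are hypothetical, with invented scores chosen to show such a reversal; this is an illustration, not a reanalysis of Table 4.1.

```python
import numpy as np

def distances_by_n_pcs(debris_score, standard_scores, max_pcs):
    """For k = 1..max_pcs, the distance to each standard using only the first k PCs."""
    debris = np.asarray(debris_score, dtype=float)
    table = {}
    for k in range(1, max_pcs + 1):
        table[k] = {
            name: float(np.linalg.norm(np.asarray(s, dtype=float)[:k] - debris[:k]))
            for name, s in standard_scores.items()
        }
    return table

# Hypothetical scores: Standard A is closer on PC1 but far away on PC3.
standard_scores = {"Standard A": [0.01, 0.00, 0.05], "Standard B": [0.02, 0.00, 0.00]}
debris_score = [0.00, 0.00, 0.00]

table = distances_by_n_pcs(debris_score, standard_scores, max_pcs=3)
for k, distances in table.items():
    closest = min(distances, key=distances.get)
    print(k, {name: round(d, 4) for name, d in distances.items()}, "->", closest)
```

With one or two PCs the debris is closest to Standard A; once PC3 is included it is closest to Standard B.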
In this iteration of PCA, the fire debris samples do not correctly associate to the specific ignitable liquid, but association to the corresponding chemical class is possible. All other simulated fire debris samples containing petroleum distillates spiked onto both nylon carpet with carpet padding and treated wood flooring were properly associated to the petroleum distillate class, but not necessarily to the specific ignitable liquid standard. Fire debris samples containing paint thinner on nylon carpet were improperly associated to the diesel standard; however, these samples were positioned close to zero, indicating that they are not strongly influenced by the variance in the data set. Although commercial paint thinner was included in the data set, loss of characteristic compounds through the burning process resulted in improper association. Fire debris samples containing paint thinner spiked onto treated wood flooring were improperly associated to charcoal lighter. This improper association occurred because the wood treatment contributed normal alkanes (C9-C12) similar to those in charcoal lighter. Fire debris samples consisting of nylon carpet spiked with gasoline associated to the diesel standard; however, due to extensive evaporation, few compounds characteristic of gasoline were present.
4.5.2. Commercial Ignitable Liquid Standards – Refined Data Set
Principal components analysis was performed on the refined data set with the simulated fire debris consisting of carpet spiked with diesel projected, and the resulting scores plot is shown in Figure 4.11. The refined data set was used to investigate the impact of data set selection on the success of association and discrimination. Standards for this data set were selected from ignitable liquids commonly found in fire debris and were chosen to correspond to the class reference data set. In this refined data set, the gasoline standards, positioned positively, and the petroleum distillate standards, positioned negatively, are distinguished along PC1. Along PC2, the ignitable liquids within the petroleum distillate class are distinguished: charcoal lighter is positioned positively, torch fuel is positioned negatively, and all other standards are positioned at approximately zero.
All other petroleum distillate standards are positioned close to zero on PC2, as the characteristic compounds of those standards, such as the heavier alkanes (C15-C20), are less volatile and less variable. These less variable compounds contribute little to PC2, whereas the more volatile and more variable alkanes (C10-C14) do contribute to the variance on PC2.

[Figure 4.11: PCA scores plot of PC1 (60.2%) versus PC2 (22.4%) for the refined data set with commercial standards and simulated fire debris projected. Both axes span -0.15 to 0.15. Legend: Gas A, Gas B, Gas C, Fuel Injector, Diesel, Torch Fuel, Charcoal Lighter, Fire Debris.]

The positioning of the gasoline and petroleum distillate standards differs from the chemically diverse data set, as there are fewer standards and less chemical diversity within the refined data set. Differentiating gasoline from the petroleum distillates along PC1 is more straightforward once the isoparaffinic products (i.e., paint thinner and upholstery protector) are removed. Removing standards from the data set also distinguishes the petroleum distillate standards from one another along PC1 and PC2.
From visual assessment of the scores plot, the fire debris samples are positioned negatively along PC1 and at approximately zero on PC2, close to the diesel and fuel injector standards, both of which are classified as heavy petroleum distillates.
Euclidean distances were calculated between the simulated fire debris scores and each of the ignitable liquid standard scores based on four PCs (Table 4.2). The shortest Euclidean distance was calculated between the fire debris scores and the diesel standard, indicating the greatest similarity to the corresponding standard. These calculated Euclidean distances confirm the visual association on the scores plot. However, Euclidean distances can be limiting, as the number of principal components to include must be chosen. Selecting a different number of principal components will result in different calculated distances and could change the association as more discriminatory information (additional PCs) is added to the calculation.
Using a refined data set, association to the corresponding ignitable liquid and ASTM class was possible, but the simulated fire debris can only be associated to the specific ignitable liquid if it is present in the data set. PCA was performed again, this time excluding the commercial diesel standard from the refined data set. This iteration of PCA was performed to demonstrate that the specific ignitable liquid must be present and to establish the potential utility of class reference standards. The resulting scores plot with the simulated fire debris projected is shown in Figure 4.12.

Table 4.2: Euclidean distances between fire debris scores and ignitable liquid standard scores for the refined data set

Ignitable Liquid Standard     Refined Data Set
Gasoline A                    0.08695
Gasoline B                    0.07509
Gasoline C                    0.07992
Diesel                        0.02653
Fuel Injector                 0.02867
Torch Fuel                    0.05852
Charcoal Lighter              0.06370

From visual assessment of the scores plot excluding the commercial diesel standard, the fire debris samples are positioned most closely to the fuel injector standard. Additionally, the shortest Euclidean distance was calculated between the fire debris scores and the fuel injector standard. Because the corresponding standard is not present in the data set, the greatest similarity is instead to a standard of similar chemical composition within the same ASTM class. The Euclidean distances between the simulated fire debris scores and the ignitable liquid scores for the refined data set excluding diesel, based on three PCs, are shown in Table 4.3.
Visual association and Euclidean distances indicate that the fire debris can be associated
to a specific ignitable liquid when it is present in the given data set. If the specific ignitable
liquid is not present in the data set, the fire debris samples can be associated to a standard that is
similar in chemical composition. However, if diesel and fuel injector were both removed, the fire
debris may not associate well with other petroleum distillates present in the data set due to the
chemical variations within this ASTM class. This demonstrates a limitation of using commercial ignitable liquids in PCA: the chemical variability within an ASTM class is difficult to represent, as it is not practical to include every commercial ignitable liquid in the data set. Ultimately, association to the ASTM class is still possible if the chemical composition is similar, indicating that class reference standards have potential for successful association.

[Figure 4.12: PCA scores plot of PC1 (63.5%) versus PC2 (25.1%) for the refined data set with commercial standards (excluding diesel) and simulated fire debris projected. Both axes span -0.15 to 0.15. Legend: Gas A, Gas B, Gas C, Fuel Injector, Torch Fuel, Charcoal Lighter, Fire Debris.]


Table 4.3: Euclidean distances between fire debris scores and ignitable liquid standard scores for
the refined data set containing diesel and excluding diesel
Ignitable Liquid Standard
Gasoline A

Refined Data Set Excluding
Diesel
0.08549

Gasoline B

0.07318

Gasoline C

0.07736

Diesel

N/A

Fuel Injector

0.02281

Torch Fuel

0.05581

Charcoal Lighter

0.06126

N/A indicates the standard was not present in the scores plot

86

Association of other simulated fire debris samples was also successful using the refined data set.
All simulated fire debris samples consisting of nylon carpet with carpet padding and treated
wood flooring spiked with a petroleum distillate (diesel or torch fuel) associated to the petroleum
distillate class. In the case of the torch fuel spiked onto the treated wood flooring, improper
association to the fuel injector standard occurred due to the additional normal alkanes from the
wood treatment. Association of fire debris samples containing paint thinner was not performed,
as no isoparaffinic products were included in the data set. Similar to the chemically diverse data
set, fire debris samples consisting of nylon carpet spiked with gasoline associated to the diesel
standard due to extensive evaporation.
Comparing the chemically diverse data set to the refined data set demonstrated that the
success of association was highly dependent on the composition of the data set used. Considering
there is a range of commercially available ignitable liquids within each ASTM class and
chemical variability within a class, selecting an ideal data set for analysis is problematic. This is particularly problematic in forensic science, as data set selection could be interpreted as manipulating results. Overall, this emphasizes the need for a more standardized approach, which may be provided by generating reference standards that are representative of a given ASTM class. Such standards would eliminate problems associated with changing the size and/or chemical diversity of the data set.
4.5.3. Class Reference Standards
Principal components analysis was performed on the class reference data set with the simulated fire debris consisting of carpet spiked with diesel projected, and the resulting scores plot is shown in Figure 4.13. The class reference data set was used to investigate association and discrimination using standards based on chemical class composition rather than commercial standards. In this data set, PC1 accounts for 72.9% of the variance and distinguishes the petroleum distillate standards, positioned positively, from the gasoline standard, positioned negatively. PC2 accounts for 27.1% of the variance and further distinguishes the heavy petroleum distillate standard, positioned positively, from the medium petroleum distillate standard, positioned negatively. The medium and heavy petroleum distillates are well distinguished from one another, particularly on PC2, despite the overlap in normal alkanes: the MPD standard contains C8-C13 and the HPD standard contains C8-C20.
From visual assessment of the scores plot, the fire debris samples are most closely
positioned to the heavy petroleum distillate standard. Euclidean distances were calculated
between the simulated fire debris scores and each of the class reference standard scores based on
two PCs and are shown in Table 4.4. The shortest calculated Euclidean distance was between the
fire debris and the heavy petroleum distillate standard with a distance of 0.09245, indicating the
greatest similarity. The next shortest distance was between the fire debris and the gasoline
standard with a calculated distance of 0.1426.
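Applied to the values reported in Table 4.4, the association rule amounts to ranking the distances and taking the shortest. The short Python sketch below does exactly that with the published values; it adds nothing beyond the ranking already stated in the text, and the variable names are illustrative only.

```python
# Euclidean distances reported in Table 4.4 (class reference data set, two PCs).
table_4_4 = {
    "Gasoline": 0.1426,
    "Medium Petroleum Distillate": 0.1445,
    "Heavy Petroleum Distillate": 0.09245,
}

# Rank the class reference standards from shortest to longest distance.
ranking = sorted(table_4_4.items(), key=lambda item: item[1])
for rank, (standard, distance) in enumerate(ranking, start=1):
    print(f"{rank}. {standard}: {distance}")

associated_class = ranking[0][0]
print("Associated class:", associated_class)
```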
These calculated Euclidean distances confirmed the visual association on the scores plot. Despite chemical differences between the ignitable liquid present in the debris and the heavy petroleum distillate reference standard, successful association was possible. Association of the fire debris in this scenario is relatively easy because the data set contains fewer, more representative standards that are well distinguished from one another along both PCs. Additionally, the precision of the class reference standards is higher than that of the commercial standards. Class reference standards therefore show potential for use in PCA; however, further exploration of additional standards is necessary.

[Figure 4.13: PCA scores plot of PC1 (72.8%) versus PC2 (27.1%) for the class reference standards with the projected scores of the simulated fire debris. Both axes span -0.16 to 0.16. Legend: Gasoline, Medium Pet. Dist., Heavy Pet. Dist., Fire Debris.]


Table 4.4: Euclidean distances between fire debris scores and class reference scores for the class
reference data set
Ignitable Liquid Standard
Gasoline

Class Reference Data Set
0.1426

Medium Petroleum Distillate

0.1445

Heavy Petroleum Distillate

0.09245

90

All other simulated fire debris samples containing a petroleum distillate spiked onto
nylon carpet and treated wood flooring associated to the heavy petroleum distillate class.
Association of the fire debris containing paint thinner was not performed as no isoparaffinic
standards were present in the data set. Once again, the fire debris samples containing gasoline
associated to the heavy petroleum distillate class due to extensive evaporation and loss of
characteristic compounds.
4.6. Summary
The use of PCA as an additional tool in fire debris analysis is currently limited by the arbitrary selection of the data set, as no number or type of standards is designated. Given the variety of commercially available ignitable liquids, including each one in the data set is not practical. Successful association and discrimination can be affected by the composition of the data set, as association may change depending on the size and chemical diversity of the data set. This could be problematic in forensic science, as it may be interpreted as manipulating the data set to obtain more desirable results.
In addition to selecting an appropriate data set, visual interpretation of association can be challenging and subjective. Using Euclidean distance as an additional metric to quantitatively assess association reduces the subjectivity of interpreting scores plots. However, Euclidean distances have some limitations, such as the choice of how many PCs to use and the difficulty of distinguishing a short distance from a long one, as distances may not differ substantially.
Class reference standards have demonstrated some potential to associate fire debris samples to the corresponding ASTM class using PCA, despite the chemical differences between the class reference standard and the commercial ignitable liquid. However, other class reference standards should be generated to investigate the utility of these standards further. Overall, using these standards for PCA may help to standardize the current approach and overcome problems associated with altering the size or nature of the data set.


5. Investigation of Class Reference Standards for Association and Classification of Fire
Debris to ASTM Class using Hierarchical Cluster Analysis and k-Nearest Neighbors
5.1. Introduction
Hierarchical cluster analysis (HCA) was used to examine the similarity between simulated fire debris samples and the corresponding commercial ignitable liquids. HCA has previously been investigated as a tool to help determine which ignitable liquid the simulated fire debris is most closely associated to on a principal components analysis (PCA) scores plot, by observing which standard the fire debris associates to first on the corresponding HCA dendrogram. HCA has some advantages and disadvantages relative to PCA, and it shares some of the limitations associated with using commercial ignitable liquids as standards.
This chapter highlights the advantages and disadvantages of HCA when compared to
PCA and continues to demonstrate the need for a more standardized approach for associating
simulated fire debris to ignitable liquids. To continue to establish the utility of class reference
standards as an alternative to commercial standards, HCA was performed on both sets of
commercial ignitable liquid standards, as well as the generated class reference standards.
Furthermore, HCA was performed using the chemically diverse and refined commercial ignitable
liquid data sets to investigate how HCA is impacted by data sets with different chemical
diversity.
Additionally, k-Nearest neighbors (k-NN) was investigated as a procedure for classifying simulated fire debris samples by ASTM class. k-NN has not previously been used to classify simulated fire debris, but it has been used in other forensic applications (1-3). In this chapter, k-NN was used to examine how well simulated fire debris is classified using class reference standards. k-NN was performed using both commercial ignitable liquid standards and class reference standards, and the percentages of successful classifications and misclassifications were recorded.
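As a rough illustration of the classification procedure, the sketch below assigns each unknown sample the class held by the majority of its k nearest training samples and then reports the percentage of correct classifications. It assumes Euclidean distance, a simple majority vote, and samples represented as numeric vectors such as PC scores; the value of k, the coordinates, and the labels are hypothetical and are not the settings or data used in this research.

```python
from collections import Counter

import numpy as np

def knn_classify(train_X, train_labels, query, k=3):
    """Return the majority class among the k training samples nearest to query.

    Distances are Euclidean. A tied vote goes to the class whose member
    appears first in the distance ranking (Counter keeps insertion order).
    """
    diffs = np.asarray(train_X, dtype=float) - np.asarray(query, dtype=float)
    distances = np.linalg.norm(diffs, axis=1)
    nearest = np.argsort(distances, kind="stable")[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

def percent_correct(train_X, train_labels, test_X, test_labels, k=3):
    """Classify every test sample and report the percentage classified correctly."""
    predictions = [knn_classify(train_X, train_labels, x, k) for x in test_X]
    correct = sum(p == t for p, t in zip(predictions, test_labels))
    return 100.0 * correct / len(test_labels), predictions

# Hypothetical 2-D coordinates (e.g., PC scores) for three replicates per class reference standard.
train_X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],      # gasoline
           [1.0, 1.0], [1.1, 1.0], [1.0, 1.1],      # HPD
           [1.0, -1.0], [1.1, -1.0], [1.0, -1.1]]   # MPD
train_labels = ["Gasoline"] * 3 + ["HPD"] * 3 + ["MPD"] * 3

# Hypothetical debris samples; the last is labeled HPD but lies near the MPD replicates.
test_X = [[0.05, 0.05], [0.9, 0.9], [0.9, -0.8]]
test_labels = ["Gasoline", "HPD", "HPD"]

pct, predictions = percent_correct(train_X, train_labels, test_X, test_labels, k=3)
print(predictions)
print(f"{pct:.1f}% correctly classified")
```

In this toy case the third sample is counted as a misclassification, so two of three samples (about 66.7%) are classified correctly.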
5.2. Association of Simulated Fire Debris using HCA
5.2.1. Commercial Ignitable Liquid Standards – Chemically Diverse Data Set
Hierarchical cluster analysis was first performed on the chemically diverse data set
containing the commercial gasoline, petroleum distillate, isoparaffinic standards, and simulated
fire debris. The resulting dendrogram is shown in Figure 5.1. The dendrogram illustrates the
clusters generated along the left axis where replicates of each standard are first grouped to one
another. Similarity level is indicated along the top axis and by the branching within the
dendrogram where a similarity level of 1.0 indicates the greatest similarity and a similarity level
of 0.0 indicates no similarity.
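The similarity scale on the dendrograms can be illustrated with a small agglomerative clustering sketch. It assumes Euclidean distances, complete linkage, and a similarity level defined as 1 - d/d_max, where d is the distance at which two clusters merge and d_max is the largest pairwise distance in the data set (the convention documented for Minitab's cluster analysis, where it is expressed as a percentage). The linkage choice, function name, and one-dimensional toy data are assumptions rather than the settings used in this research. One property of this combination is that the final merge always falls at a similarity level of exactly 0.0.

```python
import numpy as np

def complete_linkage_merges(X):
    """Agglomerative clustering (complete linkage, Euclidean distance).

    Returns (cluster_a, cluster_b, similarity) for each merge, where
    similarity = 1 - d_merge / d_max and d_max is the largest pairwise
    distance among all samples in X.
    """
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    d_max = dist.max()
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: distance between the farthest pair of members.
                d = dist[np.ix_(clusters[a], clusters[b])].max()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], float(1.0 - d / d_max)))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges

# Toy one-dimensional data: samples 0 and 1 are close; sample 2 is distant.
merges = complete_linkage_merges([[0.0], [1.0], [4.0]])
for a, b, s in merges:
    print(a, "+", b, f"similarity = {s:.3f}")

# Adding a more distant sample raises d_max, so the same pair (0, 1)
# now merges at a higher similarity level.
merges_extended = complete_linkage_merges([[0.0], [1.0], [4.0], [8.0]])
print(merges_extended[0])
```

The second call shows why similarity levels are relative to the data set under this definition: samples 0 and 1 are the same distance apart in both cases, yet they merge at 0.750 in the first data set and 0.875 in the second because d_max changed.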
Based on the dendrogram containing only the commercial ignitable liquid standards, replicates of each standard were clustered at similarity levels ranging from 0.791 to 0.997. This large range of similarity levels between replicates was caused by differences in abundance between some replicates. For example, one replicate of charcoal lighter clustered to all other charcoal lighter replicates at a much lower similarity level (0.621) than the others. This replicate has a much higher abundance of C10 and C11, the two dominant normal alkanes in the TIC, than all other replicates; as a result, it is still clustered correctly, but at a lower similarity level. Additionally, the replicates of diesel are the most similar to one another compared to the replicates of all other ignitable liquids because of the low volatility of the heavier alkanes in this liquid (C12-C20). These compounds are less volatile than the compounds found in the other ignitable liquids, making the diesel replicates more reproducible.

[Figure 5.1: Dendrogram of the chemically diverse data set with similarity levels indicated where appropriate. Axis: Similarity Level (0.0 to 1.0); annotated similarity levels: 0.865, 0.543, 0.186, and 0.000.]

All gasoline standards cluster to one another at a similarity level of 0.865. Despite being similar in chemical composition, the three gasoline standards show minor differences in the abundance of characteristic compounds, as shown in Figure 5.2. Gasoline A (Figure 5.2A) has a higher abundance of C2-alkylbenzenes than the other gasoline standards, while gasoline B (Figure 5.2B) has a low abundance of toluene and a higher abundance of C3-alkylbenzenes. Toluene is a highly volatile compound, while the C3-alkylbenzenes are less volatile. Therefore, gasoline B would be expected to be less variable than the other gasoline standards that contain high levels of toluene. Gasoline C (Figure 5.2C) has a higher abundance of toluene and a moderate to high abundance of C2-alkylbenzenes. Because these compounds are the most volatile, gasoline C would be expected to be the most variable gasoline standard. This is consistent with Figure 5.1, where the gasoline C replicates cluster at a lower similarity level (0.837) than those of the other gasoline standards (0.930 and 0.933 for gasolines A and B, respectively).
A majority of the petroleum distillate standards cluster to one another at a similarity level of 0.543, with the exception of lamp oil. This liquid has a narrower range of normal alkanes (C11-C12) than any other petroleum distillate standard in the data set; consequently, the lamp oil standard does not cluster exclusively with the other petroleum distillates.
An isoparaffinic liquid, paint thinner, clustered to the gasoline and petroleum distillate standards at a similarity level of 0.186, while the other isoparaffinic liquid, upholstery protector, clustered at a similarity level of 0.0. Paint thinner contains branched and cyclic alkanes, gasoline contains mostly alkylbenzenes, and the petroleum distillates consist predominantly of normal alkanes. As paint thinner is chemically different from the gasoline and petroleum distillate classes, the paint thinner standard was the penultimate standard to be clustered, at a low similarity level. As previously mentioned, upholstery protector does not have any compounds in common with the other standards, nor any compounds that elute at retention times similar to those of compounds in the other standards. As a result, upholstery protector was not considered similar to any standard and was clustered last, at a similarity level of 0.0.

[Figure 5.2: Representative total ion chromatograms of A) commercial gasoline A standard, B) commercial gasoline B standard, and C) commercial gasoline C standard, with characteristic compounds identified to highlight differences among the gasoline standards. Axes: Retention Time (min), 0 to 30, and Normalized Abundance. Compounds labeled in each panel: toluene, C2-alkylbenzenes, C3-alkylbenzenes, C4-alkylbenzenes, and naphthalenes.]

Although the chemical differences are known and reflected in the large range of similarity levels, it is not possible to determine from the dendrogram which variables contribute to the clustering. In PCA, a loadings plot is used to determine the variables contributing most to the variance; HCA provides no such output. This can be a disadvantage for understanding the clustering of samples; however, the raw data can be examined to hypothesize why specific clustering occurs, as demonstrated here.
Next, the simulated fire debris consisting of carpet spiked with commercial diesel was included in the data set, and the resulting dendrogram is shown in Figure 5.3. The simulated fire debris samples clustered to each other at similarity levels ranging from 0.772 to 0.983. This range of similarity levels was a result of the variability of the burning process. The fire debris samples clustered first to the heavy petroleum distillate cluster containing fuel injector, kerosene, and diesel at a similarity level of 0.705. However, it is important to note that this data set is biased towards the petroleum distillate class, as over half of the standards are petroleum distillates.

[Figure 5.3: Dendrogram of the chemically diverse data set with simulated fire debris consisting of carpet spiked with diesel. Axis: Similarity Level (0.0 to 1.0); annotated similarity level: 0.705. Leaves: Carpet, Diesel Fire Debris; Fuel Injector; Kerosene; Diesel; Charcoal Lighter; Torch Fuel; Gasoline C; Gasoline B; Gasoline A; Lamp Oil; Paint Thinner; Upholstery Protector.]

In this iteration of HCA, the specific ignitable liquid standard to which the simulated fire debris was most similar remains unknown; however, clustering to the appropriate ASTM class was possible. All other simulated fire debris samples containing petroleum distillates spiked onto both nylon carpet with carpet padding and treated wood flooring were properly associated to the petroleum distillate class, but not necessarily to the specific ignitable liquid standard. Fire debris samples containing paint thinner spiked onto nylon carpet were incorrectly associated first to a cluster containing all commercial petroleum distillate and gasoline standards, at a low similarity level (0.399). Although commercial paint thinner was included in the data set, loss of characteristic compounds through the burning process resulted in improper association. Fire debris samples containing paint thinner spiked onto treated wood flooring were incorrectly associated to charcoal lighter. This incorrect association occurred because the wood treatment and charcoal lighter share the same normal alkanes (C9-C12), and because paint thinner and charcoal lighter both contain compounds with overlapping retention times.
Fire debris samples consisting of nylon carpet spiked with gasoline clustered to the
standards with zero similarity; however, due to extensive evaporation, few compounds
corresponding to gasoline were present in the debris samples and mostly substrate interference
compounds (i.e., styrene, 1,2,3-trichloropropane, and biphenyl) dominated the TIC. As a result,
the simulated fire debris containing gasoline did not resemble gasoline. Association of treated
wood flooring spiked with gasoline could not be investigated, as these simulated fire debris
samples were not generated due to continual problems with instrument contamination resulting
from the surface treatment.


5.2.2. Commercial Ignitable Liquid Standards – Refined Data Set
Hierarchical cluster analysis was performed on the refined data set with the simulated fire
debris consisting of carpet spiked with diesel and the resulting dendrogram is shown in Figure
5.4. In this data set, replicates of each standard cluster at similarity levels ranging from 0.621 to
0.991. A large range of similarity levels would not typically be expected of replicates, but
occurred due to the previously mentioned charcoal lighter replicates (see section 5.2.1) and
because of the chemical nature of the ignitable liquids, as some ignitable liquids contain more
volatile compounds making those liquids more variable than others.
The gasoline standards cluster to the petroleum distillates at a similarity level of 0.0 due
to chemical differences present between classes. Each class has distinct chemical differences
where the gasoline class contains alkylbenzenes and aliphatic compounds and the petroleum
distillate class contains a homologous distribution of normal alkanes and some aromatic
compounds.
All gasoline standards cluster to one another at a similarity level of 0.756, while all petroleum distillate standards cluster to one another at a similarity level of 0.110, indicating little to no similarity. This similarity level is relatively low due to the diversity of the data set and to chemical differences within the petroleum distillate class. For example, fuel injector and diesel cluster together at a relatively high similarity level (0.486), as these standards are classified as heavy petroleum distillates and contain normal alkanes C10-C16 and C12-C20, respectively. In contrast, charcoal lighter clusters to fuel injector and diesel at a relatively low similarity level (0.213), and torch fuel clusters to all of the petroleum distillates at an even lower similarity level (0.110), due to differences in the range of normal alkanes. Both charcoal lighter and torch fuel are classified as medium petroleum distillates, with normal alkane ranges of C9-C12 and C11-C14, respectively.

[Figure 5.4: Dendrogram of the refined data set with simulated fire debris consisting of carpet spiked with diesel. Axis: Similarity Level (0.0 to 1.0); annotated similarity level: 0.403. Leaves: Carpet, Diesel Fire Debris; Fuel Injector; Diesel; Charcoal Lighter; Torch Fuel; Gasoline C; Gasoline B; Gasoline A.]

The three simulated fire debris samples cluster at similarity levels ranging from 0.586 to 0.970, indicating some variation within the generated fire debris samples. This variation between samples was expected due to the inability to completely control the burning process. After clustering to one another, the simulated fire debris samples were first clustered to both of the heavy petroleum distillate standards (commercial fuel injector and diesel) at a similarity level of 0.403.
From this dendrogram, it was unclear whether the simulated fire debris was most similar
to the diesel or fuel injector standard as the two standards cluster to one another first. This
outcome is similar to the results from the visual assessment of the PCA scores plot (see Figure
4.11) where it was unclear if the fire debris samples were more closely associated to the diesel or
to the fuel injector standard. Although the fire debris cannot be associated to a specific ignitable
liquid in this case, the corresponding ASTM class of ignitable liquid was properly associated
(i.e., petroleum distillate). As the purpose of this research is to associate fire debris to chemical
class rather than a specific ignitable liquid, this would not be considered a limitation of HCA.
Additionally, if the commercial diesel standard were not present in the data set, association of the
simulated fire debris to the heavy petroleum distillate cluster would still occur (similarity level:
0.403), assuming the removal of diesel was the only change to the data set.
Association of other simulated fire debris samples was somewhat successful. All
simulated fire debris samples consisting of nylon carpet with carpet padding or treated wood
flooring spiked with a petroleum distillate (diesel or torch fuel) associated to the petroleum
distillate class, but not necessarily to the specific ignitable liquid. All fire debris samples
containing paint thinner spiked onto nylon carpet with carpet padding or treated wood flooring
incorrectly associated to charcoal lighter, as no isoparaffinic standards were included in the data
set. Although paint thinner and charcoal lighter belong to two different ASTM classes, these
ignitable liquids have similar retention time ranges (8.52-11.96 min and 7.42-12.50 min,
respectively) with overlapping compounds, which explains why such clustering occurs.
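The extent of overlap between the two retention time windows quoted above can be made explicit with a simple interval calculation. It uses only the ranges given in the text and is a rough illustration: overlapping windows are consistent with, but do not by themselves prove, shared compounds.

$$
\begin{aligned}
\text{overlap} &= \min(11.96,\ 12.50) - \max(8.52,\ 7.42) = 11.96 - 8.52 = 3.44\ \text{min}\\
\text{paint thinner window} &= 11.96 - 8.52 = 3.44\ \text{min} \quad (\text{overlap fraction } 3.44/3.44 = 1.00)\\
\text{charcoal lighter window} &= 12.50 - 7.42 = 5.08\ \text{min} \quad (\text{overlap fraction } 3.44/5.08 \approx 0.68)
\end{aligned}
$$

The paint thinner window therefore lies entirely within the charcoal lighter window.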
Similar to the chemically diverse data set, the gasoline fire debris samples clustered to the
standards with no similarity due to extensive evaporation of the ignitable liquid in the fire debris
in which the presence of an ignitable liquid was not detectable.
As the data set becomes more refined, the similarity levels of standards within the same
class decrease. For example, the three commercial gasoline standards clustered at a similarity
level of 0.865 in the chemically diverse data set while the same three standards clustered at a
similarity level of 0.756 in the refined data set. Similarity levels can be beneficial as a numerical
representation of the similarities observed in the data set, but these levels can also be limiting as
the similarity levels provided are only relative to the given data set. Attempting to associate the
same simulated fire debris samples using two different data sets will yield different similarity
levels. While HCA is used to highlight similarities in the data set there will always be a sample
denoted as different due to the way in which clustering occurs. As a result, it is difficult to
determine if the given association of the fire debris is a good fit based on similarity level alone.
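The dependence of similarity levels on data set composition can be illustrated with a minimal sketch. It assumes one common convention (Minitab, for instance, reports similarity as 100(1 - d_ij/d_max)) in which the similarity between two samples is 1 - d/d_max, where d is their Euclidean distance and d_max is the largest distance in the data set. The software and linkage used in this research may differ, and the helper name similarity_matrix and the three-variable profiles below are hypothetical, not data from this study.

```python
import numpy as np

def similarity_matrix(X):
    """Pairwise similarity s_ij = 1 - d_ij / d_max (assumed convention).

    d_ij is the Euclidean distance between rows i and j of X, and d_max is
    the largest such distance within this particular data set.
    """
    X = np.asarray(X, dtype=float)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return 1.0 - d / d.max()

# Hypothetical normalized profiles (three variables each), not data from this study
fire_debris = [0.50, 0.30, 0.20]
hpd_standard = [0.45, 0.35, 0.20]
mpd_standard = [0.20, 0.50, 0.30]
gasoline_standard = [0.05, 0.05, 0.90]

refined = [fire_debris, hpd_standard, mpd_standard]
diverse = refined + [gasoline_standard]   # same samples plus one dissimilar class

# Similarity between the SAME two samples (rows 0 and 1) in each data set
s_refined = similarity_matrix(refined)[0, 1]
s_diverse = similarity_matrix(diverse)[0, 1]
print(f"fire debris vs HPD, refined set: {s_refined:.3f}")
print(f"fire debris vs HPD, diverse set: {s_diverse:.3f}")
```

The first line prints about 0.811 and the second about 0.919: adding a chemically distinct sample raises d_max and so raises the similarity of an unchanged pair, mirroring the observation above that similarity levels within a class fall as the data set becomes more refined.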
Overall, selecting a data set with a different chemical diversity did not alter the
association of the fire debris, but did change the similarity level at which clustering occurred. As
a result, the same general trends of association were observed. This is an advantage of HCA for
fire debris analysis, as association to the same ASTM class occurred regardless of the diversity of
the data set.
5.2.3. Class Reference Standards
Hierarchical cluster analysis was performed on the class reference data set with the
simulated fire debris containing carpet spiked with diesel and the resulting dendrogram is shown
in Figure 5.5. The class reference data set was used to determine if class reference standards have
utility for association using HCA.
In this data set, replicates of each class reference standard clustered at similarity levels
ranging from 0.948 to 0.985, and replicates of each simulated fire debris extract clustered at
similarity levels ranging from 0.732 to 0.983. Replicates of the class reference standards
clustered together at higher similarity levels than the commercial standards, indicating greater
precision and less variability among the class reference standards. The gasoline reference
standards clustered to one another at a similarity level of 0.974 and to the petroleum
distillates at a similarity level of 0.0. The medium petroleum distillate reference standard
replicates clustered to one another at a similarity level of 0.948 and to the heavy
petroleum distillate standard at a similarity level of 0.063, while all HPD reference standard
replicates clustered to one another at a similarity level of 0.966. A low similarity level between
the MPD and HPD reference standards was not expected; it occurred because of the limited
chemical diversity of the data set and because similarity levels are only relative to the data set. As a
result, differences were highlighted even when the samples were somewhat similar. This
demonstrates the need for a standardized data set containing standards representative of all
ASTM classes.

[Figure: dendrogram (y-axis: Similarity Level) with leaves Gasoline, Medium Petroleum Distillate, Heavy Petroleum Distillate, and Carpet, Diesel Fire Debris; labeled node at similarity 0.225]

Figure 5.5: Dendrogram of the class reference data set with simulated fire debris consisting of
carpet spiked with diesel

The simulated fire debris samples containing diesel first clustered to the HPD reference
standards at a similarity level of 0.225. Although the similarity level was relatively low, the
samples associated to the corresponding class standard first. The low similarity level is
attributed to the limited chemical diversity of the data set. If additional class reference standards
representing classes of different chemical composition were introduced to the data set, the
similarity level between the simulated fire debris and the HPD reference standard would be expected
to increase. As previously mentioned for the chemically diverse data set, this data set was also
somewhat biased towards petroleum distillates.
All other simulated fire debris samples containing a petroleum distillate spiked on nylon
carpet and treated wood flooring associated to the heavy petroleum distillate class. Association
of the fire debris containing paint thinner was not attempted, as no isoparaffinic standards were
included in the data set. Once again, the fire debris samples containing gasoline clustered to the
standards with no similarity due to extensive evaporation of the liquid in these samples.
Class reference standards showed potential for association using HCA. Although ASTM
class association was achieved using both commercial standards and class reference standards,
the class reference standards would be beneficial for generating a more standardized approach. If a
full set of class reference standards were generated based on ASTM class and used during HCA,
the similarity levels for association would be much more representative of how similar the
simulated fire debris is to the standard with which it associates. For example, if the same simulated
fire debris samples were introduced to two different data sets and associated to the same ignitable
liquid standard in both, the resulting similarity levels would differ. Therefore, it is
difficult to determine the significance of a given similarity level. However, if all simulated fire
debris samples were consistently compared against the same data set, containing standards of
varying chemical composition, a high versus a low similarity level would be more indicative of the
actual level of association.
Additionally, HCA could be considered advantageous over PCA because
association to a specific standard or specific cluster of standards does not change as the diversity
of the data changes. As a result, the concern about data manipulation raised in the PCA discussion
is eliminated. Furthermore, HCA takes into account all dimensions of the data
set and displays them in a single dendrogram, with similarity levels as a
numeric representation. PCA also accounts for all dimensions of the data, but
only the dimensions that contain useful information are used, and only two or three dimensions
can be compared simultaneously. As a result, examining all dimensions or PCs would be very
time consuming and would require numerous scores plots. In addition, if a numeric representation
is desired in PCA, an additional step of calculating the Euclidean distance or an alternative metric is
required. However, PCA does have an advantage in identifying the variables responsible for
association and discrimination, whereas in HCA the original data must be interpreted to
hypothesize why clustering is occurring.
5.3. Association of Simulated Fire Debris using k-NN
5.3.1. Commercial Ignitable Liquid Standards- Chemically Diverse Data Set
k-Nearest neighbors was first performed on the chemically diverse data set containing the
commercial gasoline, petroleum distillate, and isoparaffinic standards. Each commercial
ignitable liquid was assigned its own class; however, all gasoline standards were placed into one
class because of the similarities among these ignitable liquids. Based on the threshold for each
class derived from the t distribution at a 95% confidence level, the commercial class standards
showed a good class fit, with the exception of approximately one to two replicates per ignitable
liquid that did not fit within the threshold. Standards that fell outside the threshold generally
corresponded to replicates that clustered to the other replicates at a low similarity level using
HCA. For example, one gasoline replicate and one charcoal lighter
replicate did not fall within the threshold and clustered to the corresponding replicates at lower
similarity levels using HCA (see Figure 5.1 and Figure 5.3). However, because a distributional
statistic is used to determine the threshold, approximately 5% of the standards are expected to
exceed it. Given this expectation and the fact that these replicates clustered to the
appropriate standard using HCA, the replicates were not removed and were used to represent the
variance found within ignitable liquids.
Classification of the fire debris samples was carried out using 1, 3, 5, 7, and 9 nearest
neighbors. For all values of nearest neighbors, the fire debris samples containing diesel were all
misclassified as kerosene. This misclassification is similar to the results observed during PCA
(see section 4.5.1) in which the fire debris samples were incorrectly associated to the kerosene
standard.
All other simulated fire debris samples containing petroleum distillates spiked on nylon
carpet and treated wood flooring were classified within the petroleum distillate class, although
correct classification to the specific ignitable liquid was not always achieved. Fire debris samples
containing paint thinner spiked on nylon carpet were incorrectly classified as diesel. Incorrect
classification may be due to the loss of characteristic compounds of paint thinner during the
burning process and to the presence of substrate interference compounds with retention times
overlapping those of compounds present in diesel. Fire debris samples containing paint thinner
spiked onto treated wood flooring were incorrectly classified as fuel injector (33% of samples)
and charcoal lighter (67% of samples) using all values of nearest neighbors. Incorrect association
most likely occurred because the wood treatment and charcoal lighter share the same normal
alkanes (C9-C12) and because compounds present in paint thinner and charcoal lighter have
overlapping retention times. Fire debris containing gasoline spiked on nylon carpet was
dominated by substrate interferences (i.e., styrene, 1,2,3-trichloropropane, and biphenyl), with
few of the characteristic gasoline compounds present. In addition, biphenyl (tR: 15.24 min), a
substrate interference compound from the carpet, has a retention time similar to that of C14
(tR: 15.25 min), which is present in diesel; this may be why these samples were classified as
commercial diesel.
5.3.2. Commercial Ignitable Liquid Standards- Refined Data Set
k-Nearest neighbors was performed on the refined data set, and classification of the
simulated fire debris samples consisting of carpet spiked with diesel was investigated. Based on
the threshold for each class, the commercial class standards showed a good class fit, with the
exception of approximately one replicate per class that did not fit within the threshold.
As in the chemically diverse data set, some replicates that did not fit within
the threshold range clustered to the other replicates at a lower similarity level using HCA.
However, other replicates that did not fit within the threshold appeared to cluster well using
HCA, which is consistent with the expectation that approximately 5% of the population will fall
outside the threshold range when a 95% confidence level is selected. Additionally, fewer
replicates fell outside the threshold, as expected: as the number of samples within the data set
decreases, fewer samples are expected to fall outside the threshold range. As with the chemically
diverse data set, the replicates were not removed but were used to represent the variance found
within ignitable liquids.
The simulated fire debris samples containing carpet spiked with commercial diesel were
then classified based on the defined classes. Classification was performed using 1, 3, 5, 7, and 9
nearest neighbors, and the percent correctly classified for each number of nearest neighbors is
given in Table 5.1.
Using only one nearest neighbor, only 11% of the simulated fire debris samples were
correctly classified to the corresponding ignitable liquid, and 89% were misclassified as fuel
injector, a heavy petroleum distillate. Classification with a single nearest neighbor is susceptible
to misclassification caused by outliers; therefore, using one neighbor for classification is not
recommended. The percent of correctly classified samples increased with the number of nearest
neighbors until it reached a maximum of 67% at seven nearest neighbors. In this research,
selecting a higher number of nearest neighbors, and therefore considering a larger number of
standards, improved classification success because each class is well defined. However, 33% of
samples were still misclassified, in each case as fuel injector, a heavy petroleum distillate.
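The effect of the number of neighbors can be illustrated with a minimal majority-vote k-NN sketch. The two-variable data, the helper name knn_classify, and the use of unweighted Euclidean distance are assumptions for illustration only; they are not the data or software used in this research. A single outlying "fuel injector" replicate lies closest to the unknown, so k = 1 follows the outlier, while larger values of k let the well-defined "diesel" class win the vote.

```python
from collections import Counter

import numpy as np

def knn_classify(train_X, train_y, x, k):
    """Majority vote among the k training samples closest (Euclidean) to x."""
    train_X = np.asarray(train_X, dtype=float)
    distances = np.linalg.norm(train_X - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(distances, kind="stable")[:k]
    votes = Counter(train_y[i] for i in nearest)
    # On a tie, most_common(1) returns the label encountered first,
    # i.e. the tied class whose member is closest to x
    return votes.most_common(1)[0][0]

# Hypothetical two-variable data, not from this study
train_X = [
    [1.00, 1.00], [1.10, 0.90], [0.90, 1.10], [1.05, 1.05],  # "diesel" replicates
    [0.60, 0.60],                                            # outlying "fuel injector" replicate
    [0.00, 0.00], [0.10, 0.00], [0.00, 0.10],                # "fuel injector" replicates
]
train_y = ["diesel"] * 4 + ["fuel injector"] * 4
unknown = [0.70, 0.70]

results = {k: knn_classify(train_X, train_y, unknown, k) for k in (1, 3, 5, 7)}
print(results)
```

Here k = 1 returns fuel injector, and k = 3, 5, and 7 return diesel. Odd values of k avoid ties between two classes; with more classes, a tie falls to the label met first among the nearest neighbors.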
Unfortunately, it was not possible to determine whether the classification of the simulated fire
debris was a good fit. This is disadvantageous because k-NN is a hard classification procedure, so
a classification is always forced. However, it was possible to interpret the raw data and
hypothesize why the observed misclassification occurred. A representative TIC of the simulated
fire debris containing diesel, the commercial diesel standard, and the commercial fuel injector
standard is shown in Figure 5.6.

Table 5.1: Percent classification of simulated fire debris containing carpet spiked with diesel to
the corresponding commercial diesel standard using 1, 3, 5, 7, and 9 nearest neighbors

Nearest Neighbors    Percent Classification to Diesel Standard (%)
1                    11
3                    33
5                    44
7                    67
9                    67

[Figure, panel A: total ion chromatogram (Normalized Abundance vs. Retention Time (min), 0 to 30 min) of the simulated fire debris, with normal alkanes C11 to C16 labeled]
[Panel B: commercial diesel standard, with normal alkanes C12 to C20 labeled]
[Panel C: commercial fuel injector standard, with normal alkanes C10 to C16 labeled]
Figure 5.6: Representative total ion chromatogram of A) simulated fire debris consisting of
carpet spiked with commercial diesel, B) commercial diesel standard, and C) commercial fuel
injector standard with compounds identified

The simulated fire debris containing diesel (Figure 5.6A) contains normal alkanes in the
range C11-C16, while the commercial diesel standard (Figure 5.6B) contains normal alkanes in the
range C12-C20. The commercial fuel injector standard (Figure 5.6C) contains normal alkanes in
the range C10-C16. After the burning process and the introduction of interference compounds,
some of the fire debris samples contain a normal alkane range more similar to that of fuel injector.
Interference compounds dominate these samples, and their larger abundance masks some of the
heavier normal alkanes. As a result, some fire debris samples correctly classify to diesel, while
others misclassify as fuel injector because of the change in the observed normal alkane range
caused by the dominating substrate interferences.
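The shift in normal alkane range can be expressed as a crude overlap score. The sketch below is an illustration, not a method used in this research: it treats each sample as the set of normal alkanes labeled in Figure 5.6 and computes the Jaccard index (shared carbon numbers divided by all carbon numbers present). It ignores peak abundances entirely, and the helper names alkane_set and jaccard are hypothetical.

```python
def alkane_set(low, high):
    """Carbon numbers of the normal alkanes C<low> through C<high>, inclusive."""
    return set(range(low, high + 1))

def jaccard(a, b):
    """Shared members divided by all members present in either set."""
    return len(a & b) / len(a | b)

# Ranges of labeled normal alkanes read from Figure 5.6
fire_debris = alkane_set(11, 16)      # panel A: C11-C16
diesel = alkane_set(12, 20)           # panel B: C12-C20
fuel_injector = alkane_set(10, 16)    # panel C: C10-C16

print(round(jaccard(fire_debris, diesel), 3))         # 0.5
print(round(jaccard(fire_debris, fuel_injector), 3))  # 0.857
```

On this crude measure, the fire debris range (C11-C16) overlaps the fuel injector range (C10-C16) more than the diesel range (C12-C20), which is consistent with the misclassification described above.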
As with PCA, when diesel was removed from the data set, 100% of the fire debris
samples associated to the commercial fuel injector standard. This again highlights a limitation,
as not every commercial ignitable liquid on the market can be included in a given data set.
Although classification to the specific ignitable liquid was not always achieved, classification to
the proper ASTM class was still possible, as diesel and fuel injector are both heavy petroleum
distillates.
All other simulated fire debris samples containing petroleum distillates spiked on nylon
carpet and treated wood flooring were classified within the petroleum distillate class, although
not always to the corresponding specific ignitable liquid. Classification of fire debris samples
containing paint thinner was not performed with this data set, as no isoparaffinic standards were
present. As with the chemically diverse data set, the fire debris samples containing gasoline were
classified 100% to the commercial diesel standard.
Using the chemically diverse data set, all fire debris samples were misclassified (k=7)
by specific ignitable liquid, but all were classified to the corresponding ASTM class, as
kerosene is a heavy petroleum distillate. When the more refined data set was used, 67% of fire
debris samples were correctly classified (k=7) to the specific ignitable liquid used, while the
remaining 33% were correctly classified only by ASTM class. These results indicate the need for a
more standardized approach and suggest that class reference standards representative of
different ASTM classes have potential for association and classification purposes.
5.3.3. Class Reference Standards
k-Nearest neighbors was performed on the class reference data set and classification of the
simulated fire debris consisting of carpet spiked with diesel samples was investigated Based on
the given threshold for each class, the commercial class standards each have a good class fit with
the exception of a few replicates that fall outside of the threshold range. As the data set is small,
only a few replicates are expected to fall outside of the threshold range; however, no replicates
appear to be substantially different when compared to the corresponding HCA dendrogram.
Classification of the fire debris samples was carried out again using 1, 3, 5, 7, and 9
nearest neighbors. For all values of nearest neighbors, the fire debris samples containing diesel
were all properly classified to the heavy petroleum distillate class reference standard. However,
as previously mentioned the class fit of the samples could not be analyzed. Using class reference
standards for k-NN analysis allows for fire debris samples to be classified based on ASTM class.
Generating a larger class reference standard set could potentially be utilized to help standardize a
useful data set for fire debris classification.

117

Using class reference standards and k-NN, all simulated fire debris samples containing
petroleum distillates (both carpet and wood substrates) were classified as heavy petroleum
distillates. Classification of the fire debris containing paint thinner was not performed, as no
isoparaffinic class standard was present. Fire debris containing gasoline was largely misclassified
as a heavy petroleum distillate; using 3 nearest neighbors, 22% of the simulated fire debris
samples were correctly classified to the gasoline standard, while the remaining 78% were
misclassified as a heavy petroleum distillate. Unsuccessful classification of gasoline may be
considered disadvantageous, as it is one of the most commonly used ignitable liquids. However,
after extensive burning, the gasoline fire debris samples contained few of the characteristic
compounds of the commercial gasoline. In addition, a compound from the nylon carpet, biphenyl,
has a retention time overlapping that of C14, a characteristic compound in diesel (see section 5.3.1).
5.4. Summary
The use of HCA and k-NN as multivariate statistical procedures to associate and classify
simulated fire debris samples to corresponding ASTM class has some advantages and
disadvantages. HCA is beneficial compared to PCA because similarity levels are calculated
within the analysis and all dimensions are accounted for and displayed simultaneously. In
contrast, in PCA, all dimensions are accounted for, but only two or three dimensions can be
observed simultaneously, making interpretation of many dimensions time consuming. Further, if a
numeric representation of association is desired in PCA, an additional metric (e.g., Euclidean
distance) must be calculated. However, HCA has some disadvantages, such as the inability to
determine which variables are contributing to the clustering and the fact that the similarity
levels calculated are only relative to a given data set. That is, the similarity between the same
two samples will change depending on the content of the data set. This makes interpreting a high
versus a low similarity level difficult. However, if a standard data set were used every time,
similarity levels would be more representative.
Using HCA, association of the fire debris to the specific commercial ignitable liquid was
not possible, but association to the corresponding ASTM class was possible using both
commercial and class reference standards. Association using HCA was not largely affected by
introducing a more diverse data set; however, the similarity level at which the simulated fire
debris associated was affected. Using k-NN, classification of the fire debris to the specific
commercial ignitable liquid was possible with the refined data set, but with the more diverse data
set, classification success was minimal. However, classification to the corresponding ASTM class
was possible using k-NN, which is all that is required in fire debris analysis. One major
disadvantage of k-NN is that classification is always forced and there is no way to assess the
class fit of unknown samples. A sample will therefore always be classified to one of the given
classes even if it is dissimilar to all of them, and without a measure of class fit it is difficult to
determine whether a classification reflects genuine similarity.
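The forced nature of k-NN classification can be seen in a minimal 1-nearest-neighbor sketch. The helper name nearest_class, the class labels, and the three-variable profiles are hypothetical; the point is only that the procedure returns a label even for a sample that resembles neither class, because the algorithm itself has no "no match" outcome.

```python
import numpy as np

def nearest_class(standards, labels, unknown):
    """1-NN: label of the closest standard plus the Euclidean distance to it.

    Nothing here can return 'no match': some label is always chosen.
    """
    S = np.asarray(standards, dtype=float)
    distances = np.linalg.norm(S - np.asarray(unknown, dtype=float), axis=1)
    i = int(np.argmin(distances))
    return labels[i], float(distances[i])

# Hypothetical class profiles (three variables each), not data from this study
standards = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
labels = ["heavy petroleum distillate", "gasoline"]

# An unknown that closely resembles the first class
label_close, d_close = nearest_class(standards, labels, [0.9, 0.1, 0.0])
# An unknown that resembles neither class is still assigned one
label_far, d_far = nearest_class(standards, labels, [0.0, 0.2, 3.0])
print(label_close, round(d_close, 3))
print(label_far, round(d_far, 3))
```

Both unknowns receive a label. The distance returned with the second label is far larger, but k-NN does not use that distance to decline a classification, which is the limitation described above.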
Class reference standards have demonstrated potential to associate and classify fire debris
samples to the corresponding ASTM class using HCA and k-NN. Although both commercial and
class reference standards were successful in class association, the class reference standards could
be beneficial for generating a more standardized data set. If a more standardized data set is used
in HCA, the given similarity levels will be more representative of the similarity between the
simulated fire debris and ignitable liquid and therefore more useful when associating samples to
different ASTM classes. As a result, the problem that a similarity level is only relative to a given
data set would be overcome, because the same data set would always be used. Additionally, a more
standardized data set for k-NN classification would be beneficial, as successful classification can
be affected by the composition of the data set. Using class reference standards in both statistical
procedures would reduce the possibility of manipulating the data set to obtain more desirable results.

REFERENCES

1. Said HES, Tan TN, Baker KD. Personal identification based on handwriting. Pattern
Recognition 2000; 33: 149-160.
2. Kumar R, Pal NR, Chanda B, Sharma JD. Forensic Detection of Fraudulent Alteration in
Ball-Point Pen Strokes. IEEE Transactions on Information Forensics and Security 2012;
7: 809-820.
3. Jiang Y, Liu P. Feature extraction for identification of drug and explosive concealed by
body packaging based on positive matrix factorization. Measurement 2014; 47: 193-199.

6. Conclusions
6.1. Summary
6.1.1. Objectives and Goals
Previous work using statistical analysis has been carried out to determine a more
objective approach to fire debris analysis (1-3); however, the methods studied consistently used
commercially available ignitable liquids as reference standards. While these statistical
procedures have shown some success, the use of commercially available ignitable liquids as
standards can be a limitation. Association and classification of simulated fire debris samples can
be affected by the number and chemical diversity of reference standards in the data set.
Additionally, the large number of commercially available ignitable liquids means that including
each one in a data set is not practical. Further, selecting appropriate liquids to include is difficult
because of the chemical variation within an ASTM class. This research focused on developing
class reference standards that are characteristic of ASTM chemical classes with the intention of
using the standards in subsequent multivariate statistical analysis to provide a more standardized
approach for the analysis of fire debris evidence and overcome problems associated with
selecting suitable reference standards to include in the data set for analysis. In addition, the
impact of the data set composition on successful association of simulated fire debris samples to
the corresponding standard using a variety of multivariate statistical procedures was investigated.
6.1.2. Association of Fire Debris using PCA
Principal components analysis (PCA) has previously been used to associate simulated fire
debris samples to ignitable liquid standards (1,2). In this study, PCA was used as the initial
statistical procedure to investigate the utility of class reference standards and evaluate the impact
of data set selection for statistical analysis.
Commercially available ignitable liquid standards from three different ASTM classes
were used, and class reference standards for the gasoline and petroleum distillate classes were
developed; all standards were spiked onto Kimwipes® for analysis. Class reference standards
representative of the gasoline and petroleum distillate classes were developed because these
ignitable liquids are commonly found in fire debris evidence. In addition, simulated fire debris
samples containing ignitable liquids spiked onto either nylon carpet with carpet padding or treated
wood flooring were generated. All standards and simulated fire debris samples were extracted by
passive headspace following procedures recommended by ASTM International and analyzed by gas
chromatography-mass spectrometry (GC-MS).
First, PCA was performed on two data sets containing commercially available ignitable
liquids. One data set, referred to as the chemically diverse data set, contained three gasoline, six
petroleum distillate, and two isoparaffinic standards while the second data set, referred to as the
refined data set, contained three gasoline and four petroleum distillate standards. The simulated
fire debris samples were then projected onto the resulting scores plots. In addition, Euclidean
distances were calculated between the scores of the fire debris samples and the ignitable liquid
standards. Using the chemically diverse data set, association of the simulated fire debris to the
corresponding ignitable liquid (diesel) was not possible; however, successful association to the
corresponding ASTM class was possible. When PCA was performed using the refined data set,
association to the corresponding ignitable liquid (diesel) was successful.
A third iteration of PCA was performed using the class reference standards to generate
scores and loadings plots, and the simulated fire debris samples were then projected onto the
scores plot. Association to the corresponding ASTM class was possible despite chemical
differences between the class standards and commercially available ignitable liquids. However, the
class reference standards were well distinguished using PCA because fewer, more representative
standards were used, which made association relatively easy.
Association to the corresponding ignitable liquid or ASTM class was possible using both
commercially available reference standards and the generated class reference standards.
However, association was affected by the choice of data set, with different data sets yielding
different results. Due to the large number of commercially available ignitable liquids,
including each one in a data set is neither practical nor feasible. While association was possible
using class reference standards, PCA does have some limitations for this application. Visual
interpretation of the PCA scores plot may be subjective, and an additional metric (in this case, the
Euclidean distance) was required to quantitatively confirm association. When calculating
Euclidean distances, the number of PCs retained was chosen to account for 95% of the variance,
and the distances were used to determine the standard with which each fire debris sample was
most closely associated. Although PCA takes all of the dimensions of the data set into account to
calculate scores, only two or three dimensions can be assessed simultaneously. However, PCA is
beneficial because the loadings plots generated for each PC can be used to identify the variables
contributing most to the variance.
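The projection-and-distance procedure summarized above can be sketched in a few lines. This is a minimal illustration, not the software used in this research: it assumes the standards are already pre-processed (for example, normalized), applies only mean-centering, obtains PCs by singular value decomposition, retains the smallest number of PCs that reaches 95% cumulative explained variance, and assigns the unknown to the standard with the smallest Euclidean distance in that PC space. The helper name associate_by_pca and the data are hypothetical.

```python
import numpy as np

def associate_by_pca(standards, labels, unknown, variance_target=0.95):
    """Project an unknown onto PCs of the standards; return the closest standard.

    Hypothetical helper: only mean-centering is applied here.
    """
    S = np.asarray(standards, dtype=float)
    mean = S.mean(axis=0)
    centered = S - mean
    # SVD of the mean-centered standards: rows of vt are the PC loading vectors
    _, sing, vt = np.linalg.svd(centered, full_matrices=False)
    explained = sing**2 / np.sum(sing**2)
    # Smallest number of PCs whose cumulative explained variance reaches the target
    n_pcs = int(np.searchsorted(np.cumsum(explained), variance_target)) + 1
    n_pcs = min(n_pcs, len(sing))
    loadings = vt[:n_pcs].T
    standard_scores = centered @ loadings
    unknown_scores = (np.asarray(unknown, dtype=float) - mean) @ loadings
    # Euclidean distance from the projected unknown to every standard
    distances = np.linalg.norm(standard_scores - unknown_scores, axis=1)
    return labels[int(np.argmin(distances))], distances, n_pcs

standards = [
    [0.90, 0.10, 0.00], [0.85, 0.15, 0.00],   # hypothetical HPD replicates
    [0.10, 0.80, 0.10], [0.15, 0.75, 0.10],   # hypothetical MPD replicates
    [0.00, 0.10, 0.90], [0.05, 0.10, 0.85],   # hypothetical gasoline replicates
]
labels = ["HPD", "HPD", "MPD", "MPD", "gasoline", "gasoline"]
best, distances, n_pcs = associate_by_pca(standards, labels, [0.80, 0.15, 0.05])
print(best, n_pcs, np.round(distances, 3))
```

For these hypothetical data, two PCs are retained and the unknown falls nearest the second HPD replicate. Because the standards define the PC space, adding or removing standards can change both the retained PCs and the resulting distances.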

6.1.3. Association of Fire Debris using HCA
Hierarchical cluster analysis (HCA) is another statistical procedure that has previously
been used to associate simulated fire debris samples to the corresponding ignitable liquid (2). In
this research, HCA was also used to investigate the impact of data set selection on association of
fire debris samples to corresponding reference standards and to investigate the utility of class
reference standards for this purpose.
First, HCA was performed on the two data sets containing commercially available
ignitable liquids with varying levels of chemical diversity and the simulated fire debris samples.
Using both of these data sets, association of the simulated fire debris to the corresponding ASTM
class was successful; however, association to the specific ignitable liquid could not be
determined due to the order of clustering. For example, the simulated fire debris samples first
grouped to a cluster containing the heavy petroleum distillate standards (diesel and fuel injector).
As diesel and fuel injector are chemically similar, these standards clustered to one another first
before the simulated fire debris was clustered. As a result, it could not be determined which
specific ignitable liquid the simulated fire debris was most similar to. Using class reference
standards, successful association to the corresponding ASTM class was possible.
Using HCA, association was not largely affected by the chemical diversity of the data set,
although it was not possible to determine whether association to the specific ignitable liquid was
affected. In addition, HCA provided a similarity level as a quantitative measure of association, and
no additional metrics had to be calculated. Furthermore, HCA is beneficial because all dimensions
of the data are displayed in a single dendrogram, and the order in which clustering occurs is more
straightforward to interpret than PCA scores plots. However, the similarity levels are
only relative to a given data set, and HCA provides no output identifying the variables
contributing to the observed clustering. As a result, the raw data must be interpreted to
hypothesize why clustering occurs as it does.
6.1.4. Association of Fire Debris using k-NN
k-Nearest Neighbors (k-NN) was also investigated as an additional statistical procedure
that has not been previously used for applications in fire debris analysis. k-NN is a classification
procedure that can be used to classify unknown samples to a defined class of samples of known
origin. In this research, k-NN was used to investigate the impact of data set selection for
classification and to demonstrate the utility of class reference standards.
Using the chemically diverse data set, the fire debris samples were incorrectly classified
to the specific ignitable liquid but correctly classified to the ASTM class for all numbers of
nearest neighbors. Using the refined data set, classification of simulated fire debris samples to
the corresponding ignitable liquid was possible to an extent. Using only one nearest neighbor, the
correct classification rate was 11%; however, as the number of nearest neighbors increased, the
classification rate increased to a maximum of 67% using 7 nearest neighbors. The remaining
simulated fire debris samples that were incorrectly classified were still classified to the
corresponding ASTM class (i.e., petroleum distillate). However, when k-NN was performed
using class reference standards, 100% of fire debris samples were correctly classified for all
numbers of nearest neighbors.
Similar to PCA, k-NN classification was affected by variations in the chemical diversity of
the data set. While classification was successful using class reference standards, no output was
provided to determine the variables contributing to classification. Also, the goodness of the class
fit could be assessed for the standards but could not be evaluated for the simulated fire debris
samples.
6.2. Future Work
The multivariate statistical procedures investigated in this research provided similar
association and classification results, but each had advantages and disadvantages relative to the
others. Although selecting the proper statistical procedure is important, standardizing the data
set used as the reference standards is more important for an objective approach.
To further standardize the statistical analysis of fire debris evidence, the current class
reference standards should be developed further to include additional characteristic compounds
in each standard. Moreover, class reference standards for other ASTM classes should be
generated so that the set of standards better represents the ignitable liquids commercially
available. Multivariate statistical procedures should then be applied using the newly developed
class reference standards to evaluate the potential of a more standardized approach.
Overall, this research highlighted the potential utility of class reference standards for a
more objective and standardized approach. Developing a set of standards suited to a statistical
approach could make fire debris analysis more reliable, help reduce the potential for false
positives and false negatives, aid in convincing a jury, and help satisfy the Daubert standard.
While multivariate statistical procedures are not currently used in forensic laboratories, a
reliable standardized approach would facilitate their implementation if these methods were
ever adopted.
REFERENCES

1. Baerncopf JM, McGuffin VL, Smith RW. Association of ignitable liquid residues to neat
ignitable liquids in the presence of matrix interferences using chemometric procedures.
Journal of Forensic Sciences 2011; 56: 70-81.
2. Prather KR, McGuffin VL, Smith RW. Effect of evaporation and matrix interferences on
the association of simulated ignitable liquid residues to the corresponding liquid standard.
Forensic Science International 2012; 222: 242-251.
3. Tan B, Hardy JK, Snavely RE. Accelerant classification by gas chromatography/mass
spectrometry and multivariate pattern recognition. Analytica Chimica Acta 2000; 422:
37-46.