Michigan State University

This is to certify that the thesis entitled A GENETIC ASSAY FOR ESTIMATING SCALP HAIR PIGMENTATION FROM FORENSIC SPECIMENS presented by Sara Lynn Jubelirer has been accepted towards fulfillment of the requirements for the M.S. degree in Forensic Science.

A GENETIC ASSAY FOR ESTIMATING SCALP HAIR PIGMENTATION FROM FORENSIC SPECIMENS

By Sara Lynn Jubelirer

A THESIS Submitted to Michigan State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE, Forensic Science, 2010

ABSTRACT

DNA profiling methods have undergone vast improvements in the past two decades, yet all still require a reference sample for comparison. To overcome this limitation, researchers have begun using DNA analyses to create a physical profile of unknown suspects or victims that can aid in their identification. The purpose of this thesis project was to develop a preliminary genetic assay for estimating an individual’s scalp hair color from a biological sample.
Multiplex PCR and SNP primer extension assays were developed, and eight SNPs in three genes (rs1426654 in SLC24A5; rs16891982 and rs26722 in SLC45A2; rs7495174, rs4778138, rs4778241, rs1800404, and rs1448484 in OCA2) were genotyped for 85 DNA samples from African American individuals. A statistical model for estimating scalp hair pigmentation from the SNP genotypes while controlling for ancestry was generated. ANCOVA analyses revealed that OCA2 SNPs rs1800404, rs7495174, rs4778138, and rs4778241, and SLC45A2 SNPs rs1448484 and rs16891982 had the most significant effects on hair pigmentation. SNP rs7495174 had the only significant effect individually (p ≤ 0.022), while the others correlated via SNP-SNP interactions (p-values ranging from 0.005 to 0.049). The corrected model explained 66.7% of the variation in hair pigmentation (p = 0.017; 1 − β = 0.932) in the individuals studied, providing groundwork for quantitatively predicting hair pigmentation in an unknown individual.

This is dedicated to my parents, Donald and Aileen, who taught me to persevere, and in loving memory of my mother.

ACKNOWLEDGMENTS

This work would not have been possible without the support of my professor, Dr. David Foran. I am grateful for the years of patience and guidance he has provided during completion of this thesis and my graduate education in forensic science. I would also like to thank Dr. Mark Shriver and Doctoral Candidate Ellen Quillen of the Pennsylvania State University Anthropological Genomics Laboratory for their knowledge and insight throughout this project, as well as for providing the samples necessary for its completion. Thanks to Dr. Todd Fenton and Dr. Vince Hoffman for their time and effort in sitting on my thesis committee. Finally, thanks to my friends and family for their love and support during my years at Michigan State University, without whom my sanity would have suffered tremendously.
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
  Forensic DNA Analysis of Crime Scene Evidence
  Forensic Identification of Unknown Remains
  DNA Analysis to Estimate Characteristics of an Unknown Individual
  A DNA-Based Biological Profile: Molecular Photofitting
  Biogeographical Ancestry
  Correlation between Ancestry and Pigmentation
  Melanogenesis in Human Hair
  Classifying Hair Color and Measuring Hair Pigmentation
  Single Nucleotide Polymorphisms
  Forensic SNP Analysis
  Pigmentation Genes of Interest
  Solute Carrier Family 24, Member 5
  Solute Carrier Family 45, Member 2
  The P Gene
  Current Molecular Photofitting Technology
  Goal of This Study
MATERIALS AND METHODS
  Blood Specimen and Data Collection
  DNA Purification
  Primer Design
  DNA Amplification
  Control DNA Sequencing
  PCR Product Purification for SNP Analysis
  SNP Primer Extension
  Statistical Analyses
RESULTS
  PCR Optimization
  SNP Primer Extension Reaction Optimization
  Positive Control DNA Sequencing
  DNA Purification, Amplification, and SNP Primer Extension
  Interpretation of SNP Electropherogram Abnormalities
  Combining SNP Multiplexes
  ANCOVA Results, Descriptive Statistics, and Model Formulation
DISCUSSION
  SNP Descriptive Statistics, SNP Correlation with Hair Pigmentation, and Experimental Model Statistical Significance
CONCLUSIONS
BIBLIOGRAPHY

LIST OF TABLES

Table 1: Amplification and SNP Interrogation Primers for Multiplex A
Table 2: Amplification and SNP Interrogation Primers for Multiplex B
Table 3: Concentrations for Multiplex A and B SNP Interrogation Primer Mixes
Table 4: Validation of multiplex A and B SNP allele analysis
Table 5: SNP genotypes and corresponding pigmentation and ancestry measurements for the 85 samples analyzed
Table 6: Descriptive statistics for each SNP
Table 7: Single variable ANCOVA results for each SNP
Table 8: Factorial ANCOVA summary for each 7-SNP model
Table 9: Summary of the terms with significant effects on M index from each factorial ANCOVA
Table 10: Model M4 ANCOVA results

LIST OF FIGURES

Figure 1: Worldwide map of human skin color distribution
Figure 2: Relationship between percentage West African ancestry and skin pigmentation in three ethnic groups
Figure 3: Internal structure of a human hair
Figure 4: Basic hair follicle structure
Figure 5: The melanogenesis pathway
Figure 6: Transmission electron micrograph depicting varying degrees of melanosomal maturation in human hair bulb melanocytes
Figure 7: Hair color differentiation by reflectance spectroscopy
Figure 8: The DermaSpectrometer
Figure 9: Example of a DNA segment containing a SNP
Figure 10: The SNP primer extension method
Figure 11: Model depicting the function of the SLC24A5, and the buildup of calcium in melanosomes
Figure 12: Effect of pH on melanin synthesis
Figure 13: Melanosomal membrane proteins affecting pH
Figure 14: Optimization of PCR multiplexes A and B
Figure 15: Electropherograms using the original multiplex A SNP interrogation primers
Figure 16: Electropherograms showing optimization of SNP multiplex A using DRF DNA
Figure 17: Electropherograms showing initial optimization of SNP multiplex B using KG DNA
Figure 18: Electropherograms showing artifact elimination by use of a shortened rs4778241 primer
Figure 19: Sample electropherogram depicting location of a SNP within sequencing analysis results
Figure 20: Electrophoresis gel showing PCR results for sample 060369
Figure 21: Example electropherograms depicting clean results for both SNP multiplexes
Figure 22: Electropherograms depicting further optimization of peak heights during sample analysis
Figure 23: Example electropherograms in which a primer extension was repeated for a single locus
Figure 24: Example electropherograms from a sample with inconclusive rs1426654 genotype
Figure 25: Example electropherogram from a sample injected with a broken capillary
Figure 26: Example electropherograms where artifacts did not impede genotyping
Figure 27: Example electropherograms showing eight-plex analysis results
Figure 28: Hair Color Sample Ring

Images in this thesis are presented in color.

INTRODUCTION

Forensic DNA Analysis of Crime Scene Evidence

Forensic DNA profiling methods have improved substantially in both speed of analysis and power of discrimination, but one shortcoming remains: none of these methods can identify an individual without a reference sample for comparison, which greatly impedes investigations in which suspects are not immediately apparent. According to the Bureau of Justice Statistics (2004), an extensive police investigation is sometimes necessary before a suspect can be apprehended, and often no suspects are identified and no DNA reference sample is obtained for comparison to crime scene samples, precluding successful identification. In cases without suspects, the Combined DNA Index System (CODIS, the FBI’s national DNA profile database used by law enforcement agencies) may be useful, but only to a certain extent. In 2007, crime scene evidence DNA profiles reached a total of 203,401 in the CODIS Forensic Index, and the corresponding offender hits summed to 49,813 (USDOJ 2008), indicating that a large amount of crime scene evidence remained unmatched within the database.
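The scale of the gap between profiles and hits can be illustrated with a short calculation. This is only a rough sketch: the two counts are the ones quoted above, but the assumption that each offender hit resolves at most one forensic profile is ours, made for illustration, and the variable names are hypothetical.

```python
# Counts quoted in the text (USDOJ 2008).
forensic_profiles = 203_401  # crime scene profiles in the CODIS Forensic Index, 2007
offender_hits = 49_813       # corresponding offender hits

# Illustrative assumption (not from the source): each hit resolves at most
# one forensic profile, so the difference is a lower bound on unmatched profiles.
hit_ratio = offender_hits / forensic_profiles
unmatched_lower_bound = forensic_profiles - offender_hits

print(f"offender hits per forensic profile: {hit_ratio:.1%}")
print(f"forensic profiles without a hit (at least): {unmatched_lower_bound:,}")
```

Under that assumption, hits correspond to roughly a quarter of the indexed crime scene profiles, which is consistent with the statement that much of the evidence remained unmatched.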
Forensic Identification of Unknown Remains

When remains of an unknown individual are discovered, they are transported to a morgue for attempted identification and, potentially, autopsy. Friends or relatives may be asked to visually identify the decedent, although this is not the preferred method for a positive, secure identification. Distraught loved ones can make false identifications (reviewed by Rhine 1998; Tidball-Binz 2006), so for a more objective approach, medical examiners, forensic anthropologists, or forensic odontologists are called upon. Antemortem records, including dental records, fingerprint references, hospital treatment records, and x-rays, may be used for comparison, along with known physical features such as scars and tattoos. However, decomposed, disfigured, or partial remains sometimes impede identification. If the decedent is beyond visual recognition, or if only partial remains are found, visual inspection and antemortem records may not be useful in identification.

When remains cannot be identified by conventional means, DNA analysis may be performed. In 2000, the FBI began development of the National Missing Person DNA Database (USDOJ 2008), creating CODIS indices for DNA profiles of missing persons, biological relatives of missing persons, and unidentified human remains. Once a DNA profile is obtained, it can be compared to profiles in CODIS to search for matches with unidentified or missing persons or their relatives. However, in most cases, DNA profiles generated from unknown remains do not match profiles in CODIS. According to the 2004 Bureau of Justice Statistics Census of Medical Examiner and Coroners' Offices, approximately 4,400 unidentified human decedents are discovered every year, about 1,000 of which remain unidentified (Ritter 2007). Even if DNA analysis is utilized, decedents can remain unidentified if there is no reference sample for comparison.
DNA Analysis to Estimate Characteristics of an Unknown Individual

The use of DNA to estimate the physical characteristics of an individual, producing a physical profile or “fuzzy photo”, is a fairly recent topic of research. Just as forensic anthropologists estimate an individual’s age, stature, ancestry, and sex from skeletal remains, DNA analysts could potentially estimate physical characteristics including sex, ancestry, stature, dental features, and hair, skin, and eye pigmentation from biological traces. A benefit of this approach is that no reference sample is required. When remains cannot be identified through conventional means, a DNA-based physical description of the individual could aid police in narrowing potential matches from a missing persons list and help in identifying the deceased. A biological profile of an offender could also be beneficial to police in identifying or eliminating suspects. If short tandem repeats (STRs; sequences of DNA that vary by copy number between individuals and are used in DNA profiling) are analyzed but no CODIS hits occur, a physical description would narrow the suspect pool. Lastly, police can encounter problems when eyewitness reports are inaccurate or intentionally misleading, whereas DNA analysis provides an objective depiction of a person’s characteristics.

A DNA-Based Biological Profile: Molecular Photofitting

The techniques used to predict an individual’s physical characteristics by DNA analysis are termed molecular photofitting (Frudakis 2008). The most accurate way to infer phenotype from DNA is to analyze markers of genetic ancestry as well as the genes functionally underlying the traits of interest. Pigmentation gene alleles are distributed as a function of ancestry, so measures of ancestry can aid in inferring phenotype from genotype (Shriver et al. 2003; Bonilla et al. 2004; Lamason et al. 2005; Norton et al. 2007).
For example, there may be a higher correlation between ancestry and hair pigmentation in one group of individuals than in another. Traits also vary within groups, so directly analyzing the underlying genes for each trait is necessary for an accurate, sensitive, and powerful phenotype determination. Over the last 100,000 to 200,000 years, readily apparent physical traits such as hair, eye, and skin pigmentation have been shaped by social and environmental influences, such as natural selection (the process in which the organisms best adapted to an environment survive and reproduce), sexual selection (selection for traits induced by competition for mates), founder effects (loss of genetic variation due to establishment of a new, isolated population from only a few members of a larger population), migration (relocation to a new region, which may differ in environment), and genetic drift (random changes in trait frequencies, especially in smaller populations), making them easier to study genetically (reviewed by Tishkoff and Kidd 2004; Harris and Meyer 2006; McEvoy et al. 2006; Parra 2007; Westerhof 2007). In comparison to more complex traits (e.g. predisposition to certain diseases), pigmentation and some other physical traits have lower locus heterogeneity (fewer loci contributing to expression) and the genes have greater penetrance (allele expression). If (1) a specific physical trait (e.g. hair color) can be associated with a gene, (2) polymorphisms in that gene can be correlated with a measure of phenotype (in this case, a measure of hair pigmentation), and (3) a method can be developed to infer the phenotype value from each of the most penetrant genetic polymorphisms, then molecular photofitting can be used to estimate phenotype (Frudakis 2008). For hair color, the genetic markers of interest are those affecting pigmentation of the hair shaft, and the correlation for each locus can be determined.
Hair pigmentation measurements are then associated with hair color, allowing an estimate of the physical appearance of the hair. The two components of error for this task are the error in predicting hair pigmentation based on the marker genotype, and the error in inferring hair color from the hair pigmentation measurements (Frudakis 2008).

Biogeographical Ancestry

The human species is distributed into populations, or groups of individuals who interbreed with each other to a greater degree than with members of another group. A population was further defined by Frudakis (2008) as a group of individuals whose similarity is greater than the overall similarity among groups, and whose geographical, historical, cultural, or genetic differences allow for the calculation of distinct allele frequencies. A measure of an individual’s genetic similarity to reference populations (e.g. Europeans, West Africans, East Asians, and Native Americans), gauged using various markers throughout the genome, is known as biogeographical ancestry (Frudakis 2008). While many human phenotypes (e.g. stature, handedness, shoe size, earlobe attachment, and some facial features) vary or overlap across populations, some phenotypes tend to vary with ancestry, resulting in populations in which members have similar physical characteristics. When individuals from two reproductively isolated populations interbreed, gene flow (transfer of genes between populations) or admixture occurs, resulting in a change in population allele frequencies. This process leads to admixture linkage disequilibrium (ALD), a nonrandom association of alleles at two or more loci, and admixture stratification (AS), a genetic structure of allele differences within admixed groups due to variation in individual ancestry (Halder and Shriver 2003). AS occurs when some individuals have more genomic ancestry from one parental population, while others have more genomic ancestry from the second parental population.
The large amount of AS in admixed individuals allows the study of correlations between phenotype and genomic ancestry (Halder and Shriver 2003). For example, some genetic markers are in ALD with the alleles for skin color, enabling identification of genes producing skin color differences between populations (Shriver et al. 2003).

Correlation between Ancestry and Pigmentation

While it has been reported that humans are relatively undifferentiated in comparison to many other species (e.g. the great apes; Kaessmann et al. 2001), many populations have divergent traits. One of the most obvious differences is human pigmentation. The fixation index (F_ST), a measure of genetic differentiation between populations, is 0.6 at pigmentation loci versus at most 0.15 for selectively neutral autosomal loci (reviewed by Frudakis 2008). These different pigmentation levels came about through genetic adaptation, as humans migrated out of Africa to other environments. Populations residing in regions surrounding the equator adapted to produce more melanin (Figure 1), the pigment that imparts color to hair, skin, and eyes. Melanin acts as a sunblock, absorbing ultraviolet (UV) radiation, especially the shorter wavelength ultraviolet B (UVB) radiation at around 300 nm, which is damaging to both DNA and proteins (reviewed by Parra 2007). Toward the polar regions of the earth, the sun remains low in the sky and ultraviolet rays are absorbed by more atmosphere before reaching the earth’s surface (Hess 1926; Reed 1927; Roberts and Kahlon 1976). The equator is exposed to over 1000 times more UVB radiation than the poles (Sparling 2001), so for those living in equatorial regions, darker pigmentation helps to prevent skin cancer. Additionally, melanin may protect against the photodegradation of nutrients such as folate (Jablonski and Chaplin 2000). Populations living closer to the poles are subject to other selective pressures, specifically the need to produce sufficient vitamin D.
UVB radiation in sunlight is essential for the conversion of 7-dehydrocholesterol into previtamin D, which is then isomerized into its active form (Jablonski and Chaplin 2000). If more melanin is present in the skin, longer exposure to UVB is necessary for previtamin D production. In lower UV environments, individuals are more susceptible to vitamin D deficiency, and consequently also to conditions such as rickets and osteoporosis (Jablonski and Chaplin 2000). These populations therefore adapted to produce less melanin, in order to prevent vitamin D deficiency (Figure 1).

[Map: Human Skin Color Distribution]

Figure 1: Worldwide map of human skin color distribution. Created from data on native populations collected by Biasutti prior to 1940. High numbers represent darker skin pigmentation. Source: (O’Neil 2007).

Research by Shriver et al. (2003) demonstrated a significant positive correlation between skin pigmentation and West African ancestry (Figure 2); the higher the percentage of West African ancestry (compared to European ancestry), the greater the tendency to have darker skin pigmentation. The range of pigmentation in those with greater West African ancestry was also much larger than in those with higher European ancestry.

[Scatter plot: skin pigmentation (lightest to darkest) versus % African ancestry (0 to 100) for African American, African Caribbean, and European American individuals]

Figure 2: Relationship between percentage West African ancestry and skin pigmentation in three ethnic groups. Each symbol represents one individual.
Skin pigmentation levels were quantified using a spectrophotometer and ancestry was determined using 34 genetic ancestry markers. Adapted from: (Shriver et al. 2003).

Melanogenesis in Human Hair

Hair is composed of dead, cornified cells, which form keratinized filaments. The main component of hair is keratin, an alpha-helical protein with high cysteine content, at approximately 80% w/w (de Cassia Comis Wagner et al. 2007). A human hair is divided into three structural units (Figure 3): the translucent outer layer of scales known as the cuticle; the middle layer, or cortex, where the melanin granules are found; and the innermost core, known as the medulla, which is only present in some hairs (Deedrick and Koch 2004).

Figure 3: Internal structure of a human hair. Shown are the cuticle (outermost layer), cortex (middle layer), and medulla (innermost layer). The cortex contains pigment granules, cortical fusi (spaces filled with air), and ovoid bodies (dark structures of unknown origin). The medulla may be fragmentary (shorter fragments with more space present than medulla), discontinuous (stretches of medulla with short spaces in between; pictured above), continuous (no spaces or breaks), or absent. The proximal end is in the direction of the root and the distal end is in the direction of the tip. Source: (Deedrick and Koch 2004).

The cells that are responsible for hair pigmentation are follicular melanocytes. Active follicular melanocytes are located in the hair bulb above the dermal papilla, a small projection into the epidermis in the hair follicle, as well as in the wall of the infundibulum, the uppermost part of the follicle (Figure 4). The follicular melanocytes synthesize melanin and package it into cytoplasmic organelles called melanosomes.
As they mature and accumulate pigment, these organelles are transported through dendritic processes (appendages) of the melanocytes and are secreted into the hair’s keratinocytes as it grows, resulting in pigmentation of the hair shaft (Okazaki et al. 1976; Singh et al. 2008). The melanosomes break open after they are secreted into the growing hair follicle, and the cytoplasmic components, including melanin granules, become embedded in the cortex of the keratin matrix. The number of melanocytes in individuals is approximately constant, but the degree of pigmentation can vary depending on the type of melanin and the size, shape, and density of the melanosomes (studied thus far in skin and eyes; Staricco and Pinkus 1957; Wilkerson et al. 1996; Prota et al. 1998; Alaluf et al. 2002; Thong et al. 2003).

Figure 4: Basic hair follicle structure. Arrows point to active melanocytes. Melanocytes in the bulb and the infundibulum (upper wall of follicle) transfer melanosomes to the neighboring keratinocytes as the hair grows. Adapted from: (Slominski et al. 2005).

There are two classes of melanin: eumelanin, the brown/black pigment contributing to the appearance of darker pigmentation, and pheomelanin, the red/yellow pigment contributing to the appearance of lighter pigmentation. Eumelanosomes are larger, ellipsoidal organelles with a highly organized glycoprotein matrix in which organized fibers (on which melanin is deposited) allow the ordered aggregation and concentration of melanin. Pheomelanosomes are smaller, spherical organelles containing a glycoprotein matrix in which the fibers are more disordered, and melanin is more loosely aggregated (reviewed by Sturm et al. 2001). The initial precursor in melanogenesis (melanin biosynthesis) is tyrosine, and upon production of the intermediate product dopaquinone, the pathway (Figure 5) splits into eumelanogenesis and pheomelanogenesis (Ito 1993; Kobayashi et al. 1995).
Greater eumelanogenesis results in darker hair colors, and greater pheomelanogenesis results in lighter hair colors.

Figure 5: The melanogenesis pathway.

Classifying Hair Color and Measuring Hair Pigmentation

One method used to differentiate human hair is high resolution microscopy (e.g. electron microscopy), with which each hair color is characterized by the size and shape of melanosomes and the deposition of melanin granules within the melanosomes (Figure 6; reviewed by Ortonne and Prota 1993, Frudakis 2008, and Tobin 2008). Red hair contains mainly pheomelanosomes (Liu et al. 2005), has a higher pheomelanin:eumelanin ratio than other hair (Borges et al. 2001), and exhibits melanin deposition that is granular and spotty (Jimbow et al. 1983). Blond hair has the same number of melanosomes as brown and black hair, but they are smaller with a very low concentration of melanin; the melanocytes are not fully melanized (matured), which is apparent both in the soma and the dendrites (reviewed by Ortonne and Prota 1993; Frudakis 2008; Tobin 2008), where melanosome maturation occurs. In light brown hair, follicular melanocytes contain melanosomes of intermediate size and melanin deposition. Brown or black hair has large, elliptical melanosomes that contain concentrated melanin; all developmental stages of melanosomes can be detected. In gray or white hair there are few melanocytes, and the melanosomes contain little or no melanin (Commo et al. 2004).
While every hair color typically contains both eumelanin and pheomelanin, there is an inverse linear correlation between their concentrations from blond to black hair (Borges et al. 2001; Shekar et al. 2008 A).

Figure 6: Transmission electron micrograph depicting varying degrees of melanosomal maturation in human hair bulb melanocytes. Early pre-melanosomes (I and II), maturing melanosomes (III) and fully mature, ellipsoidal melanosomes (IV) are all shown. Source: (Tobin 2008).

Hair color is often described by assigning a broad qualitative category. For example, on a Michigan State Police crime report form, an offender is classified as having “Blonde/Lt. Brown”, “Brown/Dark”, “Black”, “Gray/Partial Gray”, “Red/Auburn”, or “Dyed” hair color (Findlater et al. 2007). Although it is simple to categorize hair color qualitatively, this method is subjective and introduces a potential source of error, as hair color actually exists in a continuous spectrum of shades. The quantification of hair pigmentation is much more precise and accurate, and could also be used in conjunction with a categorical description for a strong characterization of hair color. Quantification is accomplished by measuring melanin, which makes the data, as biochemical quantities instead of color categories, more physiologically meaningful. Hair pigmentation measurement must be as accurate and precise as possible to limit error in molecular photofitting. One method of quantifying hair pigmentation is biochemical analysis, e.g. using high performance liquid chromatography to measure melanin degradation products. However, this method is relatively time-consuming, expensive, and destructive, and requires equipment that may not be available in a forensic laboratory. Another objective method for quantifying hair pigmentation is spectrophotometry, which is easy to carry out, time efficient, and requires only a spectrophotometer.
The main method used on hair is reflectance spectroscopy, in which the hair sample is illuminated with a controlled amount of full-spectrum light, and the light reflected back at the visible wavelengths is detected (reviewed by Shriver and Parra 2000; Parra 2007). Each hair color reflects light back at distinct wavelengths. Reflectance for hair peaks at approximately 650 nm, but the intensity at that wavelength differs for each hair color. Shades of blond have the highest reflectance, at 1,000–3,000 reflectance units, followed by light brown and red hair, which overlap between 800 and 1,500 reflectance units, while brown and black hair exhibit the lowest reflectance at about 200–600 reflectance units. The calculation of apparent absorbance (which adjusts for the absorbance levels of hair containing no pigment) from reflectance data can also be used to study hair pigmentation, demonstrating spectrophotometric differences for distinct hair types (Figure 7). The spectral reflectance (reflectance at each wavelength over the visible spectrum) is measured, with gray hair used as a blank. Apparent absorbance is calculated using the equation:

$$AA = \log_{10}\left[\frac{PR_{\mathrm{blank}}}{PR_{\mathrm{object}}}\right]$$

where $PR_{\mathrm{blank}}$ is the percentage reflectance of the gray hair sample, and $PR_{\mathrm{object}}$ is the percentage reflectance of the hair being studied.

Figure 7: Hair color differentiation by reflectance spectroscopy.
Apparent absorbance for thirteen individuals was measured: three individuals with black hair (open circles), four individuals with brown hair (lines with no symbols), three individuals with red hair (filled circles), and three individuals with blond hair (open squares). The green and red outlined areas represent the green (568 nm) and red (655 nm) wavelengths at which the spectrophotometer takes readings. (Unpublished result courtesy of Dr. Mark Shriver, Department of Anthropology, The Pennsylvania State University.)

A convenient tool for measuring pigmentation is the narrow-band reflectometer. One model is the DermaSpectrometer (Cortex Technology, Hadsund, Denmark), a handheld instrument designed to measure skin and hair pigmentation (Figure 8). This instrument has two light-emitting diodes, each of which has a narrow band of emitted wavelengths (Shriver and Parra 2000; Frudakis 2008). The first band is green light, centered around 568 nm, and is used to detect hemoglobin, a second source of pigment in skin. Hemoglobin absorbs the most light in the green wavelengths and very little in the red ones, explaining why blood appears red. The second band is red light, centered around 655 nm, and is used to detect melanin. Melanin absorbs more evenly across the spectrum, so a reading at the red wavelengths is sensitive to melanin but not to hemoglobin content (Shriver and Parra 2000; Frudakis 2008). The reflectance is gauged by a photodetector (Park and Lee 2005). The DermaSpectrometer then computes and displays a measure of melanin content known as the M index value, which is calculated using the equation:

$$M = \log_{10}\left(\frac{1}{\%\ \text{red reflectance}}\right)$$

Figure 8: The DermaSpectrometer. This instrument is a narrow-band reflectometer manufactured by Cortex Technology in Hadsund, Denmark. It measures reflectance and performs calculations to give readings for the erythema index, a measure of hemoglobin content, and M index. When measuring hair pigmentation,
only the M index reading is used. Source: (Grove et al. 2007).

Single Nucleotide Polymorphisms

The genetic markers important to molecular photofitting consist of sequence variations occurring at a single base pair known as single nucleotide polymorphisms, or SNPs (Figure 9). Some SNPs underlie phenotypic variations and are useful in inferring the physical characteristics of an individual. SNPs are generally biallelic (having two possible alleles, e.g. adenine (A) and guanine (G)), yielding three possible genotypes per SNP (in this instance, AA, GG or AG). However, some SNPs are triallelic, and potentially even tetra-allelic (Hüebner et al. 2007). SNPs comprise roughly 90% of all human genetic variation, and are found in both coding and non-coding regions of the genome (USDOE Office of Science 2007). About 63% of SNPs involve a nucleotide transition (comprising A/G and thymine/cytosine (T/C) SNPs on opposite strands); other types of variations that commonly occur are estimated at a distribution of 17% A/C and T/G SNPs, 8% C/G SNPs, and 4% A/T SNPs, while the remaining 8% comprise nucleotide insertions or deletions (Miller et al. 2001). SNPs occur every 100–300 base pairs (bp) in the three billion bp human genome (USDOE Office of Science 2007). The National Center for Biotechnology Information’s public database (dbSNP) contains information on over 14.7 million human SNPs (dbSNP 2008). A variation must occur in at least 1% of humans for it to be considered a SNP (USDOE Office of Science 2007). SNPs are likely the cause of most functional human genetic variation (reviewed by Frudakis 2008); a single non-synonymous SNP within a gene that alters the amino acid sequence can change the three-dimensional structure and functionality of the resulting protein. Furthermore, non-coding SNPs can affect transcriptional activity and RNA splicing, altering the amount or functionality of a protein. Some SNPs cause phenotypic differences within populations.
For example, SNPs are used for disease diagnosis and variable drug response assessment in medical genetics, and in evolutionary genetics to study variation within and among populations.

Figure 9: Example of a DNA segment containing a SNP. At this SNP site, the DNA may exhibit a CG base pair (top strand) or an AT base pair (bottom strand). An individual may exhibit one or both of these SNP alleles, constituting a homozygote or heterozygote, respectively. Adapted from: (Hall 2007).

SNP analysis was not traditionally practical for forensic scientists, since it has much lower discrimination power than STRs; around four times as many SNP loci are needed to obtain similar statistical information (reviewed by Sobrino et al. 2005). However, SNPs are more useful for analyzing degraded DNA (Biesecker et al. 2005) because the target for analysis is only a single base pair, and much shorter amplicons can be used (reviewed by Sobrino et al. 2005). Two types of SNPs are used in molecular photofitting: those that underlie phenotypic variation and those used to estimate an individual’s ancestry. SNPs known as ancestry-informative markers (AIMs) differ in allele frequencies between populations, and thus are useful in estimating the percent genetic makeup of major ancestry groups (e.g. West African, European, East Asian, and Native American) in an individual. The combination of information from AIMs and SNPs underlying physical variation provides for an accurate prediction of phenotype (reviewed by Frudakis 2008).

Forensic SNP Analysis

A SNP analysis method that is both efficient (allowing multiplexing, or analysis of multiple SNPs simultaneously) and convenient (using equipment commonly available) for use in forensic DNA laboratories is SNP primer extension (Figure 10).
In this method, the loci of interest are first amplified by polymerase chain reaction (PCR), a technique in which a DNA sequence is replicated exponentially using the same components naturally occurring in cells. During the SNP reaction, an interrogation primer binds immediately adjacent to the SNP site of interest and is extended by one dye-labeled dideoxynucleotide (ddNTP). Each of the four types of ddNTPs is labeled with a different fluorescent dye, which is detected during analysis using capillary electrophoresis. An electropherogram is generated, revealing the dye detected and resulting SNP genotype.

Figure 10: The SNP primer extension method. During a SNP primer extension reaction, the interrogation primer is extended with a dye-labeled ddNTP based on the nucleotide present at the polymorphic site. For example, a SNP site may have two possible alleles, adenine (A) and guanine (G). If the target DNA strand has a thymine (T, left) at the SNP site, the primer is extended by a fluorescent red-labeled adenine; if the target nucleotide is a cytosine (C, right), the primer is extended by a fluorescent green-labeled guanine. The dye is detected and used to generate an electropherogram by the genetic analysis system. Adapted from: (Sobrino et al. 2005).

Pigmentation Genes of Interest

Substantial research has been conducted on the genetics of human hair pigmentation (Graf et al. 2005; Lamason et al. 2005; Soejima and Koda 2007; Duffy et al. 2007; Branicki et al. 2008; Han et al. 2008; Shekar et al. 2008 B), and pigmentation gene allele frequencies in different populations are well characterized. While hundreds of genes are thought to affect human pigmentation, only a fraction are considered to be main contributors (Parra 2007).
The corresponding gene products are involved in one of the following: gene transcription; substrate (e.g. tyrosine) availability; receptor-ligand interactions within signal transduction pathways regulating melanin synthesis or affecting melanoblast migration and differentiation; the deposition of melanin into the melanosomes; or the construction, packaging and transport of melanosomes (reviewed by Parra 2007; Frudakis 2008). There are approximately 80 candidate genes affecting human pigmentation (McEvoy et al. 2006). Among those studied most are solute carrier 45 member 2 (SLC45A2), the P gene (OCA2), solute carrier 24 member 5 (SLC24A5), dopachrome tautomerase (DCT), agouti signaling protein (ASIP), melanocortin 1 receptor (MC1R), tyrosinase (TYR) and tyrosinase-related protein 1 (TYRP1) (involved in melanogenesis); KIT ligand (KITLG), microphthalmia-associated transcription factor (MITF), a disintegrin and metalloproteinase domain 17 (ADAM17), and a disintegrin-like and metalloproteinase with thrombospondin type 1 motif, 20 (ADAMTS20) (involved in melanocyte development and differentiation); lysosomal trafficking regulator (LYST) (involved in protein transport and melanosome assembly); and myosin VA (MYO5A) (involved in melanosome transport) (reviewed by Sturm et al. 1998; Voisey and Van Daal 2002; Desnos et al. 2007; Parra 2007). Genotype-phenotype association studies or functional studies have confirmed the role of at least three of these genes in normal pigmentation variation (reviewed by Parra 2007): SLC24A5, SLC45A2, and OCA2. Eight SNPs, dispersed within the genes, have been correlated with hair, skin, or eye pigmentation, as detailed below.

Solute Carrier Family 24, Member 5

SLC24A5 encodes a putative transmembrane, potassium-dependent sodium/calcium cation exchanger (Figure 11) that is localized to the melanosomal membrane (Lamason et al. 2005).
A positive correlation has been seen between calcium accumulation and melanin content in melanocytes (Salceda and Riesgo-Escovar 1990). It is therefore inferred that mutations causing decreased activity in SLC24A5 lead to decreased levels of intramelanosomal calcium, less melanin, and a lighter pigmentation phenotype (Muller and Kelsh 2006). Lamason et al. (2005) originally found that mutations in this gene led to a lighter, more yellow pigmentation in zebrafish, and so named it the “golden” gene. Their results also indicated that a SNP in the third exon of SLC24A5 (dbSNP ID: rs1426654) is significantly associated with skin pigmentation in African Americans (p = 3 × 10⁻⁶) and African Caribbeans (p = 2 × 10⁻⁴) and explains 25–38% of the difference in skin lightening (gauged using the melanin index) between Europeans and Africans (Lamason et al. 2005). This SNP also exhibited a significant effect on skin pigmentation in South Asians (p = 1.06 × 10⁻⁸; Stokowski et al. 2007) and is nearly fixed in light-skinned Europeans (Lamason et al. 2005; Lao et al. 2007; Norton et al. 2007; Soejima and Koda 2007). Known alleles¹ are G (ancestral, associated with darker pigmentation) and A (derived, associated with lighter pigmentation), encoding an alanine or a threonine, respectively, at amino acid 111 (Lamason et al. 2005; Norton et al. 2007). Norton et al. (2007) found frequencies of the derived allele to be 100% for Europeans and 46% for non-Europeans (including West Africans, Island Melanesians, South Asians, Native Americans, and East Asians). Soejima and Koda (2007) observed the frequency of the derived allele to be high in Europeans (0.975), low in the Chinese (0.019), and intermediate in some other Asian groups: the Sinhalese, Tamils, and Uygurs (0.500, 0.293, and 0.536, respectively). Allele frequencies were also low in West Africans (0.09) and Native Americans (0.08). It is not known, however, how much this SNP contributes specifically to hair pigmentation.
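
As a hedged illustration of how the alleles reported above might be handled computationally, the sketch below counts copies of the derived allele in a diploid genotype for rs1426654 (G ancestral, A derived, as stated in the text). Additive 0/1/2 coding is a common convention in association analyses and is assumed here purely for illustration; it is not stated to be the coding used in this study, and the function name is hypothetical.

```python
def derived_allele_count(genotype: str, derived: str) -> int:
    """Return how many of the two alleles in a genotype match the derived allele."""
    alleles = genotype.strip().upper()
    if len(alleles) != 2:
        raise ValueError("expected a two-letter diploid genotype such as 'AG'")
    return sum(1 for allele in alleles if allele == derived.upper())


# rs1426654 in SLC24A5: G is ancestral, A is derived (forward orientation).
for genotype in ("GG", "AG", "AA"):
    print(genotype, derived_allele_count(genotype, "A"))
```

Under this assumed coding, a higher count corresponds to more copies of the allele associated with lighter pigmentation.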
¹ Note that alleles for all SNPs described herein are reported in the forward orientation, as listed in dbSNP. Alleles reported may appear different than those in several of the works cited because they are described therein in the reverse orientation.

Figure 11: Model depicting the function of SLC24A5 and the buildup of calcium in melanosomes. The large oval represents the melanosomal membrane. The leftmost, shaded circle represents an ATPase, an enzyme that hydrolyzes ATP to ADP and inorganic phosphate (Pi). The middle circle represents a sodium/proton exchanger. The rightmost circle represents the SLC24A5 protein, which couples sodium efflux with calcium (and potentially potassium) intake. Source: (Lamason et al. 2005).

Solute Carrier Family 45, Member 2

SLC45A2 encodes a transmembrane protein known as membrane-associated transport protein, or MATP. This protein is found in melanocytes and is thought to direct the passage of proteins (specifically tyrosinase, the enzyme that oxidizes tyrosine in the melanin synthesis pathway) to the melanosome (Newton et al. 2001; Kushimoto et al. 2003). Without proper transport of melanosomal proteins, melanin is not correctly polymerized, resulting in less pigmentation. SLC45A2 mutations are the cause of oculocutaneous albinism type IV, which has been studied in Turkish (Newton et al. 2001), German (Rundshagen et al. 2004), and Japanese (Inagaki et al. 2004) individuals. Two SNPs (dbSNP IDs: rs26722 and rs16891982) in exons three (former) and five (latter) have been identified through skin pigmentation research (Nakayama et al. 2002; Yuasa et al. 2004). Alleles of the former SNP are C (ancestral) and T (derived), encoding glutamate or lysine at amino acid 272. Alleles of the latter SNP are C (ancestral) and G (derived), encoding leucine or phenylalanine at amino acid 374.
The derived allele of SNP rs26722 (encoding 272Lys) is most frequent among Asians (Chinese: 0.434; Sinhalese: 0.204; Tamils: 0.121), and is rare in Europeans (0.025) and Africans (Xhosans: 0.034; Ghanaians: 0.041). The derived allele of SNP rs16891982 (encoding 374Phe) is widespread among Europeans (0.916) and rarely occurs (0–0.019%) in the other five groups (Soejima et al. 2006). Branicki et al. (2008) found the derived allele of SNP rs26722 and the ancestral allele of SNP rs16891982 (encoding 374Leu) to be significantly associated with black hair compared to other hair colors (p = 0.002 and p < 0.0001, respectively) in Poles; the odds of having black hair were 5.32 and 7.05 times greater for individuals with these alleles, respectively. Graf et al. (2005) observed that individuals homozygous for the rs26722 derived genotype and rs16891982 ancestral genotype in a sample including Caucasians, Asians, African Americans, and Australian Aborigines exhibited odds ratios (odds of one hair color occurring compared to another) of 43.23 and 25.53 for black hair versus blond hair (p = 0.0005 and p < 0.0001, respectively).

The P Gene

OCA2 encodes an anion transporter protein that is localized to the melanosomal membrane and is instrumental in pH regulation of the melanosome (Puri et al. 2000; Ancans et al. 2001). Originally discovered as a human homolog to the mouse pink locus for eye color, the product of the OCA2 gene is known as the P protein. Ancans et al. (2001) found that a near neutral pH (6.8) provides the conditions for optimal melanin synthesis (Figure 12). The pH of the melanosome was discovered to regulate the activity of tyrosinase, the rate of melanogenesis, the ratio of eumelanin to pheomelanin, and the rate of melanosomal maturation. Results also indicated that melanosomal pH may vary with ancestry, a sign it could have a strong effect on pigmentation.
Since the P protein mediates pH neutralization in the melanosome, it is thought to be an ion channel that acts to reduce the proton concentration, having an opposing function to the melanosomal proton pump. Therefore, the P protein and the proton pump control the melanosomal pH (Figure 13) to regulate melanogenesis (Ancans et al. 2001).

Figure 12: Effect of pH on melanin synthesis. Maximal melanin synthesis occurs at a pH of 6.8. Source: (Ancans et al. 2001).

Figure 13: Melanosomal membrane proteins affecting pH. The proton pump introduces protons into the melanosome, and the P protein releases them to maintain a neutral pH. These transmembrane proteins may vary in abundance, activity and polymorphism per individual or population, acting to control melanosomal pH, and thus the amount of melanin synthesis. L-Tyr represents the substrate tyrosine. Source: (Ancans et al. 2001).

OCA2 has been identified as the major contributing gene to eye color variation in individuals of European ancestry (Zhu et al. 2004; Duffy et al. 2007). Three OCA2 SNPs (dbSNP IDs: rs7495174, rs4778241, and rs4778138) in intron 1 were found to be significantly associated with eye color (p = 1.02 × 10⁻⁶¹, 1.57 × 10⁻⁹⁶, and 4.45 × 10⁻⁵⁴, respectively) in a group of Australian adolescents, their siblings, and their parents (Duffy et al. 2007); these polymorphisms were also significantly associated with hair color (p = 7.1 × 10⁻⁷, 8.5 × 10⁻⁵, and 2.1 × 10⁻², respectively; Han et al. 2008) in Australians and Americans of mainly European ancestry. The three SNPs occur in one major haplotype block, in which they are in linkage disequilibrium (LD) and tend to be inherited together. Known alleles are: G and A for SNP rs7495174; C and A for SNP rs4778241; and G and A for SNP rs4778138. The ACA haplotype was associated with lighter overall pigmentation (Duffy et al. 2007).
The ACA/ACA diplotype was the major contributor to lighter eye color, with a frequency of 0.905 in individuals with blue/green eyes and a frequency of only 0.095 in individuals with brown eyes; this genotype was also more frequent in individuals with light brown hair and fair or medium skin types (Duffy et al. 2007). Shekar et al. (2008 B) also observed an association of the ACA haplotype with lighter hair, and the G allele of SNP rs4778138 alone was significantly associated with darker hair (p ≈ 3 × 10⁻⁴) in Australian adolescents.

Research by Frudakis et al. (2003) revealed that a SNP in OCA2 (dbSNP ID: rs1800404) was significantly associated with iris color (p < 0.01) in individuals with mainly European ancestry. Shriver et al. (2003) found this SNP to have a significant effect (p = 0.045) on skin pigmentation in African Americans, although not in African Caribbeans. Alleles are G (ancestral) and A (derived), both of which encode an alanine at amino acid 355. Norton et al. (2007) observed the derived allele frequency to be highest in Europeans (0.58), intermediate among Native Americans (0.38), East Asians (0.37), and South Asians (0.29), and lowest in Island Melanesians (0.18) and West Africans (0.04), and that the F_ST between West Africans and Europeans was 0.516 (p = 0.039).

Lao et al. (2007) associated a different SNP (dbSNP ID: rs1448484) in an OCA2 intron with skin pigmentation (p = 3.7 × 10⁻⁵). Known alleles are C (ancestral, associated with darker skin color) and T (derived, associated with lighter skin color). This SNP differentiated European and Asian individuals from others (African, Middle Eastern, Native American and Oceanic; p-value not reported).

Current Molecular Photofitting Technology

A few assays have already been developed to infer the physical appearance of an unknown individual from a DNA sample.
The SNP assay RETINOME™ predicts eye pigmentation with 92% accuracy and was released for forensic use in 2005 by the applied science company DNAPrint® Genomics (DNAPrint® Genomics 2007). DNAWitness™ 2.5, another SNP-based test by DNAPrint®, is available to estimate percent genetic make-up of Sub-Saharan African, Native American, East Asian, and European ancestries. An earlier version of this assay, DNAWitness™ 1.0, was used in the Louisiana Serial Killer investigation. Between September 2001 and March 2003, a serial killer had been raping and murdering college women near Louisiana State University. Inaccurate eyewitness reports led investigators to believe that the suspect was Caucasian. However, later use of the ancestry assay revealed a mix of 85% Sub-Saharan African and 15% Native American ancestry, facilitating a much-needed turn in the investigation (Sachs 2003; DNAPrint® Genomics 2007). The DNAWitness™ assays have been used in over 90 criminal investigations (DNAPrint® Genomics 2007).

Only two published studies involve the use of a DNA sample in estimating an individual’s natural hair color. Grimes et al. (2001) found eight SNPs within a single gene (MC1R) that are associated with red hair pigmentation, and developed an assay to infer whether an individual has red hair. The authors did not use a method of measuring hair pigmentation, instead classifying hair color qualitatively. While this was useful for their purposes in investigating just a single hair color, it could decrease the accuracy of color prediction when considering all possible hair colors. Brilliant (2008) investigated 74 SNPs within 23 genes for their ability to predict hair, skin, and eye color from a DNA sample.
Three-SNP statistical models for predicting phenotype were tested, which accounted for 77.3% of the variance in the amount of scalp hair melanin, 38.3% of the variance in hair eumelanin:pheomelanin ratio, 45.7% of the variance in skin pigmentation, and 52.2% of the variance in eye color. The most informative SNPs in estimating hair pigmentation were rs1800410 (OCA2), rs16891982 (SLC45A2), rs1426654 (SLC24A5), and rs1805007 (MC1R). While these SNPs clearly affected hair pigmentation, there are some potential improvements to be made to the resulting statistical models. First, only three-SNP models were tested, and inclusion of additional SNPs would likely have increased the accuracy. Also, the stratification and LD resulting from ancestry, which was not controlled for, can affect interactions between SNPs and genes, and consequently influence phenotype (Frudakis 2008). Therefore, while models produced by Brilliant (2008) give very promising results, further investigation that controls for ancestry and includes more SNPs is necessary.

Goal of This Study

The purpose of the research presented here was to develop a preliminary genetic assay to estimate an individual’s scalp hair color from crime scene evidence (e.g. semen or blood) or decomposed or incomplete remains that cannot be identified by conventional means (e.g. human bones or other tissues). Eight SNPs in three genes were selected for investigation (rs1426654 in SLC24A5; rs16891982 and rs26722 in SLC45A2; rs7495174, rs4778138, rs4778241, rs1800404, and rs1448484 in OCA2). The study was carried out in collaboration with the Pennsylvania State University Anthropological Genomics Laboratory, which studies the genetics of variation in common human traits such as hair, skin, and eye pigmentation, facial features, and tooth phenotypes (e.g. Shriver et al. 2003; McEvoy et al. 2006; Norton et al. 2006; Norton et al. 2007; Quillen and Shriver 2008; Klimentidis and Shriver 2009).
DNA samples, ancestry data, and hair pigmentation measurements were collected by laboratory personnel. Hair pigmentation was measured using a DermaSpectrometer to determine the average scalp hair M index value for each individual. A multiplex SNP primer extension assay was developed for the eight SNPs and tested for accuracy. A set of DNA samples (n = 85) from African American individuals was analyzed because they have substantial admixture, resulting in ALD and AS that are useful in correlating polymorphisms with phenotypic variation. SNP genotypes were tested for linkage to phenotype while controlling for ancestry, and their effectiveness in estimating hair pigmentation from a DNA sample was determined. A model for estimating hair pigmentation was designed, using the genotype data, M index values, and ancestry measurements from each individual. Because of the novel nature of this research, the model generated here is not intended for use in casework until it has been shaped by further investigation and development. The overall goals were to determine which of the eight SNPs had the highest correlation with hair pigmentation, and to provide groundwork for quantitatively predicting levels of hair pigmentation in an unknown individual.

MATERIALS AND METHODS

All materials and solutions were autoclaved or filter-sterilized as appropriate. Tools and containers associated with sampling or storage were exposed to short wavelength UV irradiation for 5 min per side and, when suitable, wiped with a 10% bleach solution.

Blood Specimen and Data Collection

The Pennsylvania State University and Michigan State University Institutional Review Boards approved DNA collection, hair pigmentation measurement, and analysis procedures.
Blood specimens and corresponding data (including age, sex, hair color, melanin index, and percent European, West African, Native American, and East Asian ancestries) from 86 consenting African American Penn State students (State College, PA) were obtained from the Shriver Lab. The criteria for inclusion were that donors had 25% or greater West African ancestry and that their hair was not gray, bleached, dyed, or thinning. Individuals were de-identified. Hair color was recorded using a categorical description (including medium blond, light brown, medium brown, dark brown, and black) as well as four digital photographs. Hair pigmentation was measured using a DermaSpectrometer to obtain an M index value. Hair was smoothed down, ensuring no scalp was visible, and the measurement head of the DermaSpectrometer was placed on the hair. Measurements were taken in three locations at the crown of the head in the parietal region and averaged to obtain an overall measure of hair pigmentation. Two buccal swabs were collected from each individual and sent to DNAPrint® Genomics, where the DNA was analyzed for 176 AIMs using ANCESTRYbyDNA™ 2.5 (an assay for estimating ancestry that uses the same markers as DNAWitness™ 2.5). Percent European, West African, Native American and East Asian ancestries were calculated using a maximum likelihood estimation method, which fit parameters maximizing the probability of the data to the AIMs in a mathematical model, and estimated biogeographical ancestry and admixture proportions (Halder et al. 2008). A blood specimen was collected by piercing the individual’s fingerpad and depositing the drops on an FTA® card (Whatman; Clifton, NJ), rendering the samples non-biohazardous and safe for transport. The FTA® cards were dried and stored in desiccation chambers. A 1.2 mm Harris Uni-Core punch and cutting mat (Whatman) were used to cut blood sample disks for each individual.
Although the manufacturer has shown that the Uni-Core punches do not transfer DNA between FTA® disks, even without cleaning (Whatman plc 2003), three disks were punched from clean portions of the FTA® card and discarded as a precaution against cross contamination. Six blood sample disks were then deposited into a sterile 1.5 ml microcentrifuge tube labeled with the ID number. The blood and corresponding background data were transported to the Michigan State University Forensic Biology Laboratory for DNA analysis.

Two volunteers from the Forensic Biology Laboratory donated their DNA for use as positive controls. Blood was deposited on FTA® cards, denoted as “DRF” and “KG”, and allowed to dry. Disks were punched from the FTA® cards as needed, using the methods described above, for use in validation of the assay and inclusion as a control.

DNA Purification

Purification of the FTA® disks was carried out using a protocol adapted from Smith and Burgoyne (2004). A disk was placed in a sterile 1.5 ml microcentrifuge tube using forceps, and 200 µl of filter-sterilized FTA® purification reagent (1% SDS, 2 mM EDTA, pH 8.0) was added. The disk was gently vortexed and incubated on a test tube rocker set to 10 rpm for 10 min at room temperature. The reagent was drawn off and 200 µl TE (10 mM Tris; 1 mM EDTA, pH 7.5) was added. The disk was gently vortexed and placed back on the rocker to incubate for 5 min. The solution was drawn off and TE was added twice more, for a total of three rinses. Remaining TE was drawn off and the disk was allowed to air-dry overnight in the open tube, stored in a clean drawer. Alternatively, disks were purified using commercial Whatman FTA® purification reagent, following the manufacturer’s instructions. Forceps were wiped with 70% ethanol and dried with a Kimwipe® (Kimberly-Clark Corporation; Neenah, WI) between disk transfers. A reagent blank (disk punched from a clean portion of an FTA® card) and a positive control were also prepared.
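
The pigmentation measurement described under Blood Specimen and Data Collection can be sketched as follows. The first function applies the M index equation given earlier, and the second averages the three crown readings taken per individual. This is only an illustration: the DermaSpectrometer reports M directly, the function names are hypothetical, the example values are invented, and the sketch assumes reflectance is supplied as a fraction between 0 and 1 (the instrument's internal scaling is not specified here).

```python
import math
from statistics import mean


def m_index(red_reflectance: float) -> float:
    """M = log10(1 / red reflectance), with reflectance assumed to be a fraction in (0, 1]."""
    if not 0.0 < red_reflectance <= 1.0:
        raise ValueError("red reflectance must be a fraction in (0, 1]")
    return math.log10(1.0 / red_reflectance)


def mean_m_index(readings: list[float]) -> float:
    """Average the three M index readings taken at the crown of the head."""
    if len(readings) != 3:
        raise ValueError("expected exactly three readings per individual")
    return mean(readings)


# Invented example values, for illustration only.
print(m_index(0.10))  # lower red reflectance gives a higher M
print(mean_m_index([91.5, 92.0, 92.5]))
```

Averaging is kept separate from the equation so that instrument-reported M values can be passed straight to `mean_m_index` without recomputing them from reflectance.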
Primer Design

Eight PCR primer pairs (one for each SNP assayed) were designed to vary in length and amplify simultaneously via multiplex PCR. Eight SNP interrogation primers were also designed for multiplexing, adding 5’ poly-A tails as needed to create a stepwise increase in length. The primers were divided into multiplex A (Table 1) and multiplex B (Table 2). Primer attributes conformed as closely as possible to suggestions outlined in the GenomeLab™ SNPStart Primer Extension Kit manufacturer’s instructions (Beckman Coulter; Fullerton, CA). Multiplex A PCR primers ranged from 25–30 nucleotides (nt), with melting temperatures (Tm) between 69.5 and 72.4 °C. Multiplex A SNP interrogation primers were 23–35 nt (not including poly-A tails) with a Tm of 72.1–73.9 °C. Multiplex B PCR primers were 27–29 nt with a Tm of 70.9–73.9 °C, and SNP interrogation primers were 25–36 nt (not including poly-A tails) with a Tm of 69.7–74.9 °C.

Primers were designed using Primer3 software (http://frodo.wi.mit.edu/primer3/input.htm), with “Human” selected as the mispriming library, and product size selected in lengths increasing stepwise. PCR primers were checked for secondary structure and cross-dimerization using Primer Express® (Applied Biosystems; Foster City, CA). Reverse e-PCR software (http://www.ncbi.nlm.nih.gov/sutils/e-pcr/reverse.cgi) was used to ensure that non-target amplification would not occur during multiplex PCR, with “homo sapiens genome” selected, a size deviation of 1000, and the maximum number of mismatches and gaps at two. SNP assays were designed as two four-plexes to avoid primer cross-dimers projected by FastPCR© primer design software (http://www.biocenter.helsinki.fi/bi/Programs/fastpcr.htm). Primers were synthesized, and SNP primers HPLC-purified, by either Sigma-Genosys (Sigma-Aldrich; St. Louis, MO) or IDT (Integrated DNA Technologies; Coralville, IA).

[Table 1: Multiplex A PCR and SNP interrogation primers. Table contents not recoverable from the source.]

[Table 2: Multiplex B PCR and SNP interrogation primers. Table contents not recoverable from the source.]

DNA Amplification

Single PCRs were optimized using the DNA control samples and the lowest annealing temperature for the respective multiplex. Two-plex, three-plex and four-plex reactions were then tested, and multiplexes were optimized for primer concentrations and extension time.
Parameters tested included PCR primer concentrations ranging from 0.5 to 10 µM and extension times of 1, 2, and 5 min to ensure all four fragments were fully amplifying in multiplex. PCRs contained: 3 µl 5X GoTaq® Flexi colorless reaction buffer, 1 U GoTaq® Flexi DNA polymerase (Promega; Madison, WI), 2 mM MgCl2, 0.2 mM of each dNTP, multiplex A or B PCR primer mix (resulting in 0.4 µM of each forward and reverse primer), sterile water to a final volume of 15 µl, and one purified FTA disk. A positive PCR control consisting of DRF or KG DNA and a reagent blank were prepared with each set of reactions. Thermal cycling conditions for both multiplexes consisted of: an initial DNA denaturation at 94 °C for 2 min, 35 cycles of denaturation at 94 °C for 30 sec, primer annealing at 66 °C for 30 sec, and DNA extension at 72 °C for 1 min, followed by a final extension at 72 °C for 5 min. If DNA amplification was weak, thermal cycling was repeated, increasing the number of cycles to 40. Multiplex PCR products were separated on a 1.5% agarose gel, stained with ethidium bromide, and photographed on a UV transilluminator.

Control DNA Sequencing

The eight PCR amplicons were sequenced for control DNA samples DRF and KG. PCR products were purified on a Montage® centrifugal filter unit (Millipore; Billerica, MA) with 300 µl TE, following the manufacturer’s protocol. The DNA was washed an additional two times with TE and resuspended in 10–15 µl TE. DNA sequencing was performed using a GenomeLab™ DTCS Quick Start Kit (Beckman Coulter). Reactions contained 4 µl Quick Start Master Mix (comprising buffer, MgCl2, dNTPs, dye-labeled ddNTPs, DNA polymerase, and pyrophosphatase), 2 µM forward or reverse primer, 50–100 fmol DNA, and sterile water to a final volume of 10 µl. Thermal cycling consisted of 35 cycles of: DNA denaturation at 96 °C for 20 sec, primer annealing at 46 °C for 20 sec, and DNA extension at 60 °C for 3 min.
Sequencing product was transferred to a sterile 1.5 ml microcentrifuge tube and 2.5 µl fresh stop solution (1.2 M NaOAc, 40 mM EDTA, and 4 mg/ml glycogen from the Quick Start Kit) and 30 µl of cold 95% EtOH were added. The tube was centrifuged at high speed for 15 min, and the supernatant was removed without disturbing the DNA pellet. The pellet was washed with 200 µl of cold 70% ethanol and centrifuged for 3 min at high speed. The supernatant was drawn off and the wash step was repeated. The pellet was vacuum dried for 15 min and DNA was resuspended in 40 µl sample loading solution (SLS, Beckman Coulter). Sequencing products were separated on a CEQ™ 8000 Genetic Analysis System (Beckman Coulter) using the program LFR-1-60, which consisted of: capillary temperature 50 °C, wait for temperature Yes; denaturation temperature 90 °C, duration 120 sec; injection voltage 2.0 kV, duration 15 sec; separation voltage 4.2 kV, duration 60 min; pause 0 min. Sequences were analyzed using the sequence analysis module of the CEQ™ 8000 software, adjusting analysis parameters as needed to improve sequence readability. Sequences were edited for extraneous nucleotides manually or using the BioEdit Sequence Alignment Editor (http://www.mbio.ncsu.edu/BioEdit/bioedit.html), and SNP sites were located by comparison to NCBI’s refSNP flanking sequences.

PCR Product Purification for SNP Analysis

PCR products were purified using Exo-SAP reactions containing: 1 U shrimp alkaline phosphatase (SAP, Roche; Indianapolis, IN), 10 U Exonuclease I (New England Biolabs; Ipswich, MA), 2 µl 10X SAP dephosphorylation buffer, the remainder of the PCR product (about 10 µl, after gel electrophoresis), and sterile water to a final volume of 20 µl. Reactions were incubated at 37 °C for 75 min, followed by an 80 °C enzyme inactivation step for 20 min.

SNP Primer Extension

SNP primers were tested in single, two-plex, three-plex, and four-plex reactions using DNA control samples DRF and KG.
SNP genotyping accuracy was confirmed by comparison to the control sample sequencing results, ensuring that the alleles matched in both SNP and sequencing analyses. Multiplex SNP primer mixes were optimized, testing a range from 0.125 to 24 µM of each primer. SNP primer extension reactions were conducted using a GenomeLab™ SNPStart Primer Extension Kit (Beckman Coulter), following the manufacturer’s protocol, utilizing the primers described in Tables 1 and 2. Reactions consisted of 4 µl SNPStart master mix, 1 µl multiplex A or B SNP primer mix (concentrations listed in Table 3), 1 µl purified multiplex PCR product, and sterile water to a final volume of 10 µl. Reactions were assembled on ice and components were added in the order of (1) SNPStart mix, (2) water, (3) primers, and (4) PCR product. A positive control including DRF or KG DNA and a negative control including sterile water were prepared with each set of reactions. Thermal cycling conditions for both multiplex reactions consisted of 38 cycles of: denaturation at 90 °C for 10 sec, primer annealing at 45 °C for 20 sec, and primer extension at 72 °C for 30 sec. Primer extensions were purified using 10 µl multiplex SNP reaction product, 0.5 U SAP, 1.3 µl 10X SAP dephosphorylation buffer, and sterile water to a final volume of 13 µl. Reactions were incubated at 37 °C for 45 min, followed by enzyme inactivation at 65 °C for 20 min.

Table 3: Concentrations for Multiplex A and B SNP Interrogation Primer Mixes
The concentration of each SNP interrogation primer is shown. Primer concentrations were optimized to better balance electropherogram peak intensities within the multiplex.

SNP Interrogation    Concentration in Multiplex A    Concentration in Multiplex B
Primer               SNP Primer Mix                  SNP Primer Mix
rs1800404            0.125 µM                        --
rs7495174            0.25 µM                         --
rs1426654            2 µM                            --
rs4778138            24 µM                           --
rs1448484            --                              4 µM
rs26722              --                              2 µM
rs16891982           --                              0.25 µM
rs4778241            --                              0.8 µM

Primer extension products were separated on a CEQ™ 8000 Genetic Analysis System, combining 0.5 µl purified SNP product, 0.5 µl size standard 80 (Beckman Coulter), and 39 µl SLS. If the size standard intensity was too low for software analysis, the amount of size standard was increased to 0.75 µl. In instances where SNP peak intensity was too low for software analysis, the amount of SNP primer extension product was increased to 1 or 2 µl. Extension products were electrophoresed using the program SNP-long, which consisted of: capillary temperature 50 °C, wait for temperature Yes; denaturation temperature 90 °C, duration 60 sec; injection voltage 2.0 kV, duration 30 sec; separation voltage 6.0 kV, duration 18.0 min; pause 0 min. Locus tags were created for each SNP site using the SNP Locus Tag Editor. Extension products were analyzed using the fragment analysis module of the CEQ™ 8000 software, with the following parameters: slope threshold 30 and relative peak height threshold 20, under the General tab; SNP ver. 2 dye mobility calibration selected under the Advanced tab; and multiplex-appropriate SNP locus tags selected under the SNP Locus Tags tab. SNP locus tags were programmed to assign each allele appropriately according to primer length and orientation, and the dye color detected. Electropherograms were compared to the corresponding raw data to verify fragments as independent peaks, as opposed to overlapping peaks due to pull-up that may have been mislabeled by the software.
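The locus-tag logic described above, which assigns an allele from fragment size, primer orientation, and dye color, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the algorithm used by the CEQ 8000 software: the `LocusTag` class, the size tolerance, and the example tags and peaks (names, run sizes, orientations) are hypothetical. The dye assignments (red ddATP, green ddGTP, blue ddTTP, black ddCTP) are those reported for this assay in the Results.

```python
from dataclasses import dataclass

# Dye color -> incorporated ddNTP base, as reported for this assay.
DYE_TO_BASE = {"red": "A", "green": "G", "blue": "T", "black": "C"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


@dataclass(frozen=True)
class LocusTag:
    """Hypothetical stand-in for a SNP locus tag (not the CEQ data model)."""
    snp: str
    run_size_nt: float          # expected run size of the extended primer
    reverse_primer: bool        # True if the primer reads the reverse strand
    tolerance_nt: float = 1.5   # assumed size window around the expected size


def call_genotypes(peaks, tags):
    """Assign a genotype per locus from (size_nt, dye) peaks."""
    calls = {}
    for tag in tags:
        alleles = set()
        for size_nt, dye in peaks:
            if abs(size_nt - tag.run_size_nt) <= tag.tolerance_nt:
                base = DYE_TO_BASE[dye]
                # Report the forward-strand allele for reverse-strand primers.
                alleles.add(COMPLEMENT[base] if tag.reverse_primer else base)
        if len(alleles) == 1:
            (allele,) = alleles
            calls[tag.snp] = allele * 2               # homozygous
        elif len(alleles) == 2:
            calls[tag.snp] = "".join(sorted(alleles))  # heterozygous
        else:
            calls[tag.snp] = None                     # no peak or ambiguous
    return calls


# Hypothetical tags and peaks, for illustration only.
tags = [
    LocusTag("snp_1", run_size_nt=25.0, reverse_primer=False),
    LocusTag("snp_2", run_size_nt=44.5, reverse_primer=True),
    LocusTag("snp_3", run_size_nt=53.0, reverse_primer=False),
]
peaks = [(25.2, "green"), (44.1, "blue"), (44.9, "black")]
calls = call_genotypes(peaks, tags)
```

A locus with no peak in its size window is returned as `None`, mirroring the way inconclusive loci were left uncalled and re-tested rather than guessed.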
Statistical Analyses

The data consisted of one dependent variable (hair pigmentation, measured by average M index value), eight independent variables (genotypes of the eight SNPs analyzed), and four covariates (percent European, West African, East Asian and Native American ancestries), which were secondary, intrinsic variables affecting the relationship between the dependent variable and the independent variables. Covariates were included to increase the statistical power by accounting for inherent variability due to genetic ancestry.

SNP genotype frequencies were tested for departure from Hardy-Weinberg Equilibrium (HWE; p > 0.05) using the chi-square test available in the LDA 1.0 software (http://www.chgb.org.cn/lda/lda.htm). All other statistical tests were carried out using SPSS® 16.0 statistical analysis software (SPSS Inc.; Chicago, IL). A significance level of α = 0.05 was used. Analysis of covariance (ANCOVA), a method for comparing means between groups in which the covariates were controlled for, was conducted using the univariate option of the general linear model, under the SPSS analyze tab. Single-variable ANCOVAs were first carried out for the SNPs, and were set up to include descriptive statistics (number of observations, means, and standard deviations). Factorial ANCOVAs were then performed, including options for power analysis (1 − β ≥ 0.80; reviewed in Thomas and Juanes 1996), to compare the significance of all SNPs simultaneously and produce multi-factorial models for estimating M index. ANCOVA results included the degrees of freedom, F-statistic, and significance for each specific term. The goodness of fit of each model, represented by the corresponding R-square and adjusted R-square (which adjusts for the number of independent variables and the sample size), was ranked relative to the others.
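To make the ranking criterion concrete, the sketch below computes adjusted R-square with the standard formula, 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p is the number of estimated model terms excluding the intercept. The function name and the example values are hypothetical and are not results from this study.

```python
def adjusted_r_square(r_square: float, n_obs: int, n_terms: int) -> float:
    """Adjusted R-square for a linear model that includes an intercept."""
    dof_residual = n_obs - n_terms - 1
    if dof_residual <= 0:
        raise ValueError("model has no residual degrees of freedom")
    return 1.0 - (1.0 - r_square) * (n_obs - 1) / dof_residual


# Hypothetical comparison: the larger model has a slightly higher R-square,
# but not enough to offset the penalty for its ten additional terms.
small = adjusted_r_square(0.40, n_obs=50, n_terms=5)
large = adjusted_r_square(0.42, n_obs=50, n_terms=15)
```

Because the penalty grows with the number of terms, a model with many SNP-SNP interaction terms must explain substantially more variation than a simpler one to rank above it on adjusted R-square.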
The model of best fit was used to generate an equation for predicting hair pigmentation from SNP genotypes and ancestry measurements, and the corresponding parameter estimates, comprising the model intercept and coefficients to be utilized when estimating an individual’s M index value, were also generated.

RESULTS

PCR Optimization

Individual PCRs tested using the control DNAs, as well as two-plex, three-plex and four-plex reactions, generated the expected number and size of amplicons (Figure 14). Gel electrophoresis results indicated that primer design was successful and non-specific amplification did not occur. Multiplex A and B PCRs worked properly, enabling efficient amplification of all four DNA loci simultaneously.

Figure 14: Optimization of PCR multiplexes A and B. (A) Example electrophoresis gel showing multiplex A PCR optimization using DRF DNA. The leftmost four lanes are labeled based on the rs1800404, rs7495174, rs4778138, and rs1426654 region amplicons that were produced. Comparison to the 100 bp DNA ladder (100 bp L, Invitrogen; Carlsbad, CA) confirmed amplicons were the correct size (Table 1). Numbers immediately right of the ladder bands indicate the corresponding sizes. The rightmost three lanes confirm properly working two-plex (2PA), three-plex (3PA) and four-plex (4PA) reactions. (B) Electrophoresis gel showing optimization of multiplex B using DRF DNA (top) and KG DNA (bottom). The leftmost lanes are labeled based on the rs1448484, rs26722, rs16891982, and rs4778241 region amplicons that were produced. The next three lanes show properly working two-plex (2PB), three-plex (3PB) and four-plex (4PB) reactions.
Comparison to the 100 bp DNA ladder in the rightmost lane (100 bp L, New England Biolabs; Ipswich, MA) confirmed amplicons were the correct size (Table 2). Numbers immediately right of the ladder indicate the corresponding sizes.

SNP Primer Extension Reaction Optimization

In all electropherograms produced, the outermost peaks consisted of the two red size standard fragments at 13 and 88 nt. All other fragments arose from a dye-labeled ddNTP having extended the primer, resulting in a red (ddATP), green (ddGTP), blue (ddTTP) or black (ddCTP) peak at the appropriate size. Fragment sizes for both multiplexes determined by the CEQ 8000 software were consistently lower than the actual size of the extended primers. This was anticipated, as it is a common occurrence noted by the manufacturer (Beckman Coulter Inc. 2007). Extending the separation time to 18 min during electrophoresis allowed the 88 nt fragment of the size standard to be detected consistently.

The original SNP primers generated poor quality electropherogram peaks that were wide and jagged rather than clean and sharp (Figure 15). Troubleshooting revealed the SNP primers may not have been properly purified, resulting in an array of lengths. Subsequently, new primers were ordered from different vendors (Sigma and IDT), which produced good quality peaks. SNP multiplex A was readily optimized (Figure 16), using both DRF and KG DNAs. Individual SNP primer extension reactions as well as two-plex, three-plex and four-plex reactions with the new primers generated products of the correct number and size.

[Figures 15 to 17: electropherogram images and captions not recoverable from the source.]

Figure 18: Electropherograms showing artifact elimination by use of a shortened rs4778241 primer. The original SNP multiplex B could not be validated because of the presence of a green artifact peak in the negative control (A). The primer was shortened from 62 bp to 24 bp. The resulting SNP peak appeared at the correct size (B). SNP multiplex B was then re-optimized, generating all four peaks at the correct size (C). All subsequent reactions utilizing the (now shorter) rs4778241 SNP interrogation primers had clean negative controls (D).

Positive Control DNA Sequencing

All positive control DNA amplicons except one (DRF rs4778241) were successfully sequenced in both forward and reverse directions. Sequences flanking the SNP sites (Figure 19) corresponded to those in dbSNP, and sequence alleles were in concordance with the listed refSNP alleles. SNP alleles from DRF and KG sequences matched the corresponding SNP primer extension results (Table 4). Despite several repetitions of both the PCR and sequencing reactions, the DRF rs4778241 amplicon could not be sequenced in the reverse direction.
Figure 19: Sample electropherogram depicting location of a SNP within sequencing analysis results. Sequencing of the KG rs4778138 region amplicon in the forward direction is shown. The nucleotide flanked by pink lines in the upper panel and highlighted in the middle panel is SNP rs4778138, located by comparison to the refSNP flanking sequence. SNP sites were checked for heterozygosity, as evident in this case by the presence of overlapping peaks, so that sequencing genotype would be accurate.

Table 4: Validation of multiplex A and B SNP allele analysis. Product size, SNP genotyping results, forward and reverse sequencing genotypes, and whether calls matched are shown at each SNP for DRF and KG positive control DNAs. Extended primer run size (the fragment size called by the CEQ fragment analysis software) was consistently lower than the actual extended primer size. An average of the run sizes for the two controls was utilized to set up the SNP locus tags for sample analysis. All SNP and sequencing results matched, confirming that the multiplex assays produced accurate results.

SNP           Positive   Extended Primer    SNP         Sequencing   Sequencing   All Calls
              Control    Run Size (nt)      Multiplex   Genotype     Genotype     Match?
              DNA                           Genotype    (Forward)    (Reverse)
[illegible]   DRF        25                 GG          GG           GG           Yes
[illegible]   DRF        29                 AA          AA           AA           Yes
rs4778138     DRF        44                 AA          AA           AA           Yes
              KG         45                 AG          AG           AG           Yes
rs1426654     DRF        52                 AA          AA           AA           Yes
              KG         55                 [illegible] [illegible]  [illegible]  Yes
rs1448484     DRF        29                 TT          TT           TT           Yes
              KG         28                 CT          CT           CT           Yes
rs26722       DRF        40                 CT          CT           CT           Yes
              KG         40                 CC          CC           CC           Yes
rs16891982    DRF        47                 CG          CG           CG           Yes
              KG         47                 CG          CG           CG           Yes
rs4778241     DRF        60 (original),     AC          AC           --           Yes
                         24 (redesigned)
              KG         60 (original),     CC          CC           CC           Yes
                         24 (redesigned)

DNA Purification, Amplification, and SNP Primer Extension

Parallel PCRs revealed no differences in DNA isolation and purification between the commercial purification protocol and that adapted from Smith and Burgoyne (2004). The adapted protocol was used for the majority of the samples, since the self-made reagent was inexpensive to prepare and could be produced fresh as needed.

DNA was successfully amplified (i.e. bands for all four amplicons were visible) from 85 of the 86 blood samples. Sample 060369 exhibited no amplification using either multiplex (Figure 20). DNA isolation and re-amplification with a new 060369 FTA disk produced similar results. SNP primer extension reactions yielded inconclusive results; two low intensity peaks (rs1800404 and rs7495174) in multiplex A electropherograms were surrounded by background noise, while multiplex B reactions yielded no detectable peaks (data not shown).

Figure 20: Electrophoresis gel showing PCR results for sample 060369. While four bands were produced for other samples amplified, multiplex A (left) and multiplex B (right) PCR on sample 060369 DNA did not produce bands.

The remaining 85 samples were successfully genotyped (Table 5), and the vast majority of multiplex A and B SNP reactions yielded clean, unambiguous results with a single analysis and addition of only one FTA disk per reaction (Figure 21).
Size standard intensity was occasionally too low for the software to recognize peaks, and increasing its volume from 0.5 to 0.75 µl during capillary electrophoresis improved identification. While multiplex SNP optimization resulted in well-balanced peaks for the controls, it did not necessarily do so for the test samples. Therefore, some samples (e.g. 050355 and 050364 in multiplex A; 050222, 050240, 050245, 060076, and 060324 for multiplex B) were also utilized in adjusting primer concentrations to produce more balanced peaks (Figure 22). These final primer concentrations (Table 3) were utilized in subsequent SNP reactions. Despite this, peaks were still better balanced for some samples than for others. Unbalanced peak heights between loci or within heterozygous loci did not prevent genotyping; products were verified as true peaks in the raw data (as opposed to overlapping peaks due to pull-up), and were still labeled by the software (or were otherwise readily identified by sight and manually labeled). Pull-up was rarely observed (arrows in Figure 22).

Occasionally, an electropherogram exhibited low intensity peaks at the rs4778138, rs1426654, rs1448484, or rs4778241 loci, yielding only partial results. Multiplex reactions with adjusted primer concentrations or single primer extension reactions were repeated for inconclusive SNPs (Figure 23). If results were still inconclusive, a new sample disk was purified and single locus reactions were performed. This process improved analysis and provided clear results, except in a few cases; even after additional testing, rs1426654 SNP results remained inconclusive for samples 050355, 060078, and 060626 (Figure 24). These were included in data analysis with the unresolved genotypes left blank.

[Figure 21: electropherogram images and caption not recoverable from the source.]

Figure 22: Electropherograms depicting further optimization of peak heights during sample analysis. Multiplex B peaks were not well balanced when this sample was first analyzed (top panel). While all allele calls were correct, further adjustment of primer concentrations (bottom panel) allowed confirmation of the rs4778241 and rs1448484 loci as homozygous, and the rs26722 locus as heterozygous. The tiny blue artifact underneath the rs1448484 peak (arrows) was verified as either pull-up or background noise by comparison to the raw data.

Figure 23: Example electropherograms in which a primer extension was repeated for a single locus.
Multiplex SNP primer extension results for sample 050829 were considered to be inconclusive at the rsl448484 locus, since the blue T peak was very low (top panel), and was not originally labeled by the software. (The peak was labeled above manually. for visualization.) In order to confirm heterozygosity, a SNP reaction was performed using only the rsl448484 SNP primer (lower panel). Elimination of primer competition yielded a greater concentration of the product in question and provided a better balance between the smaller and larger peaks for this heterozygous locus. 60 lm 12500 .: no” a” Emmmzageoxawmzwul E film 67 II _ 10000 -o_- 1‘ 17255052 g 7500 E I 1171 57:5 - ‘1- ' A (I) E "7‘17. '39,“. In. 3 5000 45 ‘ n- 1'" ‘0" 1 1" E u “171”? 7:01" ll: “‘f\ i t “a II ‘ J 1 2500 4" ”7 n _ l I _ / J. ” ‘IIH ,3 ’U 1- , . ' J.’ 1 \— __ ' , .- 1 - ’ ‘1 o ' ‘- . ' ...-11‘... hr ‘- A—Lx ‘1.)1 " V ‘1. L A’ :1 ww J ‘1‘ 8000 4 9". IMPAMCM’LMEZDH 7000 < rum-Icon 2 8000 < 5010 f“ ‘1 g, 5000 « 5"“ ,'l ‘0 1000i 3'" 5750. ’I 0 am 5057 r‘ l i g 3000 . memos A «n , (“J .”:21" 2000 n “a ”’9'" 1111/“, 1 ' . ”3’ a 1000 1‘. ",5“ i t an 01.01 ['1 f, 0 A A .4 + An“ A ‘4— Ai _‘ ‘1‘; ‘I if". ’ W 'T‘ " 7. v.4: A Jr. a“; 35000 — [mmu1m_mzszntl 30000 n I!” an - 25000 1 0 A I Hi“ MIUIN céacooo E l ‘1‘ I" E 15000 45 I I“ ,1 ii “Jo 10000 4; 1 , - (I '. use '.“ - 1 an ‘ ‘- our 5000 = ,1 1 2107 I, on _ . .11 : -. ,- 5110 no: , . ”‘3 o 3 1 ‘- » A * .4 L W _ — ) RX _4.’ AR 4‘ Jr- " m _. _414"—-1’\\E_\$ ‘0000 + |t503551915007_0ar125:m, 35000 <- 1: no: 30000 -- h u. 1 .0 0 117410174 5‘ 25°00 l' ( ( at“ .l ”o” u (I) 20000 4 [ 1 '1, II] on mm I 1 1‘ l 1 52. 5°00 4 l I 1 ' I L admin 1 ""1230 I 10000 - - l ""\ ,' on menu “a," ,1 a 7121 . 5000 4 I l, 2:320:- I' i 173111. 1'1 an {‘1 ,1 "13° '9’} 1 4 1 1 1 J l 1 1— LJA-LuAt-hu' on: __1 1 J 1 IBA‘L .1 4L LI—zi'iysia "1541 $211 1 #4441529, L‘ 1 1_ L tM-xu. 
1 1 ° 1 T r t I I I 1 r o 10 20 30 40 50 so 70 00 90 $10 (110 Figure 24: Example electropherograms from a sample with inconclusive rsl426654 genotype. Electropherograms from several different attempts at SNP analysis are shown for sample 050355. Peak intensity was very low in initial analyses (top two panels), and only r31800404 and rs7495174 genotypes were determined. While the area around these peaks contained background noise or artifacts, comparison to the raw data verified the peaks as SNP products. The rs4778138 peaks were absent and the rsl426654 peak was obscured by background noise. Another round of amplification followed by two attempts at primer extension resulted in more intense peaks (bottom two panels). The rs4778138 products appeared in one electropherogram; genotypes were determinable and peaks were verified in the raw data. However, all rsl426654 products were obscured by background noise and genotype remained inconclusive. Interpretation of SN P Electropherogram Abnormalities Abnormal peaks occasionally appeared in electropherograms, as exemplified in Figure 24. In rare instances, peaks, including those of the size standards, appeared distorted and improperly spaced over migration time (Figure 25). Possible causes of distortion were air bubbles in the electrophoresis gel or a damaged capillary tip, resulting in injection difficulties (Beckman Coulter support scientists, personal communication). 61 Examination of the capillaries revealed one tip had been broken. Sample re-injection or use of a different capillary resolved these respective issues. 22500 20000 [070% 190mm M1 11000011252210] _ 17500 3 15000 9’ 12500 . 10000 >. 7500 O 5000 2500 110 NS 120 125 130 135 140 145 150 155 180 Condom Velocny ligation Tim. (mn) . 2101 E C IWW‘FBMadl 810317125221“ - 100000 4: 507020! b l 51 75000 «E I .5” - : I.“ m . 1am . 50000 1: , 7 4100 ".3 “n >5 : r rsl“ C t D 25000 45 ‘ . 
'3'] ”w” mom-2 0 :1 L 1 1 1 J 121 1 1 1 1 01.1.14" 1\x_.1 1 1 1 1A1 1 AhL 1 1 l l 1 1 1 1 1 1 1 J_L 1 1 1 1 1 2.. 1 1 I T l I l I I l l 0 t 10 20 30 40 50 60 70 00 90 Stu (00 Figure 25: Example electropherogram from a sample injected with a broken capillary. Several samples in analyses run consecutively on the C EQ generated electropherograms with abbreviated migration times and distorted peaks (top panel). All samples were injected through capillary H, which had a chipped tip. Re—injection of the SNP products using a different capillary yielded a normal electropherogram (bottom panel). Background noise or other artifacts were only problematic in electropherograms where they obscured low intensity SNP peaks (below 10,000 rfu). The top two and bottommost panels of Figure 23 represent the most notable examples, where a wide cluster of connected peaks are observed between 50 and 70 nt. Without the noise surrounding the rsl426654 peaks, alleles would likely have been determinable in this sample. The same artifacts were frequently observed in multiplex A electropherograms when all peak intensities were low, including reagent blanks, negative controls, and single SNP reactions. When peak intensities were higher, this artifact pattern did not prevent genotyping (Figure 26, top panel). Other noise or non-target peaks (such as the two small, red peaks highlighted in Figure 26, bottom panel) were occasionally observed in both multiplexes, but did not prevent SNP genotyping. 62 ‘UUUU : v asooo «é "’fig‘ ma |0$0259 w.1t04_0m125m| _ MODE) 5 "room "'32“ C : '. c 25000 «E I" u“ A 5' 20000 4; ‘3 36° ’1 A mm : ‘ yl 1147711! 115000 «E ii ""90“ I't «a l ‘L‘L‘l’ ‘Li ‘1' 7‘ E . _ | , 0 I . 10000 4E [I l J; , l mu. “mun j 'I 5000 4: I . l I '“° ,-\ .- ‘ K : (I I , w. L ‘r‘. _ .’ ’s 0000.. f ‘- 0 = 1 1 1 1 i A 1_L_4_#.4L_A)_1_¥1 #1Q4 1_TA 4_4 I ' 1.5L". i-AL MAL% Lu 1 t ; 1 l 1"1 f1 1 J o 10 20 30 so so so 70 00 90 5110010 55000 I“ 50000 - m, " on [mvn-m 809_(BJ125210U| 05000 . 
Figure 26: Example electropherograms where artifacts did not impede genotyping. Most artifacts (arrows in both panels) did not prevent allele determination, as they were relatively low in intensity or did not overlap SNP peaks. The wide, red pattern of peaks centered around 59 nt (top panel) was seen in many multiplex A electropherograms. Other artifacts were less common and nonrecurring.

Combining SNP Multiplexes

Eight-plex SNP reactions were tested using a few samples and generated poor results, presumably due to primer interactions. However, an eight-plex electropherogram could still be produced by performing separate multiplex A and B reactions and combining them (0.5 µL each) in a single well for capillary electrophoresis. This yielded clean electropherograms (Figure 27) and saved time during SNP product electrophoresis and analysis.
Figure 27: Example electropherograms showing eight-plex analysis results. Multiplex A and B reaction products were combined in a single well and electrophoresed simultaneously through the CEQ 8000. Analyses yielded clean electropherograms, though some contained more balanced peaks than others. Eight-plex SNP primer extension analyses for sample numbers 060040, 060076, 060261, and 060419 are shown (top panel through bottom panel, respectively).

ANCOVA Results, Descriptive Statistics, and Model Formulation

Alleles for all SNPs were in HWE (p ≥ 0.17) with the exception of rs1426654, which exhibited weak evidence of a departure from HWE (p = 0.0517) in the form of excess heterozygosity. Data from the 85 samples in Table 5 were first analyzed using all four covariates (percent West African, East Asian, Native American, and European ancestries). However, since the covariates were constrained to add to a total of 100%, the degrees of freedom were reduced to zero, and the SPSS software could not perform data analysis. Subsequent SPSS analyses, using only three of the four covariates (percent West African, East Asian, and Native American ancestries; European ancestry was chosen for elimination at random), were successful. The covariate selected for elimination was inconsequential, as each covariate can be calculated from the other three, and the statistical models generated and corresponding significance and R-square values did not vary (aside from altering the intercept value) depending on which covariate was eliminated.
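The HWE result for rs1426654 can be illustrated with a short calculation. The sketch below assumes a one-degree-of-freedom chi-square goodness-of-fit test, which is a common choice but is not stated here as the test actually used, and it takes the rs1426654 genotype counts from Table 6 (5 AA, 44 AG, 33 GG among the 82 genotyped samples). The function name `hwe_chi_square` is hypothetical and introduced only for illustration.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium (1 df).

    Assumption: a plain chi-square test without continuity correction.
    Returns the statistic, the p-value, and the expected genotype counts.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of the first allele
    q = 1.0 - p                          # frequency of the second allele
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value, expected

# rs1426654 genotype counts as listed in Table 6 (AA, AG, GG)
chi2, p_value, expected = hwe_chi_square(5, 44, 33)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
print("expected AA/AG/GG:", [round(e, 1) for e in expected])
```

Under these assumptions the p-value comes out at roughly 0.052, in line with the reported p = 0.0517, and the expected heterozygote count (about 36) is lower than the 44 observed, which is the excess heterozygosity described above.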
Descriptive statistics for the SNPs, including the mean M index values for individuals with each observed genotype, are displayed in Table 6. Single variable ANCOVAs indicated that no SNPs, individually, had a significant effect on M index value (Table 7), and rs7495174 was the only SNP with a marginally significant effect (p = 0.081). Adjusted R² values ranged from 0.032 to 0.087. ANCOVAs revealed that rs7495174 was the most informative variable in estimating hair pigmentation, followed by rs16891982, rs4778241, rs1448484, rs1800404, rs26722, rs4778138, and lastly rs1426654.

Table 6: Descriptive statistics for each SNP. The total number of each genotype observed and the corresponding mean and standard deviation of the M index values are shown. The character on the top row of the rs1426654 table represents the three samples for which that SNP could not be genotyped.

rs1800404    Mean      Std. Deviation   N
AA           148.083   20.3640          12
AG           143.139   13.9155          33
GG           146.230   18.5558          40
Total        145.292   17.0672          85

rs7495174    Mean      Std. Deviation   N
AA           143.427   17.5887          63
AG           150.632   14.5428          22
Total        145.292   17.0672          85

rs1426654    Mean      Std. Deviation   N
.            150.133   18.3303           3
AA           139.680   17.5787           5
AG           143.874   16.6733          44
GG           147.593   17.7795          33
Total        145.292   17.0672          85

rs4778138    Mean      Std. Deviation   N
AA           146.614   20.5515          17
AG           145.635   15.3687          42
GG           143.872   17.8205          26
Total        145.292   17.0672          85

rs4778241    Mean      Std. Deviation   N
AA           150.908   15.8785          17
CA           144.058   17.4526          36
CC           143.695   17.1342          32
Total        145.292   17.0672          85

rs1448484    Mean      Std. Deviation   N
CC           145.732   19.7529          26
CT           144.089   14.8507          39
TT           147.063   18.0772          20
Total        145.292   17.0672          85

rs16891982   Mean      Std. Deviation   N
CC           148.368   16.0165          43
CG           145.321   17.0561          32
GG           131.967   16.6439          10
Total        145.292   17.0672          85

rs26722      Mean      Std. Deviation   N
CC           145.022   17.3831          75
CT           147.313   15.1334          10
Total        145.292   17.0672          85

Table 7: Single variable ANCOVA results for each SNP. The F-statistic, significance at α = 0.05, R², and adjusted R² are listed.
The significance and adjusted R² values were ranked from 1 (highest significance or fit) to 8 (lowest significance or fit), to make an initial determination of which SNPs were most valuable in a model for estimating M index. An asterisk indicates marginal significance (p ≤ 0.1).

Independent Variable   Gene      F       Sig.     Rank of Sig.   R²      Adjusted R²   Rank of Adj. R²
rs1426654              SLC24A5   0.125   0.945    8              0.101   0.032         8
rs26722                SLC45A2   0.363   0.548    6              0.101   0.056         5
rs16891982             SLC45A2   1.764   0.178    2              0.136   0.081         2
rs1800404              OCA2      0.646   0.527    5              0.112   0.055         6
rs7495174              OCA2      3.121   0.081*   1              0.131   0.087         1
rs4778138              OCA2      0.061   0.941    7              0.098   0.041         7
rs4778241              OCA2      1.387   0.256    3              0.128   0.072         3
rs1448484              OCA2      0.769   0.467    4              0.114   0.058         4

Factorial ANCOVA analysis of all eight independent variables, plus the three ancestry values as covariates, could not be executed, as it resulted in integer overflow due to too many factors (underlying variables), levels within factors, or higher-order interactions among factors. Consequently, all eight SNP variables could not be utilized simultaneously in factorial ANCOVAs, although use of seven of them allowed analysis. The eight possible combinations of seven variables were run as eight separate factorial ANCOVAs (Table 8). Model M4 had the best (relatively high) fit, with an adjusted R-square of 0.667, followed by M6 and M7, also with a relatively high fit, M1 with a medium fit, and the rest with a relatively low fit. Terms with a significant effect on M index were noted for each model (Table 9) to ensure the final model would contain as many relevant terms as possible.

Table 8: Factorial ANCOVA summary for each 7-SNP model. The R² and adjusted R² are listed from ANCOVA conducted on each seven-SNP model. Models are listed by the SNP excluded from the model, with the seven unlisted SNPs and the three covariates being included. The adjusted R² values for each model were ranked from 1 (best fit) to 8 (worst fit), to determine which model was the best fit for estimating M index.
Model   Independent Variable Excluded in Model (and Gene of Origin)   R²      Adjusted R²   Rank of Adj. R²
M1      rs1800404 (OCA2)                                              0.941    0.453        4
M2      rs7495174 (OCA2)                                              0.913    0.050        6
M3      rs4778138 (OCA2)                                              0.858   -0.324        8
M4      rs1426654 (SLC24A5)                                           0.956    0.667        1
M5      rs4778241 (OCA2)                                              0.828    0.038        7
M6      rs1448484 (OCA2)                                              0.975    0.652        2
M7      rs26722 (SLC45A2)                                             0.981    0.608        3
M8      rs16891982 (SLC45A2)                                          0.858    0.146        5

Table 9: Summary of the terms with significant effects on M index from each factorial ANCOVA. AF, EA, and NA represent West African, East Asian, and Native American ancestries, respectively. All covariates and independent variables are listed as terms, as well as interactions that had a significant (p ≤ 0.05, indicated by two asterisks) or marginally significant (p ≤ 0.1, indicated by one asterisk) effect in at least one model. Cells were left blank if the term had no effect. The total number of times each term had an effect was tallied.

Term                       Significance (non-blank cells, left to right across M1 to M8)   Total Models with Sig.
AF                         0.013**, 0.050**                                                2
EA                         0.090*, 0.050**                                                 2
NA                                                                                         0
rs1800404                                                                                  0
rs7495174                  0.007**, 0.017**, 0.021**, 0.022**                              4
rs4778138                                                                                  0
rs1426654                                                                                  0
rs4778241                                                                                  0
rs1448484                                                                                  0
rs26722                    0.093*                                                          1
rs16891982                                                                                 0
rs1800404 * rs4778138      0.042**, 0.006**                                                2
rs1800404 * rs4778241      0.086*                                                          1
rs7495174 * rs4778138      0.005**, 0.005**, 0.094*, 0.020**, 0.008**                      5
rs7495174 * rs16891982     0.066*                                                          1
rs4778138 * rs1426654      0.069*, 0.060*                                                  2
rs4778138 * rs4778241      0.058*                                                          1
rs4778138 * rs16891982     0.046**, 0.015**                                                2
rs1426654 * rs16891982     0.079*, 0.080*                                                  2
rs4778241 * rs1448484      0.012**                                                         1
rs4778241 * rs16891982     0.061*                                                          1
rs1448484 * rs16891982     0.049**                                                         1

The covariates, most specifically West African and East Asian ancestries, only had significant effects in some models (M4 and M8; M1 and M8, respectively). SNP rs7495174 was the only independent variable with a statistically significant effect on M index value (in each of four models), while rs26722 had a marginally significant effect, but only in one model.
The rs7495174*rs4778138 term was the most important interaction, and had a significant effect in four models and a marginally significant effect in one. The rs1800404*rs4778138 and rs4778138*rs16891982 interactions had significant effects in two models, while the rs4778241*rs1448484 and rs1448484*rs16891982 interactions had significant effects in one. Lastly, the rs4778138*rs1426654 and rs1426654*rs16891982 interactions had marginally significant effects in two models, and the rs1800404*rs4778241, rs7495174*rs16891982, rs4778138*rs4778241, and rs4778241*rs16891982 interactions had marginally significant effects in one.

SNPs rs1426654 and rs26722 did not have statistically significant effects on M index (alone or in interactions), and were the least important terms for a model predicting hair pigmentation. This corresponds both with the one-way ANCOVA results, as those SNPs had the highest and third-highest p-values (that is, the lowest and third-lowest significance), respectively, and the factorial ANCOVA results, as the models excluding these SNPs (M4 and M7) were ranked first and third, respectively, in goodness of fit. Therefore, all analyses indicated that M4 was the model of best fit for predicting M index values. As a final check, one and two SNPs were removed from model M4, to determine if this improved the adjusted R-square by removing variables that may only have improved the model by chance. Models not containing (1) rs1426654 and rs26722, (2) rs1426654 and rs1448484, or (3) all three, resulted in a decrease in both R-square and adjusted R-square values. Consequently, M4 in its original form (ANCOVA results shown in Table 10) was still the model of best fit.

Table 10: Model M4 ANCOVA results. The degrees of freedom (df), F-statistic, and significance are listed. The dashed area indicates that intermediate interaction terms are identical to those immediately above.
The original table of SPSS results, including all interaction terms, plus parameters only utilized in calculations, is available electronically upon request.

Source                                   df   F        Sig.
Corrected Model (a)                      73   3.304    .017
Intercept                                1    59.119   .000
AF                                       1    8.697    .013
NA                                       1    1.125    .312
EA                                       1    .660     .434
rs1800404                                2    .483     .629
rs7495174                                1    2.197    .166
rs4778138                                2    1.571    .251
rs4778241                                2    .428     .662
rs1448484                                2    .554     .590
rs26722                                  1    3.378    .093
rs16891982                               2    .493     .623
rs1800404 * rs7495174                    0    .        .
rs1800404 * rs4778138                    1    11.366   .006
rs1800404 * rs4778241                    2    .935     .422
rs1800404 * rs1448484                    0    .        .
rs1800404 * rs26722                      0    .        .
rs1800404 * rs16891982                   1    .125     .731
rs7495174 * rs4778138                    1    12.430   .005
rs7495174 * rs4778241                    0    .        .
rs7495174 * rs1448484                    0    .        .
rs7495174 * rs26722                      0    .        .
rs7495174 * rs16891982                   1    4.154    .066
rs4778138 * rs4778241                    1    .087     .773
rs4778138 * rs1448484                    1    1.421    .258
rs4778138 * rs26722                      0    .        .
rs4778138 * rs16891982                   1    5.038    .046
rs4778241 * rs1448484                    1    2.357    .153
rs4778241 * rs26722                      0    .        .
rs4778241 * rs16891982                   3    3.313    .061
rs1448484 * rs26722                      0    .        .
rs1448484 * rs16891982                   0    .        .
rs26722 * rs16891982                     0    .        .
rs1800404 * rs7495174 * rs4778138        0    .        .
- - - - - - - - - - - - - - - - - - - -
rs1800404 * rs7495174 * rs4778138 *
  rs4778241 * rs1448484 * rs26722 *
  rs16891982                             0    .        .
Error                                    11
Total                                    85
Corrected Total                          84

Dependent Variable: M index
a. R Squared = .956 (Adjusted R Squared = .667)

The R-square (0.956) and adjusted R-square (0.667) for Model M4 (displayed underneath Table 10) are relatively high. The design of model M4 is reproduced below (Equation 1). Parameter estimates, containing the model's intercept and coefficients (based on genotype) used when estimating M index, are available electronically upon request.

Equation 1: Design of Model M4 for predicting an M index value from ancestry and the seven included SNP genotypes. AF, EA, and NA represent percents West African, East Asian, and Native American ancestries, respectively. The equation contains 131 terms.
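Two figures quoted for Model M4 can be cross-checked with a few lines of Python. The sketch below is an illustration only, not part of the SPSS analysis: it assumes the design is a full factorial of the seven SNPs plus the intercept and three covariates, and it assumes the usual definition of adjusted R-square, 1 - (1 - R²)(n - 1)/(n - p - 1), with p taken as the corrected model degrees of freedom from Table 10. The helper names `design_terms` and `adjusted_r2` are hypothetical.

```python
from itertools import combinations

# The seven SNPs retained in Model M4 and the three ancestry covariates
SNPS = ["rs1800404", "rs7495174", "rs4778138", "rs4778241",
        "rs1448484", "rs26722", "rs16891982"]
COVARIATES = ["AF", "NA", "EA"]

def design_terms(snps, covariates):
    """List the intercept, covariates, main effects, and every SNP interaction."""
    terms = ["Intercept"] + list(covariates)
    for order in range(1, len(snps) + 1):        # 1 = main effects, 7 = seven-way
        for combo in combinations(snps, order):
            terms.append(" * ".join(combo))
    return terms

def adjusted_r2(r2, n_obs, df_model):
    """Adjusted R-square under the standard definition (an assumption here)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - df_model - 1)

terms = design_terms(SNPS, COVARIATES)
print(len(terms))                                # number of design terms
print(round(adjusted_r2(0.956, 85, 73), 3))      # from the rounded R-square
```

The term count is 1 + 3 + (2^7 - 1) = 131, matching the figure quoted for the design. The adjusted R-square computed from the rounded R-square of 0.956 is about 0.664, close to the reported 0.667; a gap of that size is consistent with R-square having been rounded to three decimals before being reported.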
The correct coefficient values to insert when utilizing the equation to predict an M index value are located in the table of parameter estimates (available electronically upon request).

Design of Model M4:

Intercept + AF + NA + EA +

rs1800404 + rs7495174 + rs4778138 + rs4778241 + rs1448484 + rs26722 + rs16891982 +

rs1800404 * rs7495174 + rs1800404 * rs4778138 + rs1800404 * rs4778241 + rs1800404 * rs1448484 + rs1800404 * rs26722 + rs1800404 * rs16891982 + rs7495174 * rs4778138 + rs7495174 * rs4778241 + rs7495174 * rs1448484 + rs7495174 * rs26722 + rs7495174 * rs16891982 + rs4778138 * rs4778241 + rs4778138 * rs1448484 + rs4778138 * rs26722 + rs4778138 * rs16891982 + rs4778241 * rs1448484 + rs4778241 * rs26722 + rs4778241 * rs16891982 + rs1448484 * rs26722 + rs1448484 * rs16891982 + rs26722 * rs16891982 +

rs1800404 * rs7495174 * rs4778138 + rs1800404 * rs7495174 * rs4778241 + rs1800404 * rs7495174 * rs1448484 + rs1800404 * rs7495174 * rs26722 + rs1800404 * rs7495174 * rs16891982 + rs1800404 * rs4778138 * rs4778241 + rs1800404 * rs4778138 * rs1448484 + rs1800404 * rs4778138 * rs26722 + rs1800404 * rs4778138 * rs16891982 + rs1800404 * rs4778241 * rs1448484 + rs1800404 * rs4778241 * rs26722 + rs1800404 * rs4778241 * rs16891982 + rs1800404 * rs1448484 * rs26722 + rs1800404 * rs1448484 * rs16891982 + rs1800404 * rs26722 * rs16891982 + rs7495174 * rs4778138 * rs4778241 + rs7495174 * rs4778138 * rs1448484 + rs7495174 * rs4778138 * rs26722 + rs7495174 * rs4778138 * rs16891982 + rs7495174 * rs4778241 * rs1448484 + rs7495174 * rs4778241 * rs26722 + rs7495174 * rs4778241 * rs16891982 + rs7495174 * rs1448484 * rs26722 + rs7495174 * rs1448484 * rs16891982 + rs7495174 * rs26722 * rs16891982 + rs4778138 * rs4778241 * rs1448484 + rs4778138 * rs4778241 * rs26722 + rs4778138 * rs4778241 * rs16891982 + rs4778138 * rs1448484 * rs26722 + rs4778138 * rs1448484 * rs16891982 + rs4778138 * rs26722 * rs16891982 + rs4778241 * rs1448484 * rs26722 + rs4778241 * rs1448484 * rs16891982 + rs4778241 * rs26722 * rs16891982 + rs1448484 * rs26722 * rs16891982 +

rs1800404 * rs7495174 * rs4778138 * rs4778241 + rs1800404 * rs7495174 * rs4778138 * rs1448484 + rs1800404 * rs7495174 * rs4778138 * rs26722 + rs1800404 * rs7495174 * rs4778138 * rs16891982 + rs1800404 * rs7495174 * rs4778241 * rs1448484 + rs1800404 * rs7495174 * rs4778241 * rs26722 + rs1800404 * rs7495174 * rs4778241 * rs16891982 + rs1800404 * rs7495174 * rs1448484 * rs26722 + rs1800404 * rs7495174 * rs1448484 * rs16891982 + rs1800404 * rs7495174 * rs26722 * rs16891982 + rs1800404 * rs4778138 * rs4778241 * rs1448484 + rs1800404 * rs4778138 * rs4778241 * rs26722 + rs1800404 * rs4778138 * rs4778241 * rs16891982 + rs1800404 * rs4778138 * rs1448484 * rs26722 + rs1800404 * rs4778138 * rs1448484 * rs16891982 + rs1800404 * rs4778138 * rs26722 * rs16891982 + rs1800404 * rs4778241 * rs1448484 * rs26722 + rs1800404 * rs4778241 * rs1448484 * rs16891982 + rs1800404 * rs4778241 * rs26722 * rs16891982 + rs1800404 * rs1448484 * rs26722 * rs16891982 + rs7495174 * rs4778138 * rs4778241 * rs1448484 + rs7495174 * rs4778138 * rs4778241 * rs26722 + rs7495174 * rs4778138 * rs4778241 * rs16891982 + rs7495174 * rs4778138 * rs1448484 * rs26722 + rs7495174 * rs4778138 * rs1448484 * rs16891982 + rs7495174 * rs4778138 * rs26722 * rs16891982 + rs7495174 * rs4778241 * rs1448484 * rs26722 + rs7495174 * rs4778241 * rs1448484 * rs16891982 + rs7495174 * rs4778241 * rs26722 * rs16891982 + rs7495174 * rs1448484 * rs26722 * rs16891982 + rs4778138 * rs4778241 * rs1448484 * rs26722 + rs4778138 * rs4778241 * rs1448484 * rs16891982 + rs4778138 * rs4778241 * rs26722 * rs16891982 + rs4778138 * rs1448484 * rs26722 * rs16891982 + rs4778241 * rs1448484 * rs26722 * rs16891982 +

rs1800404 * rs7495174 * rs4778138 * rs4778241 * rs1448484 + rs1800404 * rs7495174 * rs4778138 * rs4778241 * rs26722 + rs1800404 * rs7495174 * rs4778138 * rs4778241 * rs16891982 + rs1800404 * rs7495174 * rs4778138 * rs1448484 * rs26722 + rs1800404 * rs7495174 * rs4778138 * rs1448484 * rs16891982 + rs1800404 * rs7495174 * rs4778138 * rs26722 * rs16891982 + rs1800404 * rs7495174 * rs4778241 * rs1448484 * rs26722 + rs1800404 * rs7495174 * rs4778241 * rs1448484 * rs16891982 + rs1800404 * rs7495174 * rs4778241 * rs26722 * rs16891982 + rs1800404 * rs7495174 * rs1448484 * rs26722 * rs16891982 + rs1800404 * rs4778138 * rs4778241 * rs1448484 * rs26722 + rs1800404 * rs4778138 * rs4778241 * rs1448484 * rs16891982 + rs1800404 * rs4778138 * rs4778241 * rs26722 * rs16891982 + rs1800404 * rs4778138 * rs1448484 * rs26722 * rs16891982 + rs1800404 * rs4778241 * rs1448484 * rs26722 * rs16891982 + rs7495174 * rs4778138 * rs4778241 * rs1448484 * rs26722 + rs7495174 * rs4778138 * rs4778241 * rs1448484 * rs16891982 + rs7495174 * rs4778138 * rs4778241 * rs26722 * rs16891982 + rs7495174 * rs4778138 * rs1448484 * rs26722 * rs16891982 + rs7495174 * rs4778241 * rs1448484 * rs26722 * rs16891982 + rs4778138 * rs4778241 * rs1448484 * rs26722 * rs16891982 +

rs1800404 * rs7495174 * rs4778138 * rs4778241 * rs1448484 * rs26722 + rs1800404 * rs7495174 * rs4778138 * rs4778241 * rs1448484 * rs16891982 + rs1800404 * rs7495174 * rs4778138 * rs4778241 * rs26722 * rs16891982 + rs1800404 * rs7495174 * rs4778138 * rs1448484 * rs26722 * rs16891982 + rs1800404 * rs7495174 * rs4778241 * rs1448484 * rs26722 * rs16891982 + rs1800404 * rs4778138 * rs4778241 * rs1448484 * rs26722 * rs16891982 + rs7495174 * rs4778138 * rs4778241 * rs1448484 * rs26722 * rs16891982 +

rs1800404 * rs7495174 * rs4778138 * rs4778241 * rs1448484 * rs26722 * rs16891982

DISCUSSION

The goal of this research was to establish a genetic assay for predicting the level of hair pigmentation in an unknown individual from biological crime scene evidence (e.g. blood, semen, or saliva) or other human remains. The resulting information could expedite investigations by providing new suspect leads, narrowing a list of suspects, or facilitating identification of an unknown victim.
The genotyping method used to develop this assay was SNP primer extension, which required multiple steps (DNA purification, amplification, gel electrophoresis, PCR product clean-up, SNP primer extension, SNP product clean-up, and capillary electrophoresis) but was technically simple to perform. Convenience of the assay was taken into consideration: the instruments utilized are standard equipment in most forensic DNA laboratories, and supplies were relatively inexpensive. To develop the most efficient hair pigmentation assay, multiplex PCR and multiplex SNP assay procedures were developed and tested for accuracy. SNP products were analyzed using two multiplexes, and to make analysis more expeditious, eight-plex capillary electrophoresis was successfully carried out.

The sample set utilized originated from African American volunteers exhibiting 25% or greater West African ancestry via DNAWitness™ 2.5 analysis. This criterion was selected because West African ancestry was one of the two main ancestries of the sample donors and was one of the four main continental ancestries (West African, European, Native American, and East Asian) assessed in the AIM panel. West African ancestry can be informative for phenotype since West Africans are considered quite genetically distinct, as they are geographically and evolutionarily isolated relative to many other populations (Ramachandran et al. 2005; Jakobsson et al. 2008; Li et al. 2008). North African ancestry, for example, is less informative, as this group is geographically closer to the Middle East and Europe and has genetic traits evolutionarily intermediate to those of Europeans, West Africans, and East Asians (Li et al. 2008).
African American samples were specifically chosen for DNA analysis because these individuals have substantial admixture, resulting in ALD and AS that are useful in revealing relationships between polymorphisms and phenotype, and because this was the most readily obtainable sample set with genetic admixture and available ancestry measurements.

Few problems were encountered with the purification and amplification techniques. Only one sample, 060369, could not be amplified, and it exhibited a very thick caking of blood on each punch. According to Whatman's instructions for applying blood to FTA cards, pooling of blood should be avoided as it may overload the chemicals on the card. If too much blood was applied to the 060369 sample card, the DNA may not have been properly stabilized and protected from nucleases, microbes, fungi, and oxidation, and could have been degraded. Two punches were also tested to determine if more DNA aided amplification, though it did not. It is possible that the thick layer of blood made removal of heme, a PCR inhibitor (Akane et al. 1994), more difficult during DNA purification. One other possibility is that the PCR was overloaded with DNA, which causes PCR inefficiency by exhausting primers due to an excessive supply of targets (reviewed by Hummel 2003). Future efforts could include additional purification by organic extraction, DNA dilution, or addition of BSA to bind PCR inhibitors, for samples with suspected inhibition or overloading.

Some PCR targets naturally amplify more efficiently than others based on properties such as length, GC content, flanking sequence, or accessibility within the genome (reviewed by Markoulatos et al. 2002). Differences in amplification are further exacerbated in multiplex PCR, when competition for reaction components can result in preferential amplification of some targets over others.
During analysis of the few blood specimens where multiplex primer extensions yielded genotype results for only some of the SNPs, the two largest SNP products (which also originated from the two longer multiplex PCR products) generated lower intensity peaks and inconclusive results more often. This may have resulted from DNA degradation, e.g. due to FTA card overload as described above, leading to preferential amplification of shorter targets. Re-amplification with adjusted multiplex primer concentrations, or by single-amplicon reactions, typically yielded conclusive results. Exceptions included the three samples that could not be successfully genotyped at rs1426654 (050355, 060078, and 060626), which also had a thick layer of blood and may have experienced FTA card overloading. Therefore, the rs1426654 locus could have been too degraded for efficient amplification.

Optimized multiplex SNP primer extension reactions were effective on the remaining samples. Differences between fragment sizes determined by the CEQ 8000 software and the actual sizes of the extended primers had no effect on the assay, as SNP locus tags were defined based on fragment sizes observed during optimization. These discrepancies were normal since, as mentioned in the SNPStart package insert, the sequence and dye-labeled ddNTP incorporated for each fragment can affect migration time, and hence the calculated fragment size. The SNP primers from Sigma-Genosys or IDT functioned well, except for the original rs4778241 SNP primer, which formed an artifact peak in electropherograms. Shortening the primer eliminated the artifact, which was most likely caused by formation of a hairpin loop that always permitted extension.

It is unclear why peaks were balanced in some electropherograms but not in others, or why heterozygote peaks for certain SNPs were imbalanced.
A few trends in peak levels were apparent, including the tendency for rs1448484 T peaks to be more intense than C, rs16891982 G peaks to be more intense than C, rs4778138 A peaks to be more intense than G, and rs4778241 C peaks to be more intense than A. This may have resulted from differences in which ddNTP is most readily incorporated by the polymerase, but was not corroborated by SNPs with the same base composition. Other A/G and T/C SNPs yielded balanced peaks, and the G/C and C/A combinations each only occurred in one SNP. The two bases immediately upstream (toward the 5' end) from the SNP base, which differed from the other SNPs with the same base composition, could also have contributed to the imbalance, as they are known to affect the rate of labeled ddNTP incorporation and hence electropherogram peak height (Parker et al. 1995; Parker et al. 1996).

Another possibility is that the spectral properties of the dyes affected peak heights (Beckman Coulter support scientists, personal communication). The blue dye attached to ddTTP both absorbs and emits light at higher energy wavelengths, resulting in greater signal intensity than the dyes that emit lower energy light. Therefore, taking into consideration the emission energies of the different dyes, ddTTP should tend to produce more intense peaks, followed by ddGTP (green), ddCTP (yellow), and ddATP (red). Also, if a dye has an excitation wavelength farther from the laser emission wavelength, the signal produced will be less intense (reviewed by Butler 2005). The signals received by the detector are further complicated by the implementation of two lasers within the CEQ 8000, which simultaneously excite different dyes. These spectral properties of the dyes and lasers could have contributed to differences in peak heights observed for rs1448484, rs16891982, and rs4778241 heterozygotes.
Attempts at eight-plex analyses also resulted in additional peak imbalance on electropherograms between loci that originated from multiplex A versus multiplex B, due to differences in product concentrations between the two reactions before they were combined for electrophoresis. This should be rectifiable by altering SNP reaction primer concentrations or the amounts of each SNP multiplex product added for capillary electrophoresis. Although peaks in the eight-plex electropherograms were slightly compacted, even a three nt difference between SNP interrogation primers rs4778241 and rs1800404 was enough to clearly resolve the fragments, and all eight SNPs were determinable simultaneously.

Uncalled peaks, artifacts such as those depicted in Figures 22, 24, and 26, and distorted electropherograms were dealt with on a case-by-case basis, interpreting or re-analyzing results as previously described (see Results). Artifacts may have resulted from: (1) spurious extension products from partially degraded DNA, (2) spurious extension products from primer dimers or secondary structure, or (3) dye blobs, which originate from dyes that have detached from the ddNTPs. The pattern of artifact peaks often observed in multiplex A electropherograms, depicted in Figure 24 (top two and bottom-most panels) and Figure 26 (top panel), could not have resulted from DNA degradation products, since it also appeared in reagent blanks and negative controls. The pattern occurred not only in multiplex A reactions, but also in individual primer extension reactions containing only the rs1426654 or rs4778138 SNP primers. Its intensity also increased as the SNP kit expiration date approached. While this does not exclude the primers as a cause, it was hypothesized that the artifacts resulted from dyes that detached as the kit aged, comigrating with SNP products; although the dyes are much smaller in size than SNP products, differences in charge may impart a similar mass-to-charge ratio.
These artifacts only impeded genotyping of samples with low intensity peaks. Should artifacts affect future analyses, column filtration (which filters out free dyes and unincorporated labeled ddNTPs) can be tested as an alternative to SAP purification, to determine whether SAP purification was not fully effective.

SNP Descriptive Statistics, SNP Correlation with Hair Pigmentation, and Experimental Model Statistical Significance

ANCOVA results, in which the variation in the dependent variable is divided into components and attributed to the independent variables, demonstrated whether there was a significant correlation between hair pigmentation (M index value) and genotype. The mean M index value differences among the three possible genotypes in the single variable ANCOVAs (Table 6, and detailed below) were not statistically significant; therefore, conclusions regarding these differences must be regarded with caution. Nevertheless, the descriptive statistics for each SNP still provided important information on the correlation between genotype and hair pigmentation. Examination of all eight factorial ANCOVA models revealed which SNPs had a statistically significant effect on M index individually as well as in SNP-SNP interactions (Table 9; all further references to interactions signify SNP-SNP interactions, unless otherwise specified). Individually, only one SNP had a significant effect on hair pigmentation (rs7495174; 0.007 ≤ p ≤ 0.022), while a second had a marginally significant effect (rs26722; p = 0.093). When interactions were considered, the remaining SNPs also exhibited effects on M index through eleven interactions, five of which were statistically significant and six of which were marginally significant (detailed below and in Table 9).

The genotype results from SLC24A5 SNP rs1426654 paralleled descriptions for skin pigmentation by Lamason et al. (2005), in that the ancestral allele (G) was associated with darker pigmentation, and the derived allele (A) with lighter pigmentation.
Individuals homozygous for the G allele had darker hair pigmentation on average than heterozygotes and AA homozygotes by 3.72 and 7.91 M index units, respectively. However, rs1426654 did not have a statistically significant effect on the M index in this research and had a marginally significant effect in only two interactions (Table 9). Considering that the study by Brilliant (2008) revealed this SNP was among the top two contributors to total hair melanin content (p = 2.76 × 10⁻³⁵) and the hair eumelanin:pheomelanin ratio (p = 1.97 × 10⁻¹⁰), it is possible that the lack of significance here resulted from the small sample size. Additionally, rs1426654 was the sole SNP for which HWE was questionable (p = 0.0517), though the p-value fell only slightly above 0.05. The resulting excess heterozygosity could indicate selection against homozygotes (those having the darkest and the lightest hair pigmentations), but is most likely attributable to peculiarities in the data after sampling.

Statistical analyses of the two SLC45A2 SNPs produced different results. The descriptive statistics for SNP rs26722 corresponded with allele-phenotype trends for hair reported by Branicki et al. (2008), where the ancestral C allele imparted lighter pigmentation. Individuals with the CT genotype in this study had a higher average M index than those with the CC genotype by 2.29 M index units. No TT genotypes were observed in this sample, which was anticipated as the derived allele is rare in Europeans and Africans (Soejima and Koda 2007; statistical significance not reported). Other researchers (Graf et al. 2005; Soejima et al. 2006) also found the derived allele frequencies to be highest in Asian populations; however, no significant correlation was noted in this study between East Asian ancestry and rs26722 genotype. There was very little Asian ancestry in the individuals studied here (range = 0-16% East Asian ancestry; mean = 2.12%; median = 0%; mode = 0%), therefore the correlation was likely undetectable.
SNP rs26722 exhibited a marginal effect on hair pigmentation in only one factorial model (p = 0.093), possibly due to the lack of TT homozygotes, and it was not involved in any interactions.

Genotype results from the other SLC45A2 SNP investigated, rs16891982, also corresponded with those of Branicki et al. (2008), as the ancestral C allele correlated with darker hair pigmentation. Homozygotes for the C allele had the most pigmentation, with a mean M index value greater than heterozygotes and GG homozygotes by 3.05 and 16.40 units, respectively. The SNP had a statistically significant effect in two interactions and a marginally significant effect in three interactions.

Graf et al. (2005) found a significant association between these two SLC45A2 SNPs and hair pigmentation, although the SNPs did not correlate with hair pigmentation in any of the experimental models presented here and in Branicki et al. (2008). The latter authors observed a significant association of both SNPs independently with hair color, although only rs16891982 remained significantly associated when including both SNPs in a factorial model. They, along with Graf et al. (2005) and Soejima et al. (2006), found these two SNPs to be in LD (p < 0.0001, p-value not reported, and p < 0.000001, respectively) and all proposed that rs26722 only correlated with pigmentation indirectly because of its linkage with rs16891982. Additional findings supporting this theory include inconsistencies between rs26722 allele frequencies and phenotype: Soejima et al. (2006) noted that while the rs16891982 ancestral allele imparted darker pigmentation and was common in Africans (frequency of > 98%), the rs26722 derived allele imparted darker pigmentation and was rare in Africans (frequency of approximately 3 to 4%). Graf et al.
(2005) reported similar rs26722 allele frequencies between Caucasians and Australian Aborigines (ancestral allele frequencies of 97.2% versus 97.1%) despite “obvious” differences in pigmentation between these two groups. Findings reported here support the theory that only rs16891982 correlates with hair pigmentation in a factorial model.

Results from two of the three OCA2 intron 1 SNPs (rs7495174 and rs4778241) corresponded with findings from Duffy et al. (2007) and Shekar et al. (2008 B). Homozygotes for the A allele of rs7495174 had lighter hair pigmentation than heterozygotes by 7.21 M index units on average. No GG homozygotes were observed in the sample, which was not surprising since the GG genotype is absent in Europeans and has a frequency of 0.017 in Sub-Saharan Africans (dbSNP 2008). SNP rs7495174, which was the most informative of the eight SNPs, had a statistically significant effect individually in four models, one statistically significant interaction, and one marginally significant interaction. Similarly, descriptive statistics for rs4778241 revealed that homozygotes for the C allele had lighter hair pigmentation on average than heterozygotes and AA homozygotes by 0.36 and 7.21 M index units, respectively, corresponding to findings by Duffy et al. (2007) and Shekar et al. (2008 B). ANCOVA results showed this SNP was the least informative OCA2 SNP, having one statistically significant interaction and three marginally significant interactions.

In contrast, results from the third OCA2 intron 1 SNP, rs4778138, did not correspond with those of Duffy et al. (2007) and Shekar et al. (2008 B), who reported that the A allele contributed significantly to lighter eye (p = 4.45 × 10⁻⁵⁴) and hair (p ≈ 3 × 10⁻⁴) pigmentation, respectively. In this study, homozygotes for the A allele of rs4778138 had darker hair pigmentation than heterozygotes and GG homozygotes on average by 0.98 and 2.74 M index units.
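The per-genotype comparisons reported above (mean M index for one genotype minus the mean for each of the others) follow a simple pattern, sketched here under stated assumptions. The function name, the record layout, and the values in the usage note are hypothetical and are not data from this study.

```python
from collections import defaultdict
from statistics import mean


def genotype_mean_differences(records, reference):
    """Summarize M index by genotype for a single SNP.

    records:   iterable of (genotype, m_index) pairs, e.g. ("AA", 61.3)
    reference: genotype whose mean is compared against every other genotype
    Returns (means, differences), where differences[g] is the reference
    mean minus the mean for genotype g.
    """
    groups = defaultdict(list)
    for genotype, m_index in records:
        groups[genotype].append(m_index)
    means = {g: mean(values) for g, values in groups.items()}
    differences = {
        g: means[reference] - m for g, m in means.items() if g != reference
    }
    return means, differences
```

For instance, with the made-up records `[("AA", 60.0), ("AA", 62.0), ("AG", 58.0), ("AG", 60.0), ("GG", 57.0)]` and `reference="AA"`, the AA homozygotes would be darker than heterozygotes and GG homozygotes by 2.0 and 4.0 M index units. Differences of this kind are descriptive only and say nothing about statistical significance.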
ANCOVA results indicated that rs4778138 was informative, with three statistically significant interactions (0.005 ≤ p ≤ 0.046) and two marginally significant interactions. The weaker significance reported for the association of this SNP with hair pigmentation in comparison to that for eye pigmentation, as well as the relatively small M index differences between genotypes, indicate a greater effect on eyes than hair. Additionally, results from Duffy et al. (2007) and Shekar et al. (2008 B) originated from Australians, the vast majority of whom were of northern European ancestry. Some other SNPs may therefore be as important as or more important than rs4778138 in individuals with different ancestry (here, West African ancestry). Such differences often result from interpopulation differences in LD and allele frequencies (reviewed by Neale and Sham 2004), or gene-gene interactions and locus heterogeneity (reviewed by Moore 2003). Future genotyping of additional samples with West African admixture can test this possibility by increasing the power. This may generate allele-phenotype correlations that more closely match those of Shekar et al. (2008 B), as their sample size was over 19 times larger. However, since the standard deviation in M index for each of the three rs4778138 genotypes was similar to those observed for other SNPs (Table 6), and alleles for all OCA2 SNPs were in HWE, use of a larger sample size for this study may yield similar results.

There are two ways by which the three OCA2 intron 1 SNPs could affect the P protein. A possible biochemical explanation for their correlation with pigmentation, suggested by Duffy et al. (2007), is that their location at the 5’ end of the gene puts them in tight linkage with regulatory elements affecting gene expression.
If this is the case, individuals with A, C, and G alleles for the rs7495174, rs4778241, and rs4778138 SNPs, respectively, would experience down-regulation of OCA2 transcription, perhaps through more efficient repressor binding or less efficient activator binding to the regulatory region. Reduced P protein production would then result in less efficient pH regulation within follicular melanosomes, less efficient melanin synthesis, and decreased hair pigmentation. Alternatively, as three OCA2 alternative splicing isoforms have previously been noted (Rinchik et al. 1993; Lee et al. 1995; Gerhard et al. 2004), these SNPs may result in splicing differences that affect P protein structure and functionality.

The three OCA2 SNPs did not yield a significant three-way interaction in any models, despite being located in a single haplotype block. However, the rs7495174*rs4778138 interaction had a significant effect in four models, and the rs4778241*rs4778138 interaction had a marginally significant effect in one model. The rs4778138*rs16891982 ANCOVA result also indicates that an OCA2 and SLC45A2 gene-gene interaction significantly affects hair pigmentation (0.015 ≤ p ≤ 0.046), and the regulatory (OCA2) or functional (SLC45A2) effects of the SNPs of the two genes may have a combined influence. Less efficient melanosomal protein transport due to rs16891982 genotype would result in lower levels of intramelanosomal tyrosinase and possibly other proteins necessary for melanogenesis. In combination with less efficient intramelanosomal pH regulation (which is known to affect both the activity of tyrosinase and the rate of melanogenesis; Ancans et al. 2001) from rs7495174 genotype, this SLC45A2-OCA2 interaction further decreases pigmentation levels.

The effects of OCA2 SNPs rs1448484 and rs1800404 on hair pigmentation have not been reported in the literature, although their influences on eye (rs1800404 only, Frudakis et al. 2003) and skin pigmentation (both SNPs, Shriver et al.
2003; Lao et al. 2007) have been documented. In this research, neither SNP genotype imparted the same continuous effect on hair pigmentation that was observed for the other SNPs; rs1448484 heterozygotes exhibited lighter average hair pigmentation than CC and TT homozygotes by 1.64 and 2.97 M index units, respectively. These results are counterintuitive, as heterozygous genotypes would not be expected to alter the amount or functionality of P protein relative to homozygous genotypes for non-heteropolymeric proteins (Gardner et al. 1992; Rinchik et al. 1993). Despite the unexpected genotype-phenotype correlations, rs1448484 was still informative for estimating hair M index values, as two interactions involving rs1448484 had statistically significant effects (p = 0.012 and 0.049). It is possible that, as the SNP is located in an intron, it is merely in LD with a more probative marker that affects mRNA processing. This could lead to alternate forms of the protein, less efficient melanosomal pH regulation and melanogenesis, and decreased hair pigmentation, resulting in lighter hair color. However, since the M index differences between genotypes are relatively small (larger ones in this study ranged from 7.21 to 16.40 M index units), and the descriptive statistics do not support the ANCOVA results, rs1448484 may have no real effect on hair pigmentation.

One of the interactions, rs1448484*rs16891982, again indicates that an OCA2 and SLC45A2 gene-gene interaction significantly affects hair pigmentation (p = 0.049). In this case, rs16891982 amino acid sequence changes may result in less efficient melanosomal protein transport. SNP rs1448484 mRNA sequence changes may result in alternative splicing, and hence alternate forms of the protein that impede efficient intramelanosomal pH regulation (see above). In combination, SNP genotypes from these two genes would impart an additional or magnified effect on hair M index value.
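The idea that two SNPs in combination can impart an additional or magnified effect can be made concrete with a small sketch of a model containing an interaction term. This is only an illustration under simplifying assumptions: genotypes are coded as allele counts (0, 1, or 2) rather than as the categorical factors used in the ANCOVA, ancestry is a single proportion, and every coefficient shown is hypothetical rather than a fitted value from model M4.

```python
def predict_m_index(g1, g2, ancestry, coef):
    """Estimate an M index from two SNP genotypes and an ancestry covariate.

    g1, g2:   copies of a chosen allele at each SNP (0, 1, or 2)
    ancestry: ancestry proportion used as the covariate (0.0 to 1.0)
    coef:     dict of hypothetical coefficients; the "interaction" entry
              scales the product g1 * g2, so it contributes only when
              both SNPs carry the chosen allele.
    """
    return (
        coef["intercept"]
        + coef["snp1"] * g1
        + coef["snp2"] * g2
        + coef["interaction"] * g1 * g2
        + coef["ancestry"] * ancestry
    )


# Hypothetical coefficients, for illustration only
EXAMPLE_COEF = {
    "intercept": 50.0,
    "snp1": 2.0,
    "snp2": 1.0,
    "interaction": 1.5,
    "ancestry": 10.0,
}
```

With these made-up coefficients, `predict_m_index(2, 2, 0.5, EXAMPLE_COEF)` gives 67.0, of which 6.0 units come from the interaction term alone; setting `"interaction"` to zero would leave a purely additive estimate of 61.0.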
Similar to rs1448484, rs1800404 results involved counterintuitive descriptive statistics but ANCOVA interactions that did affect hair M index. Heterozygotes had lighter average hair pigmentation than GG and AA homozygotes by 3.09 and 4.94 M index units, respectively, while ANCOVA results included an interaction with a statistically significant effect and one with a marginally significant effect on hair pigmentation. The unexpected correlations between genotypes and M indices for rs1800404 indicate it may have had a significant effect only because it is in LD with another, nearby SNP that causes an amino acid change. This is substantiated by the SNP’s allele synonymy; P protein structure is unaffected by rs1800404 genotype, since the A and G alleles both encode an alanine. Other OCA2 SNPs that have been implicated in pigmentation variation include rs1800401 (Frudakis et al. 2003), rs1800407 (Duffy et al. 2007), and rs1800414 (Brilliant 2008), which do cause amino acid changes. SNP rs1800410, which is also located in an intron, had one of the largest effects on hair pigmentation in the study by Brilliant (2008). These SNPs are linked to rs1800404, being in the same gene, and future study may confirm significant effects on hair pigmentation.

Model M4 was the superior model for predicting hair pigmentation because it had the best fit of all the factorial models. Although there were factors and interactions with statistically significant effects in other models that were not seen in M4, this model still included the majority of those terms (Table 9); only two of the 15 correlating terms were missing from M4, both of which exhibited marginal effects on M index (0.060 ≤ p ≤ 0.069 for rs4778138*rs1426654; 0.079 ≤ p ≤ 0.080 for rs1426654*rs16891982).
The corrected model had statistically significant results (p = 0.017) and explained 66.7% of the variation in M index, accounting for hair pigmentation variation imparted by SLC45A2 coding SNPs, putative OCA2 regulatory SNPs, a potential OCA2 coding SNP (or tight linkage to one), and ancestry. OCA2 regulatory changes and SLC45A2 coding sequence changes were the major influences on hair pigmentation. OCA2 SNPs rs7495174, rs4778138, rs4778241, and rs1448484 likely caused alterations in mRNA processing, resulting in splicing differences that affect P protein structure and functionality (or the former three were possibly in tight linkage with elements affecting gene expression). The remaining OCA2 SNP, rs1800404, may only be informative for a coding variation due to LD with another SNP. SLC45A2 SNP rs16891982 leads to an amino acid change affecting MATP function, while rs26722 appears to be in LD with the former. Interactions between the SNPs, as the most significant factors affecting M index, show that these sequence changes in combination have the most important effects on the M index.

This study and that of Brilliant (2008) both involve the use of SNPs in modeling and estimating hair pigmentation, although half of the SNPs investigated (rs7495174, rs4778241, rs4778138, and rs1448484) were not included in the 74 SNPs examined by Brilliant (2008), and two of the SNPs in that study (rs1800410 in OCA2 and rs1805007 in MC1R, both of which were among the top SNPs affecting hair pigmentation) were not investigated in this research. SNPs rs16891982 and rs1800404 exhibited significant effects on hair pigmentation in both studies. However, the two SNPs in this research with no significant effect on hair pigmentation (rs1426654 and rs26722) were significant in Brilliant’s study (2008). This discrepancy can be explained by insufficient power or an indirect correlation between these SNPs and hair pigmentation due to LD (discussed above).
Additionally, the sample set utilized by Brilliant (2008) was composed of individuals with diverse ethnicity (self-reported), including African Americans, Caucasian Americans and Europeans, Asians, Asian Americans, Native Americans, and Hispanic Americans, although ancestry was not measured and controlled for, possibly affecting SNP correlations with hair pigmentation. Furthermore, Brilliant (2008) divided the effects on hair pigmentation into two categories: total amount of melanin and ratio of black-to-red melanin (both measured by HPLC). These differences in how hair color is measured (compared to degree of pigmentation as measured by spectroscopy in this study) could cause incongruities in the data, producing different statistical results.

The biochemical method for quantifying melanin using HPLC utilized by Brilliant (2008) allows for a more precise, direct quantification, although it is expensive, time-consuming, and destructive. Reflectance spectroscopy was considered to be the best method for this study due to its cost-efficiency, portability, and ease of use, while still providing accurate measurements of pigment levels in hair (Shriver and Parra 2000). The additional information provided by dividing hair pigmentation measurements into total hair melanin and the eumelanin:pheomelanin ratio is also less important for this study, as dark hair color was prevalent among the participants, so pheomelanin levels would have been low. The results would likely have been similar if Brilliant’s (2008) method had been used instead. Alternatively, if the overall goal were to obtain the most precise, maximum amount of information, and especially if individuals with greater European ancestry were part of the study, the method utilized by Brilliant (2008) would be better for measuring phenotype.

Future development of this hair pigmentation assay must address several difficulties, starting with testing the accuracy of model M4.
A second sample set composed of individuals with similar admixture (Brazilians with 25% or more West African ancestry) is available for genotyping; the resulting data can then be entered into the model and used to estimate hair pigmentation for each individual. Comparing those results to the actual hair M index measurements for these individuals will reveal the accuracy of the model. Next, this process should be repeated, testing the model on individuals with different types of admixture. Since all samples analyzed in this research were from individuals with 25% or more West African ancestry, SNPs with a statistically significant effect on M index may differ from those in groups with other admixture. If this is the case, they must be analyzed to determine any differences in the correlation between SNP genotype and hair M index value. Until all genetic determinants of hair color are known and can be accounted for in the assay, a potential problem with its use on forensic samples is that genotype-hair pigmentation correlations may differ between admixed groups. Consequently, an ancestry test, such as DNAWitness™ 2.5, would need to be carried out, and alternate equations developed and utilized for estimating hair M index, depending on the ancestry of the donor. This would be highly undesirable, as the overarching goal is development of a hair pigmentation assay that will work on anyone, regardless of ancestry.

The primary statistical concern with the current model for estimating hair pigmentation is a potential lack of power due to the relatively small sample size. To address this, another sample set with similar admixture should be genotyped and the results added to those presented here. The larger sample size could clarify the importance of factors that had a marginally significant effect on hair pigmentation, and may even provide enough data to create a model containing all eight SNPs in SPSS.
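The accuracy check proposed above, comparing model estimates with measured M index values in a second sample set, could be summarized with a few standard error measures. The sketch below is one possible approach and not a procedure from this research; the function name, the choice of metrics, and the values in the usage note are assumptions.

```python
import math


def validation_metrics(measured, predicted):
    """Compare predicted M index values with measured ones.

    Returns mean absolute error, root mean squared error, and the
    proportion of variance in the measured values accounted for by the
    predictions (1 - SS_residual / SS_total).
    """
    if not measured or len(measured) != len(predicted):
        raise ValueError("measured and predicted must be non-empty and equal in length")
    n = len(measured)
    errors = [p - m for m, p in zip(measured, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_measured = sum(measured) / n
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"mae": mae, "rmse": rmse, "r_squared": r_squared}
```

As a usage example with made-up values, `validation_metrics([50.0, 60.0, 70.0], [52.0, 58.0, 70.0])` reports how far the estimates fall from the measurements on average; the same summary could be computed separately for each admixture group to see whether accuracy differs between groups.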
Additional SNPs must also be studied to ensure the loci most informative for hair pigmentation are included in the model. It is evident that some informative SNPs were absent from the model, since 33.3% of variation in hair M index is still unexplained within the group studied (and foreseeably even more variation is unexplained for groups not yet studied), and much of this is likely due to SNPs. Based on the ANCOVA results, SNPs rs7495174, rs4778138, rs4778241, rs1448484, and rs16891982 should be included in further model development. SNP rs26722 should be excluded as it had no statistically significant effect on hair pigmentation and is likely just in LD with rs16891982. While rs1426654 also had no statistically significant effect, genotypes still correlated with M index, so further research with more samples is warranted. Additionally, rs1800404 should be excluded because of SNP synonymy and replaced by neighboring non-synonymous SNPs with which it may be in LD (e.g. rs1800401, rs1800407, rs1800414). In contrast, all three of the OCA2 SNPs inherited in the same haplotype block should still be included, since the allele combinations occur as eight known haplotypes (Duffy et al. 2007; Iida et al. 2009), and use of just one or two SNPs is not adequate to obtain the full informational value of a haplotype. It is also recommended that the two additional SNPs found to be informative by Brilliant (2008), rs1800410 (OCA2) and rs1805007 (MC1R), be included in future studies. However, as rs1805007 plays a role in red hair pigmentation, it may not be informative in the current sample set, as the vast majority of the individuals had darker hair and substantial African ancestry, and MC1R SNPs are more informative in individuals with lighter hair and greater European ancestry (Shriver et al. 2003).

Another matter to resolve involves primer design, which is essential for making the assay more practical in a forensic context.
Sample storage on FTA paper does not mimic crime scene conditions, where biological samples are often degraded. The longest PCR product in this research was 487 bp (over three times longer than the shortest), whereas many researchers strive for multiplex amplicons of 150 bp or less for analysis of degraded samples (Asamura et al. 2007; Hill et al. 2008). Longer products were initially chosen for ease in developing the multiplex assays, following the SNPStart Kit package insert recommendations to design products around 100 to 300 bp and selecting greater size differences to increase the ability to resolve the PCR products on an agarose gel. Now that the preliminary multiplex assays have been successfully developed, the PCR primers should be redesigned to generate shorter amplicons for use on more degraded samples.

Next, the assay results must be made more practical for investigators by drawing a connection between M index values and perceived hair color. A chart of M index values could be developed to match the hair pigmentation estimates with a hair color description, such as light, medium, and dark shades of blond, red, and brown, as well as black. Additionally, a hair sample ring could be created containing hair swatches with corresponding M index values (Figure 28), which could be utilized as a reference to aid investigators in differentiating hair colors and to define specific demarcations between hair color descriptions. For example, the ring can visually demonstrate differences between medium brown and dark brown hair, to help prevent varied interpretations of these hair colors. Buccal swabs can then be taken from remaining suspects for DNA profile comparison to crime scene DNA evidence. Similarly for human remains, the categorical hair descriptions and sample ring can be used to help narrow a missing persons list and aid relatives or friends in identifying the deceased.

Figure 28: Hair Color Sample Ring.
A sample ring with example hair color categories (e.g. light brown) could be created for use by police in identifying suspects or victims. Samples matching the predicted hair color would be used for comparison, for greater ease of differentiating similar hair colors. (Image from: http://www.wigsfactory.[illegible]/Hair_Care_Products/Natural_Color_Ring.jpg)

A final consideration is that hair color is not uniform across an individual’s scalp, and can change with age. Additionally, environmental factors including the amount of UV exposure due to time of year and geographical location can affect an individual’s hair pigmentation. Application and interpretation of the model (or the range of error with which the results are interpreted) will need to be adjusted to account for these additional sources of variation in M indices.

CONCLUSIONS

The genetic and statistical methods presented here demonstrate that SNPs can be identified and models developed that are predictive of hair pigmentation in individuals with known ancestry. Model M4 is not for immediate use in forensic casework, but instead lays the groundwork for development of a hair pigmentation assay for future use in forensic DNA laboratories. The assay is designed to estimate pigmentation levels in scalp hair, as facial, pubic, and other body hair were not investigated. The model is not applicable for bald individuals, those with dyed, graying, or red hair, or for detecting variation in natural hair color due to environmental factors. Fortunately, most of these characteristics (with the exception of some dyed hair and certain environmental factors) are easily identified on suspects, and further investigation can still proceed accordingly, forgoing hair pigmentation analysis. The current model may only accurately predict hair pigmentation for individuals with similar admixture to the sample set studied, i.e.
individuals having 25% or more West African ancestry, with Europeans and West Africans as the main parental contributors. Further development of the model with additional samples and SNPs might allow future use of the assay on any crime scene sample.

An additional obstacle is that SNP analysis may be difficult on degraded samples. The methods described here were not optimized for degraded DNA, but could be adapted for that purpose. Genotyping procedures were designed using equipment and methods familiar to the forensic biology community and employing multiplex PCR and SNP reactions for efficiency.

When a biological sample is collected and analyzed, and a CODIS search yields no matches, the hair pigmentation assay can be utilized. The model provides a preliminary basis for estimating hair color for a physical profile, which could be used to produce an overall “fuzzy photo” of the unknown individual. The results may generate leads that are instrumental to law enforcement in cases where initial identification of suspects or potential victims from a missing persons list is unsuccessful.

BIBLIOGRAPHY

Akane A, Matsubara K, Nakamura H, et al. “Identification of the heme compound copurified with deoxyribonucleic acid (DNA) from bloodstains, a major inhibitor of polymerase chain reaction (PCR) amplification”. J Forensic Sci. 1994 Mar: 39(2): 362-72.
Alaluf S, Atkins D, Barrett K, et al. “The impact of epidermal melanin on objective measurements of human skin colour”. Pigment Cell Res. 2002 Apr: 15(2): 119-26.
Ancans J, Tobin DJ, Hoogduijn MJ, et al. “Melanosomal pH controls rate of melanogenesis, eumelanin/phaeomelanin ratio and melanosome maturation in melanocytes and melanoma cells”. Exp Cell Res. 2001 Aug 1: 268(1): 26-35.
Asamura H, Fujimori S, Ota M, et al. “MiniSTR multiplex systems based on non-CODIS loci for analysis of degraded DNA samples”. Forensic Sci Int. 2007 Nov 15: 173(1): 7-15.
Beckman Coulter Inc. “GenomeLab™ SNPStart Primer Extension Kit”.
Package insert. Beckman.com. 2007. Available online: .
Biesecker LG, Bailey-Wilson JE, Ballantyne J, et al. “Epidemiology. DNA identifications after the 9/11 World Trade Center attack”. Science. 2005 Nov 18: 310(5751): 1122-3.
Bonilla C, Shriver MD, Parra EJ, et al. “Ancestral proportions and their association with skin pigmentation and bone mineral density in Puerto Rican women from New York city”. Hum Genet. 2004 Jun: 115(1): 57-68.
Borges CR, Roberts JC, Wilkins DG, et al. “Relationship of melanin degradation products to actual melanin content: application to human hair”. Anal Biochem. 2001 Mar 1: 290(1): 116-25.
Branicki W, Brudnik U, Draus-Barini J, et al. “Association of the SLC45A2 gene with physiological human hair colour variation”. J Hum Genet. 2008: 53(11-12): 966-71.
Brilliant MH. “Gene Polymorphism and Human Pigmentation”. NCJRS. NCJ Document No: 223980. NIJ-Sponsored Research Final Report. September 2008. Available online: .
Bureau of Justice Statistics (BJS). “Bureau of Justice Statistics Criminal Justice System Description”. U.S. Department of Justice Website. 2004. Available at: .
Butler JM. “Forensic DNA Typing, Second Edition: Biology, Technology, and Genetics of STR Markers”. Burlington: Elsevier Academic Press, 2005.
Commo S, Gaillard O, and Bernard BA. “Human hair greying is linked to a specific depletion of hair follicle melanocytes affecting both the bulb and the outer root sheath”. Br J Dermatol. 2004 Mar: 150(3): 435-43.
Database of Single Nucleotide Polymorphisms (dbSNP). National Center for Biotechnology Information, National Library of Medicine. 2008. (dbSNP Build ID: Build 129). Available online: .
de Cassia Comis Wagner R, Kiyohara PK, Silveira M, et al. “Electron microscopic observations of human hair medulla”. J Microsc. 2007 Apr: 226(Pt 1): 54-63.
Deedrick DW and Koch SL. “Microscopy of Hair Part 1: A Practical Guide and Manual for Human Hairs”. Forensic Sci Commun. 2004 Jan: 6(1). Available online: .
den Dunnen JT and Antonarakis SE. “Mutation Nomenclature Extensions and Suggestions to Describe Complex Mutations: A Discussion”. Hum Mutat. 2000: 15(1): 7-12. Available online: .
Desnos C, Huet S, and Darchen F. “'Should I stay or should I go?': myosin V function in organelle trafficking”. Biol Cell. 2007 Aug: 99(8): 411-23.
DNAPrint® Genomics, Inc. “Forensics”. DNAPrint® Genomics Products & Services Website. 2007. Available at: .
Duffy DL, Montgomery GW, Chen W, et al. “A three-single-nucleotide polymorphism haplotype in intron 1 of OCA2 explains most human eye-color variation”. Am J Hum Genet. 2007 Feb: 80(2): 241-52.
Findlater JE, Kramp H, and Wolfe EJ. “The Michigan Law Enforcement Response to Domestic Violence Officer Manual, Fourth Edition”. SOS - Publications and Forms. State of Michigan Website. Oct 2007. Available at: .
Frudakis T. Molecular Photofitting: Predicting Ancestry and Phenotype Using DNA. Burlington: Elsevier Academic Press, 2008.
Frudakis T, Thomas M, Gaskin Z, et al. “Sequences associated with human iris pigmentation”. Genetics. 2003 Dec: 165(4): 2071-83.
Gardner JM, Nakatsu Y, Gondo Y, et al. “The mouse pink-eyed dilution gene: association with human Prader-Willi and Angelman syndromes”. Science. 1992 Aug 21: 257(5073): 1121-4.
Gerhard DS, Wagner L, Feingold EA, et al. “The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC)”. Genome Res. 2004 Oct: 14(10B): 2121-7.
Graf J, Hodgson R, and van Daal A. “Single nucleotide polymorphisms in the MATP gene are associated with normal human pigmentation variation”. Hum Mutat. 2005 Mar: 25(3): 278-84.
Grimes EA, Noake PJ, Dixon L, et al. “Sequence polymorphism in the human melanocortin 1 receptor gene as an indicator of the red hair phenotype”. Forensic Sci Int. 2001 Nov 1: 122(2-3): 124-9.
Grove G, Zerweck C, and Damia J. “Human Skin Coloration using the RGB Color Space Model”. CyberDERM, Inc.
Presented at The L’Oréal Institute for Ethnic Hair & Skin Research 4th International Symposium. Nov. 9-11, 2007. Available online: .
Halder I and Shriver MD. “Measuring and using admixture to study the genetics of complex diseases”. Hum Genomics. 2003 Nov: 1(1): 52-62.
Halder I, Shriver M, Thomas M, et al. “A panel of ancestry informative markers for estimating individual biogeographical ancestry and admixture from four continents: utility and applications”. Hum Mutat. 2008 May: 29(5): 648-58.
Hall D. “Single-nucleotide polymorphism”. Wikimedia file description page. 6 July 2007. Available online: .
Han J, Kraft P, Nan H, et al. “A genome-wide association study identifies novel alleles associated with hair color and skin pigmentation”. PLoS Genet. 2008 May 16: 4(5): e1000074.
Harris EE and Meyer D. “The molecular signature of selection underlying human adaptations”. Am J Phys Anthropol. 2006: Suppl 43: 89-130.
Hess AF. “The Newer Knowledge of the Physiological Action of Ultra-Violet Rays”. Proc Am Philos Soc. 1926: 65(3): 202-206.
Hill CR, Kline MC, Coble MD, et al. “Characterization of 26 miniSTR loci for improved analysis of degraded DNA samples”. J Forensic Sci. 2008 Jan: 53(1): 73-80.
Hüebner C, Petermann I, Browning BL, et al. “Triallelic single nucleotide polymorphisms and genotyping error in genetic epidemiology studies: MDR1 (ABCB1) G2677/T/A as an example”. Cancer Epidemiol Biomarkers Prev. 2007 Jun: 16(6): 1185-92.
Hummel S. “Ancient DNA Typing: Methods, Strategies, and Applications”. Berlin: Springer, 2003.
Iida R, Ueki M, Takeshita H, et al. “Genotyping of five single nucleotide polymorphisms in the OCA2 and HERC2 genes associated with blue-brown eye color in the Japanese population”. Cell Biochem Funct. 2009 Jul: 27(5): 323-7.
Inagaki K, Suzuki T, Shimizu H, et al. “Oculocutaneous albinism type 4 is one of the most common types of albinism in Japan”. Am J Hum Genet. 2004 Mar: 74: 466-471.
Ito S.
“High-performance liquid chromatography (HPLC) analysis of eu- and pheomelanin in melanogenesis control”. J Invest Dermatol. 1993 Feb: 100 (2 Suppl): 166S-171S.
Jablonski NG and Chaplin G. “The evolution of human skin coloration”. J Hum Evol. 2000 Jul: 39(1): 57-106.
Jakobsson M, Scholz SW, Scheet P, et al. “Genotype, haplotype and copy-number variation in worldwide human populations”. Nature. 2008 Feb 21: 451(7181): 998-1003.
Jimbow K, Ishida O, Ito S, et al. “Combined chemical and electron microscopic studies of pheomelanosomes in human red hair”. J Invest Dermatol. 1983 Dec: 81(6): 506-11.
Kaessmann H, Wiebe V, Weiss G, et al. “Great ape DNA sequences reveal a reduced diversity and an expansion in humans”. Nat Genet. 2001 Feb: 27(2): 155-6.
Klimentidis YC and Shriver MD. “Estimating genetic ancestry proportions from faces”. PLoS One. 2009: 4(2): e4460.
Kobayashi T, Vieira WD, Potterf B, et al. “Modulation of melanogenic protein expression during the switch from eu- to pheomelanogenesis”. J Cell Sci. 1995 Jun: 108(Pt 6): 2301-9.
Kushimoto T, Valencia JC, Costin GE, et al. “The Seiji memorial lecture: the melanosome: an ideal model to study cellular differentiation”. Pigment Cell Res. 2003 Jun: 16(3): 237-44.
Lamason RL, Mohideen MA, Mest JR, et al. “SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans”. Science. 2005 Dec 16: 310(5755): 1782-6.
Lao O, de Gruijter JM, van Duijn K, et al. “Signatures of positive selection in genes associated with human skin pigmentation as revealed from analyses of single nucleotide polymorphisms”. Ann Hum Genet. 2007 May: 71(Pt 3): 354-69.
Lee ST, Nicholls RD, Jong MT, et al. “Organization and sequence of the human P gene and identification of a new family of transport proteins”. Genomics. 1995 Mar 20: 26(2): 354-63.
Li JZ, Absher DM, Tang H, et al. “Worldwide human relationships inferred from genome-wide patterns of variation”. Science. 2008 Feb 22: 319(5866): 1100-4.
Liu Y, Hong L, Wakamatsu K, et a1. “Comparison of structural and chemical properties of black and red human hair melanosomes”. Photochem Photobiol. 2005 Jan-Feb: 81(1): 135-44. Markoulatos P, Siafakas N, and Moncany M. “Multiplex polymerase chain reaction: a practical approach”. J Clin Lab Anal. 2002: 16(1): 47-51. McEvoy B, Beleza S, and Shriver MD. “The genetic architecture of normal variation in human pigmentation: an evolutionary perspective and model”. Hum Mol Genet. 2006 Oct 15: 15 Spec No 2: R176-81. Miller RD, Taillon-Miller P, and Kwok PY. “Regions of Low Single-Nucleotide Polymorphism Incidence in Human and Orangutan Xq: Deserts and Recent Coalescences”. Genomics. 2001 Jan 1: 71(1): 78-88. Moore JH. “The ubiquitous nature of epistasis in determining susceptibility to common human diseases”. Hum Hered. 2003: 56(1-3): 73-82. 100 MullerJ and Kelsh RN. “A golden clue to human skin colour variation”. Bioessays. Jun 2006: 28(6): 578-82. Nakayama K, F ukamachi S, Kimura H, et a1. “Distinctive distribution of AIMI polymorphism among major human populations with different skin color”. J Hum Genet. 2002: 47(2): 92-4. Neale BM and Sham PC. “The future of association studies: gene-based analysis and replication”. Am J Hum Genet. 2004 Sep: 75(3): 353-62. Newton J M, Cohen-Barak O, Hagiwara N, et al. “Mutations in the human orthologue of the mouse underwhite gene (uw) underlie a new form of oculocutaneous albinism, OCA4”. Am J Hum Genet. 2001 Nov: 69(5): 981-8. Norton HL, F riedlaender J S, Merriwether DA, et al. “Skin and hair pigmentation variation in Island Melanesia”. Am J Phys Anthropol. 2006 Jun: 130(2): 254-68. Norton HL, Kittles RA, Parra E, et al. “Genetic evidence for the convergent evolution of light skin in Europeans and East Asians”. Mol Biol Evol. 2007: 24(3): 710-22. Okazaki K, Uzuka M, Morikawa F, et al. “Transfer mechanism of melanosomes in epidermal cell culture”. J Invest Dermatol. 1976 Oct: 67(4): 541-7. O’Neil D. “Modern Human Variation: Overview”. 
Palomar College, Behavioral Sciences Department, Modern Human Variation Website. 2009. Available at: . Ortonne JP and Prota G. “Hair melanins and hair color: ultrastructural and biochemical aspects”. J Invest Dermatol. 1993 Jul: 101(1 Suppl): 828-898. Park J H and Lee MH. “A study of skin color by melanin index according to site, gestational age, birth weight and season of birth in Korean neonates”. J Korean Med Sci. 2005 Feb: 20(1): 105-8. Parker LT, Deng Q, Zakeri H, et al. “Peak height variations in automated sequencing of PCR products using Taq dye-terminator chemistry”. Biotechniques. 1995 Jul: 19(1): 116-21. Parker LT, Zakeri H, Deng O, et al. “AmpliTaq DNA polymerase, F S dye-terminator sequencing: analysis of peak height patterns”. Biotechniques. 1996 Oct: 21(4): 694-9. Parra EJ. “Human Pigmentation Variation: Evolution, Genetic Basis, and Implications for Public Health”. Am J Phys Anthropol. 2007: Suppl 45: 85-105. 101 Prota G, Hu DN, Vincensi MR, et al. “Characterization of melanins in human irides and cultured uveal melanocytes from eyes of different colors”. Exp Eye Res. 1998 Sep: 67(3): 293-9. Puri N, Gardner J M, and Brilliant MH. “Aberrant pH of Melanosomes in Pink-Eyed Dilution (p) Mutant Melanocytes”. J Invest Dermatol. 2000 Oct: 115(4): 607-13. Quillen EE and Shriver MD. “SLC24A5: exchanging genetic and biochemical knowledge”. Pigment Cell Melanoma Res. 2008 Jun: 21(3): 344-5. Ramachandran S, Deshpande O, Roseman CC, et al. “Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa”. Proc Natl Acad Sci USA. 2005 Nov 1: 102(44): 15942-7. Reed AC. “Tropical Climatology”. The Scientific Monthly. 1927: 25(5): 404-416. Rhine S. “Bone Voyage: A Journey in Forensic Anthropology”. University of New Mexico Press: Albuquerque, NM: 1998. Rinchik EM, Bultman SJ, Horsthemke B, et al. “A gene for the mouse pink-eyed dilution locus and for human type II oculocutaneous albinism”. 
Nature. 1993 Jan 7: 361(6407): 72-6. Ritter, Nancy. “Missing Persons and Unidentified Remains: The Nation’s Silent Mass Disaster”. The N11 Journal. Jan 2007: 256: 2-7. Roberts DF and Kahlon DP. “Environmental correlations of skin colour”. Ann Hum Biol. 1976 Jan: 3(1): 11—22. Rundshagen U, Zuhlke C, Opitz S, et al. “Mutations in the MATP gene in five German patients affected by oculocutaneous albinism type 4”. Hum Mutat. 2004: 23: 106- 1 10. Sachs J S. “DNA And A New Kind Of Racial Profiling Police sketches from eyewitness accounts are notoriously unreliable. The question is, Will "DNA sketches" be any better?” Popular Science. 2003: 263(6): 16. Salceda R and Riesgo-Escovar JR. “Characterization of calcium uptake in chick retinal pigment epithelium”. Pigment Cell Res. 1990 Sep: 3(3): 141-5. Shekar SN, Duffy DL, Frudakis T, et al (A). “Spectrophotometric methods for quantifying pigmentation in human hair-influence of MCIR genotype and environment”. Photochem Photobiol. 2008 May-Jun: 84(3): 719-26. 102 Shekar SN, Duffy DL, Frudakis T, et al (B). “Linkage and association analysis of spectrophotometrically quantified hair color in Australian adolescents: the effect of OCA2 and HERC2”. J Invest Dermatol. 2008 Dec: 128(12): 2807-14. Shriver MD and Parra EJ. “Comparison of narrow-band reflectance spectroscopy and tristimulus colorimetry for measurements of skin and hair color in persons of different biological ancestry”. Am J Phys Anthropol. 2000 May: 1 12(1): 17-27. Shriver MD, Parra EJ, Dios S, et al. “Skin pigmentation, biogeographical ancestry and admixture mapping”. Hum Genet. 2003 Apr: 112(4): 387-99. Singh SK, Nizard C, Kurfurst R, et al. “The silver locus product (Silv/gplOO/Pmell7) as a new tool for the analysis of melanosome transfer in human melanocyte- keratinocyte co—culture”. Exp Dermatol. 2008 May: 17(5): 418-26. Slominski A, Wortsman J, Plonka PM, et a1. “Hair Follicle Pigmentation”. J Invest Dermatol. 2005 Jan: 124(1): 1321. Smith LM and Burgoyne LA. 
“Collecting, archiving and processing DNA from wildlife samples using FTA databasing paper”. BMC Ecol. 2004 Apr 8: 4: 4. Sobrino B, Brion M, and Carracedo A. “SNPs in forensic genetics: 3 review on SNP typing methodologies”. Forensic Sci Int. 2005 Nov 25: 154(2-3): 181-94. Soejima M and Koda Y. “Population differences of two coding SNPs in pigmentation- related genes SLC24A5 and SLC45A2”. Int J Legal Med. 2007 Jan: 121(1): 369. Soejima M, Tachida H, Ishida T, et al. “Evidence for recent positive selection at the human AIMI locus in a European population”. Mol Biol Evol. 2006 Jan: 23(1): 179-88. Sparling B. “Educational Resources: Ultraviolet Radiation”. NASA Advanced Supercomputing, Education Resources Website. 2001. Available at: . Staricco RJ and Pinkus H. “Quantitative and qualitative data on the pigment cells of adult human epidermis”. J Invest Dermatol. 1957 Jan: 28(1): 33-45. Stokowski RP, Pant PV, Dadd T, et al. “A genomewide association study of skin pigmentation in a South Asian population”. Am J Hum Genet. 2007 Dec: 81(6): 1 1 19-32. Sturm RA, Box NF, and Ramsay M. “Human pigmentation genetics: the difference is only skin deep”. Bioessays. 1998 Sep: 20(9): 712-21. 103 Sturm RA, Teasdale RD, and Box NF. ”Human pigmentation genes: identification, structure and consequences of polymorphic variation”. Gene. 2001 Oct 17: 277(1-2): 49-62. Thody AJ, Higgins EM, Wakamatsu K, et al. “Pheomelanin as well as eumelanin is present in human epidermis”. J Invest Dermatol. 1991 Aug: 97(2): 340-4. Thomas L and Juanes F. “The importance of statistical power analysis: an example from Animal Behaviour”. Anim Behav. 1996 Oct: 52(4): 856-9. Thong HY, J ee SH, Sun CC, et al. “The patterns of melanosome distribution in keratinocytes of human skin as one determining factor of skin colour”. Br J Dermatol. 2003 Sep: 149(3): 498-505. Tidball-Binz M. “Forensic Anthropology and Medicine: Complementary Sciences from Recovery to Cause of Death”. Schmitt A, Cunha E, and Pinheiro J, Ed. 
Totowa: Humana Press, 2006. Tishkoff SA and Kidd KK. “Implications of biogeography of human populations for 'race' and. medicine”. Nat Genet. 2004 Nov: 36(11 Suppl): 821-7. Tobin DJ. “Human hair pigmentation — biological aspects”. International Journal of Cosmetic Science. 2008 Aug: 30(4): 233-57. US. Department of Energy (USDOE) Office of Science. “SNP Fact Sheet”. Office of Biological and Environmental Research, Human Genome Project Information Website. 2007. Available at: . US. Department of Justice (USDOJ). “Text Version of CODIS Brochure”. FBI CODIS Website. 2008. Available at: . Voisey J and van Daal A. “Agouti: from mouse to man, from skin to fat”. Pigment Cell Res. 2002 Feb: 15(1): 10-18. Westerhof W. “Evolutionary, biologic, and social aspects of skin color”. Dermatol Clin. 2007 Jul: 25(3): 293-302, vii. Whatman plc. “Cross-Contamination Study: Carryover Does Not Occur During Punching and Processing of F TA or CloneSaver Cards”. Whatman Supporting Documentation - FTA Nucleic Acid Collection, Storage and Purification. 2003. Available online: . Wilkerson CL, Syed NA, Fisher MR, et a1. “Melanocytes and iris color: Light microscopic findings”. Arch Ophthalmol. 1996 Apr: 1 14(4): 437-42. 104 Yuasa I, Umetsu K, Watanabe G, et al. “MATP polymorphisms in Germans and Japanese: the L374F mutation as a population marker for Caucasoids”. Int J Legal Med. 2004 Dec: 118(6): 364-6. Zhu G, Evans DM, Duffy DL, et al. “A genome scan for eye color in 502 twin families: most variation is due to a QTL on chromosome 15q”. Twin Res. 2004 Apr: 7(2): 197-210. 105 3 9 2 1 3