This is to certify that the thesis entitled EXAMINATION OF STRUCTURAL POLYMORPHISM AT D21S11 LOCUS; IMPLICATIONS IN HUMAN IDENTIFICATION APPLICATIONS presented by Brian Aloysius Higgins has been accepted towards fulfillment of the requirements for the Master of Science degree in Criminal Justice.

EXAMINATION OF STRUCTURAL POLYMORPHISM AT D21S11 LOCUS; IMPLICATIONS IN HUMAN IDENTIFICATION APPLICATIONS
By
Brian Aloysius Higgins
A THESIS
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
MASTER OF SCIENCE
Department of Criminal Justice
2007

ABSTRACT
Length polymorphism analysis of DNA fragments is the conventional method used in the human identification field. Structural polymorphisms of STR alleles have the same base pair length but differ in the composition of the repeats in the variable regions, and their examination can provide a second level of discrimination in human identity applications. This study revealed seven structural polymorphisms in allele 30 at STR locus D21S11 from African American and Caucasian individuals in North America, including three that had never been observed before. The power of discrimination, heterozygosity, and power of exclusion, statistics used in forensic and paternity analysis, all increased when the seven variants characterized in this study were taken into account.
The study underscores the advantages of characterizing STR structural polymorphisms over length polymorphisms alone and explores the potential applications of this added discriminatory power.

Copyright by BRIAN ALOYSIUS HIGGINS 2007

I dedicate this thesis to my loving wife Misi, who understands that quitting is not an option, and to all individuals in the world who do not realize that when there is a will, there is a way.

ACKNOWLEDGMENTS
I would like to thank Margaret Kline of the National Institute of Standards and Technology for providing the D21S11 forward and reverse primer sequences and the PCR conditions; Marco Scarpetta, Ph.D., for providing facts pertaining to paternity testing; Marco Scarpetta, Ph.D., and David Foran, Ph.D., for assistance in formulating this thesis; and finally David Foran, Ph.D., and Orchid Cellmark for providing reagents, materials, and instrumentation for this research.

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
Human Identity Testing
Early Methods of Human Identification
Genetic Testing for Human Identification
Early Genetic Testing Methods
Early DNA Genetic Testing
Introduction of the Polymerase Chain Reaction
PolyMarker and DQα - Sequence Specific Oligonucleotide Probes
Short Tandem Repeats
Statistical Calculations Used to Evaluate STRs
DNA Databases
Statistical Calculations Used in Human Identity Applications
Genetic Mutations
Structural Polymorphisms of STR Loci
Structural Polymorphism Research for Human Identity Testing
MATERIALS AND METHODS
D21S11 Sample Identification
Allele Isolation
DNA Sequencing
DNA Sequence Analysis
Statistical Analysis
RESULTS
D21S11 Allele 30 Structural Polymorphism Distribution
Distribution Analysis of Variance
Power of Discrimination Analysis Incorporating Structural Variant Frequencies
Heterozygosity Analysis Incorporating Structural Variant Frequencies
Power of Exclusion Analysis Incorporating Structural Variant Frequencies
Hypothetical Statistical Analysis of Power of Discrimination, Power of Exclusion, and Heterozygosity
DISCUSSION
The Origins of Structural Polymorphism
Statistical Analysis of Power of Discrimination, Power of Exclusion, and Heterozygosity
Assisting Human Identification STR Testing Applications with Structural Polymorphisms
Proposed Technique for Analysis of Structural Polymorphisms
Structural Polymorphism in the Courtroom
APPENDICES
BIBLIOGRAPHY

LIST OF TABLES
Table 1 - Allele frequencies of different populations for 12 alleles at D8S1179 locus
Table 2 - Identification of ABO Blood Types by Agglutination
Table 3 - Example of simple, compound, and complex repeats of STRs
Table 4 - Paternity index formulae
Table 5 - Known structural polymorphisms at the vWA locus
Table 6 - Known structural polymorphisms at the D3S1358 locus
Table 7 - D21S11 primer sequences used for PCR and sequence reactions
Table 8 - Equations used in the statistical analysis of D21S11 allele 30 structural polymorphisms
Table 9 - Structural polymorphisms observed at allele 30 of the D21S11 locus
Table 10 - Frequency of structural polymorphism for allele 30 at D21S11
Table 11 - Comparison of descriptive statistics with and without structural polymorphisms
Table 12 - Comparison of statistical analysis with and without structural variants
Table 13 - Mutational pathways from ancestral allele
Table 14 - Single base substitution mutation of A→G creating structural variants
Table 15 - Single base substitution mutation of G→A creating structural variants
Table 16 - Insertion mutation in structural variants of allele 29
Table 17 - Deletion mutation in structural variants of allele 31
Table 18 - Probe sequences that can distinguish all seven structural polymorphisms of D21S11 Allele 30
Table 19 - Pattern of hybridization for D21S11 Allele 30 Probes

LIST OF FIGURES
Figure 1 - RFLP Analysis
Figure 2 - Example of DQα Typing and the reverse dot blot system
Figure 3 - Slippage mutations
Figure 4 - Proposed view of microchip array

INTRODUCTION

Human Identity Testing
Identification of individuals is an important segment of forensic science. A number of techniques, notably fingerprinting and genetic testing, are used to distinguish victims and/or criminal suspects. Genetic testing, in particular short tandem repeat analysis, is used to identify casualties in mass disasters, in paternity testing, and in criminal investigations. A great deal of research is being performed to increase the speed (Paegel et al. 2003; Bienvenue et al. 2006) and success rate (Coble and Butler 2005) of genetic typing on marginal evidentiary samples.

Early Methods of Human Identification
The first system used for personal identification, termed anthropometry and introduced by Alphonse Bertillon, was based on precise body measurements. This technique, described by Gloor (1980) and used in the late 1800s, incorporated eleven measurements of an individual, including height, reach, width of the head, and length of the left foot. The idea behind the system was that skeletal dimensions remain fixed from the age of twenty until death and that bone lengths vary greatly among individuals, so it was believed that no two individuals could have exactly the same measurements.

Fingerprinting, which replaced Bertillon's anthropometry system, is still widely used for identification. The classification of finger ridge patterns, first described by Francis Galton in the book Finger Prints, was developed in the late 1800s (Crow 1993). Fingerprint identification is accomplished by first classifying ridge patterns into three categories: arches, loops, and whorls.
Ridge characteristics, or minutiae, are then examined and compared (Stigler 1995). Fingerprints are remarkably stable from early childhood to death; although their size may increase with age, the ridge characteristics do not change. Even with the introduction of new methods to distinguish individuals, fingerprints remain a standard form of identification throughout the world.

Genetic Testing for Human Identification
Genetic testing technologies are based on determining which alleles, or genetic variants, are present at a particular site (locus) of deoxyribonucleic acid (DNA). Every individual has two alleles per locus, one inherited from each parent. The prevalence of a specific allele in a population is termed the allele frequency, and many laboratories have tabulated frequencies for the loci they test (Table 1). The frequency of an allele often varies among populations or ethnic groups and is used in statistical calculations to determine the probability of identity or paternity. Loci are chosen such that alleles are inherited independently; therefore, the likelihood that an individual's alleles occur together is the product of the frequencies with which each allele occurs by itself. This product rule (Evett et al. 1996) also states that the genotype frequencies from all loci used in a genetic test can be multiplied to find the probability of occurrence of the genetic profile in a given population. As more loci are added, the probability of an individual's specific profile coincidentally appearing within the population decreases. The genetic tests described below determine which alleles are found at different DNA loci.

Table 1 - Allele frequencies of different populations for 12 alleles at the D8S1179 locus
Allele   African American   Caucasian   Hispanic
8        0.00237            0.0169      0.00663
9        0.00487            0.0127      0.00957
10       0.0232             0.0915      0.0862
11       0.0445             0.0724      0.0611
12       0.119              0.147       0.122
13       0.203              0.326       0.295
14       0.335              0.198       0.253
15       0.194              0.106       0.132
16       0.0606             0.0267      0.0309
17       0.0119             0.00362     0.0037
18       0.00122            0.000323    0.003682
19       0.000321           -           -
D8S1179 is a short tandem repeat (described below) located on chromosome 8. Allele frequencies differ among ethnic groups: for example, the most common allele in African Americans is 14, while in Caucasians and Hispanics it is 13. Note also that allele 19 has been observed only in African Americans. Allele frequencies from Einum and Scarpetta 2004.

Early Genetic Testing Methods
Antigens on red blood cells can be used to help identify an individual. Antigens themselves are not alleles; they are products of the alleles encoded by a person's DNA. The six major blood group systems that have been used for human identification are ABO, Rh, MNSs, Kell, Duffy, and Kidd (Singh et al. 1982). An individual's blood type is determined by adding antiserum (antibodies) to a sample of blood; the cells agglutinate (clump) if the corresponding antigen is present. A type AB blood sample shows agglutination when tested with either anti-A or anti-B antiserum, whereas a type O blood sample reacts with neither antiserum (Table 2). The discovery of the ABO system, and the recognition that each blood group is determined by a genetic locus, marked the beginning of genetic testing for human identification.

Table 2 - Identification of ABO Blood Types by Agglutination
Blood Type   Antigens Present in Blood   Tested With Anti-A   Tested With Anti-B
A            A                           +                    -
B            B                           -                    +
AB           Both A and B                +                    +
O            Neither A nor B             -                    -
The four ABO blood types are shown in the first column. The respective antigens for each type and their agglutination reactions are given.
(+ shows agglutination; - shows absence of agglutination.) Adapted from Saferstein 1998.

Enzymes are also found in blood cells, and their variability can be used to further identify an individual. These proteins are isolated by simple lysis of the red blood cells into aqueous solution (Smithies 1955). Using electrophoresis, the enzymes can be separated and identified through various staining or specific enzymatic reactions. Unfortunately, obtaining results from dried bloodstains can be problematic in blood group and enzyme testing: red cell antigens and enzymes are not particularly stable molecules and can denature as blood dries and ages. Furthermore, relatively few alleles exist in most of these test systems, which limits their discriminatory power.

Early DNA Genetic Testing
Advances in DNA technology in the 1970s paved the way for detecting variation in specific DNA sequences and shifted the study of human polymorphisms from the protein products of DNA to DNA itself (National Research Council 1992). The goal of these studies was to find regions of DNA that vary among individuals. The research showed that repeated DNA sequences, short segments of DNA repeated multiple times in tandem, exist and vary in length from person to person. These sequences, termed variable number of tandem repeats (VNTRs), are highly polymorphic and are dispersed throughout the human genome. The first characterization of VNTRs described them as tandemly repeated DNA sequences with repeat units 10 to 15 base pairs in length (Jeffreys et al. 1985a). Later research characterized other VNTRs with repeat units as long as 100 base pairs (Chambers and MacAvoy 2000). Their relatively high mutation rates create the variability in the number of repeats observed from person to person. A method for detecting VNTRs, called restriction fragment length polymorphism (RFLP) analysis, is the origin of modern DNA-based human identity testing (Jeffreys et al. 1985b).
RFLP testing (Figure 1) uses a restriction enzyme that cuts the DNA at specific sites, excising fragments that contain the VNTR. The fragments are separated by size by gel electrophoresis and visualized for analysis with radioactive or chemiluminescent probes. By analyzing a sufficient number of VNTRs, the probability of a chance match between two persons can be reduced to an extremely low level (National Research Council 1992), providing sufficient discriminatory power for human identification in the forensic and paternity fields. However, RFLP requires a large quantity of intact DNA, which is not always available in evidentiary samples from crime scenes or mass disasters, and samples lacking substantial amounts of intact DNA may not yield RFLP results.

[Figure 1 image: restriction endonuclease cleavage sites flanking VNTR alleles of 9 and 8 repeats; DNA digested with a restriction enzyme; gel lanes sized by the number of repeats in each restriction fragment, showing minisatellite patterns.]
Figure 1 - RFLP Analysis. A restriction enzyme is used to cut the DNA at specific sites. The resulting DNA fragments are then separated by size through electrophoresis, and the fragments (alleles) are visualized through fluorescent probes. The sample results are the number of repeats at a VNTR: Lane 1 = 8 and 9 repeats; Lane 2 = 4 and 11; Lane 3 = 8 and 8. Figure from Fairbanks and Anderson 1999.

Introduction of the Polymerase Chain Reaction
Human identity testing has been enhanced considerably since the introduction of VNTRs and RFLP. The polymerase chain reaction (PCR) (Mullis et al. 1986) advanced the analysis of DNA evidence found at crime scenes. PCR can produce many copies of the targeted VNTR through DNA amplification.
This procedure uses DNA primers, short DNA sequences chosen to flank both ends of the VNTR of interest, and DNA polymerase, an enzyme that assembles a new strand of DNA. During a cycle of amplification, the primers anneal to their complementary target sequences and the DNA polymerase copies the original DNA strand. This process is similar to DNA replication during cell division but is limited to the VNTR region. Each amplification cycle ideally doubles the amount of the targeted DNA, so 30 cycles amplify it roughly one billion fold ($2^{30} \approx 1.07 \times 10^9$), producing a sufficient amount of material to determine the allele(s) present. An advantage of PCR is that foreign DNA, for instance from bacteria or fungi, is generally not amplified because the primers are specific to the human target, so only the target DNA is detected.

PolyMarker and DQα - Sequence Specific Oligonucleotide Probes
The AmpliType® HLA DQα Forensic Amplification and Typing Kit was the first commercially available forensic kit based on PCR (Fildes and Reynolds 1995). The test determines the HLA DQα genotype of DNA samples using reverse dot blot technology to detect single base polymorphisms (Conner et al. 1983), and it was implemented in hundreds of laboratories worldwide (Fildes and Reynolds 1995). The technology allows simultaneous screening of many known variants at the DQα locus by hybridizing amplified products to oligonucleotide probes bound to a nylon membrane. The hybridized alleles are visualized by a simple colorimetric reaction, allowing the analyst to score the DNA sample for the alleles present at the DQα locus (Figure 2) (Saiki et al. 1989).

Figure 2 - Example of DQα Typing and the reverse dot blot system. The DQα types are determined by the patterns of PCR product binding to the probes, indicated by dots. A dot at 1 indicates that one or two of alleles 1.1, 1.2, and 1.3 are present, and dots at 1.3, 2, 3, and 4 indicate that the respective allele is present.
A dot at '1.2, 1.3, 4' indicates that one or two of these alleles are present, and a dot at 'all but 1.3' indicates that any allele except 1.3 is present. C is a control indicating that the amplification was successful. The DQα alleles present in each row are as follows: Row 1: 1.3, 4; Row 2: 2, 4; Row 3: 1.1, 4; Row 4: 1.1, 4; Row 5: 1.3, 4; Row 6: 1.1, 1.2; Row 7: 1.3, 4. Figure from Saferstein 1993.

Subsequent advances led to the first DNA test in which multiple loci were analyzed at one time, termed multiplexing. This test, performed using the AmpliType PolyMarker (PM) Amplification and Typing Kit, simultaneously amplifies six genetic loci: HLA DQα, Low Density Lipoprotein Receptor (LDLR), Glycophorin A (GYPA), Hemoglobin G-gamma Globin (HBGG), D7S8, and Group Specific Component (GC) (Fildes and Reynolds 1995). PCR products are hybridized to the provided strips and visualized with the same colorimetric reaction as for DQα, and the resulting dots are used to determine the genotype of the DNA sample at each locus. Amplifying, visualizing, and analyzing multiple loci simultaneously greatly reduces the testing time for a DNA sample.

Short Tandem Repeats
Short tandem repeats (STRs) are a subset of VNTRs and share their characteristics except for the length of the repeated sequence (Craig et al. 1988). Analysis of STRs represents a significant advancement in human identity testing over VNTRs. Most STRs are composed of repeated sequences two to five base pairs in length, with alleles differing in the number of repeat units (Barber et al. 1996). Three classes of STRs exist: simple, compound, and complex repeats (Table 3). Simple repeats contain units of identical length and sequence, while compound repeats are composed of two or more adjacent repetitive elements. Complex repeats may contain several repeat blocks of variable unit length, along with variable intervening sequences (Urquhart 1994). There are many advantages of analyzing STRs over VNTRs.
One example is the ability to obtain results from degraded samples, which are common in forensic casework; this is very difficult with VNTRs because of the large size of their alleles. STRs have much shorter repeat units and alleles (Mornhinweg et al. 1998), with those most commonly used for human identification applications having four or five base pair repeats. Therefore, STR loci are better candidates for analysis of degraded samples.

Table 3 - Example of simple, compound, and complex repeats of STRs
Simple Repeat     (AGAT)(AGAT)(AGAT)(AGAT)(AGAT)
Compound Repeat   TCTA(TCTG)2(TCTA)5
Complex Repeat    (TTTC)3(TTTT)(TTCT)(CTTT)7(TTCC)2
Simple repeats contain units of identical length and sequence. Compound repeats are composed of two or more adjacent repetitive elements. Complex repeats contain several repeat blocks of variable unit length, along with variable intervening sequences.

Today, human identification applications use several STRs amplified simultaneously, a technique known as multiplexing. Each locus is chosen to have a narrow allele size range so that multiple STRs can be analyzed together without size overlap, which reduces the processing time of STR analysis. As with VNTRs, the probability of a chance match between two persons decreases as more loci are tested. Numerous commercial STR multiplex kits, containing from 3 to 16 STR loci per kit, have been used in the human identification field.

Statistical Calculations Used to Evaluate STRs
The STRs chosen for human identification are used to distinguish one individual from another. Various statistical calculations are employed to determine the discriminatory power of each STR locus, including the power of discrimination (PD) and heterozygosity (He). The power of discrimination is the probability that two people chosen at random will have different genotypes and is equal to one minus the probability of identity (PI). PI, the sum of the squares of the genotype frequencies,
is the probability of two randomly selected individuals having an identical genotype at a particular locus (Fisher 1951). PI is also known as the matching probability (pM); examples are shown in Appendices A and B. The greater the power of discrimination, the lower the chance that a person's genotype coincidentally matches another individual's at a particular locus. A person carrying two copies of the same allele at a locus is called a homozygote, and one carrying two different alleles is called a heterozygote. A high level of heterozygosity is desirable in an STR locus because it indicates greater allelic diversity, lessening the chance of a random match. Expected homozygosity is calculated by summing the squares of the allele frequencies, and heterozygosity equals one minus the homozygosity (Edwards et al. 1992). However, the value of high heterozygosity must be balanced against the need for a tight allele range: the size difference between the smallest and largest alleles must be relatively small and similar to the ranges of the other loci so that the STRs can be multiplexed successfully.

DNA Databases
DNA databases are a crime-solving tool that combines forensic science and computer technology. The Federal Bureau of Investigation laboratory's Combined DNA Index System (CODIS) and the United Kingdom's National DNA Database (NDNAD) contain DNA profiles collected from convicted offenders and casework samples. These profiles are compared electronically by federal, state, and local crime laboratories to link crimes to each other or to link unsolved crimes to convicted offenders.

CODIS was established in 1998. As of March 2007 it contained over 4.5 million entries and had assisted in over 30,000 investigations (FBI 2007). CODIS uses two indices and three hierarchical levels. The offender index contains the profiles of convicted felons, while the forensic index contains profiles from crime scene evidence and missing persons.
Genetic profiles are uploaded at the local level as they are obtained from convicted offenders or crime scene evidence. The profiles then pass to the state and federal levels, where they can be searched against one another. A suspect can be identified when a match is found between the offender and forensic indexes. Crime scenes can be linked within or across multiple jurisdictions when matches are found in the forensic index, allowing investigators to share information pertaining to crime scenes, suspects, leads, and evidence.

Statistical Calculations Used in Human Identity Applications
When a DNA profile (a collection of STR genotypes) is found to match between an evidentiary sample and a suspect, the strength of the association must be stated in the report. Different types of statistical calculations are used to describe the results of these genetic test batteries. The random match probability, used in forensic analysis, is the probability that a randomly chosen, unrelated person would have the same genotype as the evidentiary profile at a given STR by coincidence. Two formulae are used to calculate this probability, in which $p$ and $q$ are allele frequencies: $p^2$ is used when an individual is homozygous at a locus, and $2pq$ is used for heterozygotes. For example, if an African American has alleles 13 and 14 at locus D8S1179, then using the frequencies from Table 1 the random match probability is $2(0.203)(0.335) \approx 0.136$. Thus, about 13.6% of African Americans are expected to have this genotype at the locus. As more loci are added to a test battery, it becomes less likely that two unrelated people will have identical profiles. The power of exclusion (PE) is a statistic used in paternity analysis and is analogous to the forensic random match probability. The PE is the proportion of cases in which a falsely accused man will be excluded from paternity (Fisher 1951).
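The random match probability, product rule, heterozygosity, and power of discrimination described above can be illustrated with a short Python sketch. This is a minimal illustration, not the procedure used in this study. It assumes Hardy-Weinberg proportions, so genotype frequencies are derived as $p^2$ and $2pq$ from the African American D8S1179 allele frequencies in Table 1, whereas PI as defined above sums squared observed genotype frequencies. The function names are hypothetical, and the second-locus value used to demonstrate the product rule is an invented placeholder.

```python
from itertools import combinations_with_replacement

# African American allele frequencies at D8S1179, transcribed from Table 1
# (Einum and Scarpetta 2004). Keys are allele designations (repeat counts).
D8S1179_AFRICAN_AMERICAN = {
    8: 0.00237, 9: 0.00487, 10: 0.0232, 11: 0.0445, 12: 0.119, 13: 0.203,
    14: 0.335, 15: 0.194, 16: 0.0606, 17: 0.0119, 18: 0.00122, 19: 0.000321,
}


def genotype_frequency(freqs, allele_a, allele_b):
    """Expected genotype frequency assuming Hardy-Weinberg proportions:
    p^2 for a homozygote, 2pq for a heterozygote."""
    p, q = freqs[allele_a], freqs[allele_b]
    return p * p if allele_a == allele_b else 2 * p * q


def profile_frequency(locus_genotype_frequencies):
    """Product rule: multiply genotype frequencies across independent loci."""
    product = 1.0
    for frequency in locus_genotype_frequencies:
        product *= frequency
    return product


def heterozygosity(freqs):
    """Expected heterozygosity: one minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def power_of_discrimination(freqs):
    """PD = 1 - PI, with PI taken here as the sum of squared *expected*
    genotype frequencies over every unordered genotype at the locus."""
    alleles = sorted(freqs)
    probability_of_identity = sum(
        genotype_frequency(freqs, a, b) ** 2
        for a, b in combinations_with_replacement(alleles, 2)
    )
    return 1.0 - probability_of_identity


if __name__ == "__main__":
    rmp = genotype_frequency(D8S1179_AFRICAN_AMERICAN, 13, 14)
    print(f"Random match probability, genotype 13,14: {rmp:.3f}")  # 0.136
    # Hypothetical second-locus genotype frequency (0.1), for illustration only.
    print(f"Two-locus profile frequency: {profile_frequency([rmp, 0.1]):.4f}")
    print(f"He = {heterozygosity(D8S1179_AFRICAN_AMERICAN):.3f}")
    print(f"PD = {power_of_discrimination(D8S1179_AFRICAN_AMERICAN):.3f}")
```

Running the script reproduces the 0.136 random match probability worked out above for genotype 13,14. Multiplying per-locus genotype frequencies in `profile_frequency` is only valid under the independence assumption on which the product rule relies.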
Typically, if all thirteen CODIS loci are examined, a falsely accused alleged father will on average be excluded at seven to eight loci and included by coincidence at five to six loci (Einum and Scarpetta 2004).

Table 4 - Paternity index formulae
Arrangement   Mother Alleles   Child Alleles   Alleged Father Alleles   Paternity Index Formula
1             BD               AB              AC                       1/(2a)
2             BC               AB              AC                       1/(2a)
3             BC               AB              AB                       1/(2a)
4             BC               AB              A                        1/a
5             B                AB              AC                       1/(2a)
6             B                AB              AB                       1/(2a)
7             B                AB              A                        1/a
8             AB               AB              AC                       1/[2(a+b)]
9             AB               AB              AB                       1/(a+b)
10            AB               AB              A                        1/(a+b)
11            AB               A               AC                       1/(2a)
12            AB               A               AB                       1/(2a)
13            AB               A               A                        1/a
14            A                A               AB                       1/(2a)
15            A                A               A                        1/a
The allele arrangement of the mother, child, and alleged father at a particular locus determines which paternity index formula is used. A, B, C, and D represent alleles; a and b are the frequencies of alleles A and B. Table from AABB 2006.

Finally, the paternity index expresses the odds in favor of paternity. It uses one of four formulae, depending on the arrangement of alleles observed in the tested parties at each locus (Table 4) (AABB 2006). Most allelic arrangements use the formula $1/(2a)$, which applies when the maternal obligate allele (MOA), the allele passed from mother to child, is defined and the alleged father is heterozygous at that locus. When the alleged father is homozygous and the MOA is defined, the formula $1/a$ is used. The formula $1/[2(a+b)]$ is selected when the paternal obligate allele (POA), the allele passed from father to child, is defined but the MOA is ambiguous. The formula $1/(a+b)$ is used in the remaining ambiguous situations.

Genetic Mutations
As with any region of DNA, mutations occur at STR loci, and genotypes can change between generations (Wiegand et al. 2004). Several types of mutations can modify the nucleotide sequence of DNA. Changes at a single base pair or a few adjacent nucleotides are referred to as point mutations and are classified as substitutions, insertions, or deletions.
Substitutions occur when a single base pair in the DNA is changed. An insertion or deletion occurs when one or more nucleotides are introduced into or removed from the DNA strand. Slippage mutations, or slipped-strand mispairing during DNA replication (Figure 3a), have been proposed as the molecular mechanism that creates the variability seen in STRs (Levinson and Gutman 1987; Klintschar et al. 2004). The newly synthesized strand may slip or mispair, causing unpaired nucleotide(s) to loop out of the strand while replication continues beyond the loop (Figure 3b). This adds nucleotides to the newly synthesized strand, producing an insertion (Figure 3d). Conversely, the template strand may slip or mispair during replication (Figure 3c), causing unpaired nucleotide(s) to loop out of the template while replication continues beyond the loop, which produces a deletion (Figure 3e).

[Figure 3 image: panels (a) to (e) showing DNA replication, slippage of the newly synthesized or template strand, and the resulting insertion or deletion in the newly synthesized strand.]
Figure 3 - Slippage mutations. (a) During DNA replication, (b) the newly synthesized strand slips, creating (d) an insertion mutation that elongates the DNA strand by one base pair, or (c) the template strand slips, causing (e) a deletion mutation that shortens the DNA strand by one base pair. Figure from Fairbanks and Anderson 1999.

Structural Polymorphisms of STR Loci
The examination of length polymorphisms is widely used in the human identification field to resolve forensic and paternity cases. However, size analysis alone does not differentiate alleles of equal length that have different DNA sequences. Structural polymorphisms of STR alleles are defined as alleles that have the same base pair length but differ in the composition of the repeats in the variable regions. Even though they are known to exist, the human identity field does not currently examine STR alleles to this level of detail. In theory, their examination could provide a second level of discrimination in human identity applications: the allele frequencies would be subcategorized, increasing the power of discrimination or the paternity index.

Structural polymorphisms have been identified at the highly characterized CODIS loci FGA (Barber et al. 1996; Griffiths et al. 1998), vWA (Brinkmann et al. 1996a; Griffiths et al. 1998), D3S1358 (Szibor et al. 1998; Mornhinweg et al. 1998), and D21S11 (Schwartz et al. 1996; Griffiths et al. 1998; Zhou et al. 1997; Brinkmann et al. 1996b).

FGA has a complex repeat structure of (TTTC)3 TTTT TTCT (CTTT)n CTCC (TTCC)2. The only two known structural variants are at allele 27, of which 27a has the sequence (TTTC)3 TTTT TTCT (CTTT)19 CTCC (TTCC)2. This variant was observed in a study that sequenced 22 FGA alleles to establish their repeat unit structure and assign allele designations (Barber et al. 1996). Griffiths et al.
(1998) characterized the second structural polymorphism, 27b, as (TTTC)3 TTTT TTCT (CTTT)13 CCTT (CTTT)5 CTCC (TTCC)2. The two differ in that 27a has 19 uninterrupted CTTT repeats, while 27b carries a single T → C substitution at the nineteenth repeat unit of the allele (the fourteenth CTTT repeat), producing a CCTT unit.

Eight structural polymorphisms have been observed at the vWA locus, two each for alleles 13, 15, 16, and 18 (Table 5). The locus has a compound repeat structure of TCTA (TCTG)3-4 (TCTA)n. Structural polymorphisms 13a and 13b differ in two respects. The first is at the beginning of the sequence, where 13a has an additional TCTA. The second is a T → C substitution in 13a at the fourth TCTA repeat, which converts it to TCCA. Structural polymorphisms 15a and 15b differ in the proportion of TCTG and TCTA repeats, as do 16a and 16b, and 18a and 18b.

Table 5 - Known structural polymorphisms at the vWA locus
Structural Polymorphism | Repeat Structure, TCTA (TCTG)3-4 (TCTA)n | Reference
13a | (TCTA)2 (TCTG)4 (TCTA)3 TCCA (TCTA)3 | Griffiths et al. 1998
13b | TCTA (TCTG)4 (TCTA)8 TCCA TCTA | Brinkmann et al. 1996a
15a | TCTA (TCTG)4 (TCTA)10 TCCA TCTA | Brinkmann et al. 1996a
15b | TCTA (TCTG)3 (TCTA)11 TCCA TCTA | Brinkmann et al. 1996a
16a | TCTA (TCTG)4 (TCTA)11 TCCA TCTA | Brinkmann et al. 1996a
16b | TCTA (TCTG)3 (TCTA)12 TCCA TCTA | Brinkmann et al. 1996a
18a | TCTA (TCTG)4 (TCTA)13 TCCA TCTA | Brinkmann et al. 1996a
18b | TCTA (TCTG)5 (TCTA)12 TCCA TCTA | Brinkmann et al. 1996a
Alleles 13, 15, 16, and 18 each have two structural polymorphisms; their respective sequences are shown.

Table 6 - Known structural polymorphisms at the D3S1358 locus
Structural Polymorphism | Repeat Structure, TCTA (TCTG)2-3 (TCTA)n | Reference
15a | TCTA (TCTG)3 (TCTA)11 | Szibor et al. 1998
15b | TCTA (TCTG)2 (TCTA)12 | Szibor et al. 1998
16a | TCTA (TCTG)3 (TCTA)12 | Szibor et al. 1998
16b | TCTA (TCTG)2 (TCTA)13 | Mornhinweg et al. 1998
17a | TCTA (TCTG)3 (TCTA)13 | Szibor et al. 1998
17b | TCTA (TCTG)2 (TCTA)14 | Mornhinweg et al. 1998
Alleles 15, 16, and 17 each have two structural polymorphisms; their respective sequences are shown.

Six structural polymorphisms have been characterized at the D3S1358 locus (Table 6), two each in alleles 15, 16, and 17. The locus has a compound repeat sequence with a basic motif of TCTA (TCTG)2-3 (TCTA)n. The sequences of 15a, 15b, 16a, and 17a were identified in a population study from the northern and southern regions of Germany (Szibor et al. 1998). In a separate study, structural polymorphisms 16b and 17b were found, differing from 16a and 17a in the number of TCTG and TCTA repeats (Mornhinweg et al. 1998).

D21S11 has a complex tetranucleotide repeat sequence with three variable regions, I (TCTA)n, II (TCTG)n, and III (TCTA)n, and a constant region of 43 base pairs between regions II and III. Its repeat structure is (TCTA)n (TCTG)n {(TCTA)3 TA (TCTA)3 TCA (TCTA)2 TCCATA} (TCTA)n. Structural polymorphisms have been observed at 15 alleles of D21S11 (Appendix A). These were identified in studies of allelic ladders (Griffiths et al. 1998), in investigations of length and structural variation (Möller et al. 1994, Brinkmann et al. 1996b, Walsh et al. 2003), and in population studies (Schwartz et al. 1996, Bagdonavicius et al. 2002, Zhou et al. 1997). Most known D21S11 structural polymorphisms differ in the number of repeats in each variable region; the sole exception is a structural variant of allele 32.2, which has only two TCTA repeats in the first repeated sequence of the constant region. Alleles 37.2, 38.2, and 39.2 show a 14 bp deletion in the constant region, producing the repeat motif (TCTA)n (TCTG)n {(TCTA)3 TCA (TCTA)2 TCCATA} (TCTA)n (Walsh et al. 2003).
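The defining property of a structural polymorphism, same length but different sequence, can be checked directly from the bracket notation used above. The Python sketch below is an illustrative helper, not software used in this study; `expand` and `mismatches` are hypothetical names, and the parser only assumes the simple "(MOTIF)count" and bare-motif notation shown in the FGA and D21S11 structures.

```python
import re

# Matches either a bracketed repeat such as "(CTTT)19" or a bare motif such as "TTTT".
TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)|([ACGT]+)")


def expand(structure):
    """Expand repeat notation like '(TTTC)3 TTTT' into the plain nucleotide sequence."""
    parts = []
    for unit, count, literal in TOKEN.findall(structure):
        parts.append(unit * int(count) if unit else literal)
    return "".join(parts)


def mismatches(seq1, seq2):
    """Count differing positions between two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences differ in length")
    return sum(a != b for a, b in zip(seq1, seq2))


fga_27a = expand("(TTTC)3 TTTT TTCT (CTTT)19 CTCC (TTCC)2")
fga_27b = expand("(TTTC)3 TTTT TTCT (CTTT)13 CCTT (CTTT)5 CTCC (TTCC)2")
print(len(fga_27a), len(fga_27b), mismatches(fga_27a, fga_27b))  # 108 108 1

# D21S11: the 43 bp constant region sits between VRII and VRIII.
CONSTANT = "(TCTA)3 TA (TCTA)3 TCA (TCTA)2 TCCATA"
d21_30a = expand(f"(TCTA)4 (TCTG)6 {CONSTANT} (TCTA)12")
d21_30b = expand(f"(TCTA)5 (TCTG)6 {CONSTANT} (TCTA)11")
print(len(expand(CONSTANT)), len(d21_30a), len(d21_30b))  # 43 131 131
```

The FGA pair differs at exactly one base, the T → C change that turns one CTTT into CCTT, yet both repeat regions are 108 bp long. Likewise, two D21S11 allele 30 structures span the same 131 bp repeat region, so length-based typing cannot separate them.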
Structural Polymorphism Research for Human Identity Testing
Despite the numerous structural variants identified across many loci used for forensic identification, no research has characterized all of the structural polymorphisms at a single STR locus. Consequently, population frequency data do not exist for any STR structural variant. The goals of this study were to examine structural polymorphisms of allele 30 at D21S11 to determine which variants exist, to establish their relative frequencies, and to investigate whether the structural alleles have a distribution that is not dominated by one variant. The statistical calculations used in human identification applications were then re-examined using the frequencies of the structural variants. D21S11 was chosen because of its highly polymorphic nature, and allele 30 was examined because four structural variants had already been reported (Schwartz et al. 1996, Griffiths et al. 1998, Zhou et al. 1997, Brinkmann et al. 1996b), suggesting that still more variants might exist. This study consisted of sequence analysis of 186 samples collected from African American and Caucasian individuals in North America. Structural variants were identified, their relative frequencies established, and the power of discrimination, heterozygosity, and power of exclusion were recalculated incorporating the frequencies of the structural variants.

MATERIALS AND METHODS
D21S11 Sample Identification
Upon prior review and approval from the University Committee on Research Involving Human Subjects (UCRIHS) at Michigan State University, a list of DNA profiles from unrelated individuals was generated through a database query of Orchid Cellmark's paternity testing laboratory.
Each candidate was a Caucasian or African American individual in North America and was required to be heterozygous at the D21S11 locus, with allele 30 being the smaller of the two alleles.

Allele Isolation
DNA samples were PCR amplified with D21S11-specific primers synthesized by Integrated DNA Technologies. The primer sequences and cycling conditions were provided by the National Institute of Standards and Technology through Margaret Kline (personal communication) and are displayed in Table 7. Primer stocks were diluted to a concentration of 9.8 pM/µL. Each PCR amplification reaction contained 18.75 ng of genomic DNA, 14 pM each of forward and reverse primer, 1.25 units of Taq polymerase (Invitrogen), and 5.0 µL of a reaction buffer provided by Orchid Cellmark containing Tris-HCl, KCl, BSA, MgCl2, and dNTPs, for a total reaction volume of 28 µL. PCR amplification was carried out in an MJ Research PTC-225 Tetrad thermal cycler with an initial step at 95 °C for ten minutes, then 35 cycles of 94 °C for 30 seconds, 59 °C for one minute, and 72 °C for one minute, followed by a final extension step at 60 °C for 45 minutes. The amplified products were resolved by 4% polyacrylamide gel electrophoresis on a Life Technologies S2 sequencing gel electrophoresis apparatus. The D21S11 alleles were visualized with Cambrex SYBR Green I Nucleic Acid Gel Stain and a UV transilluminator, and allele 30 was identified as the smaller band. The D21S11 allele 30 bands were excised using a disposable scalpel and placed in 2 mL microcentrifuge tubes containing 150 µL of dH2O. The gel slices were incubated for at least 1 hour at room temperature, followed by centrifugation at 14,000 rpm for 30 seconds. Serial dilutions of the supernatant from 10^-1 to 10^-5 were prepared and amplified to determine which dilution produced adequate amounts of the target amplicon without nonspecific product formation.
Dilutions of 10^-4 and 10^-5 typically produced the least nonspecific product and were used for sequencing after being purified with a QIAquick column (Qiagen) and concentrated into 30 µL of H2O.

Table 7 - D21S11 primer sequences used for PCR and sequencing reactions
Forward: 5' - CCA GCT TCC CTG ATT CTT ACA - 3'
Reverse: 5' - CAC TGA GAA GGG AGA AAC ACT G - 3'
The primer sequences were provided by the National Institute of Standards and Technology through Margaret Kline.

DNA Sequencing
Seventy-one of the PCR products were sequenced on a Beckman Coulter CEQ 8000 Genetic Analyzer. Forward and reverse sequencing reactions were prepared for each PCR product with the same primers used for PCR. A dye terminator cycle sequencing reaction was performed following the Beckman Coulter Quick Start Kit standard operating procedure. The reactions were stopped by adding 2.5 µL of a Stop Solution/Glycogen mixture, containing 2 µL of 3 M sodium acetate, 2 µL of 100 mM Na2-EDTA (pH 8.0), and 1 µL of 20 mg/mL glycogen, followed by 60 µL of cold 95% ethanol, in a sterile 1.5 mL centrifuge tube. Each sample was mixed thoroughly and centrifuged at 14,000 rpm for 15 minutes. The supernatant was removed and the pellet was rinsed twice with 100 µL of cold 70% ethanol, followed by immediate centrifugation at 14,000 rpm for three minutes, after which the pellet was vacuum dried for 30 minutes. The DNA was resuspended in 40 µL of Sample Loading Solution, provided in the Quick Start Kit, and sequencing was conducted on the CEQ 8000 with the following conditions: injection voltage of 2.0 kV, injection duration of 120 seconds, separation voltage of 4.2 kV, and separation duration of 60 minutes.

The remaining 115 PCR products were quantified with a spectrophotometer. Based on these results, 10 ng of PCR product was added to a 12 µL sequencing reaction volume containing dH2O and 30 pM of primer.
The reactions were submitted to the Michigan State University Research Technology Support Facility Genomic Core, which used an Applied Biosystems Prism 3730xl capillary sequencer and a BigDye® Terminator v3.1 Cycle Sequencing Kit.

DNA Sequence Analysis
The raw sequences were analyzed with either CEQ 8000 or ABI Prism 3700 analysis software and compiled in BioEdit Biological Sequence Alignment software (Hall 1999). Forward and reverse sequences for each sample were aligned and binned according to the structural polymorphism observed.

Statistical Analysis
A chi-square test was performed to investigate whether the distribution of structural polymorphisms in African Americans differed significantly from that in Caucasians. The published D21S11 allele 30 frequencies for African Americans and Caucasians are 0.187 and 0.258, respectively (Einum and Scarpetta 2004). Each structural variant's allele frequency was calculated by multiplying the variant's observed frequency (the proportion of allele 30 samples carrying that variant) by the corresponding D21S11 allele 30 frequency. For example, if a structural variant was observed in 17.43% of African American allele 30 samples, its allele frequency would be (0.187)(0.1743) ≈ 0.0326. The power of discrimination, power of exclusion, and heterozygosity were calculated with the published D21S11 allele frequencies (Einum and Scarpetta 2004), then recalculated using the structural polymorphism allele frequencies for comparison with the published values. Formulae for the statistical calculations are given in Table 8. A hypothetical analysis of the power of discrimination, power of exclusion, and heterozygosity at the D21S11 locus was also performed to evaluate how these values would change if structural polymorphism frequencies were known for all alleles at the locus.
No structural variant allele frequencies are known for the other D21S11 alleles; therefore, hypothetical African American structural polymorphism allele frequencies were used, under the assumption that the structural polymorphisms would have a diverse distribution. Where only two structural polymorphisms have been identified, frequencies of 75% and 25% were used. Frequencies of 60%, 30%, and 10% were used where three structural variants were known. Where four have been recognized, the structural allele frequencies found in this study for D21S11 allele 30 of 57%, 17%, 14%, and 12% were used.

Table 8 - Equations used in the statistical analysis of D21S11 allele 30 structural polymorphisms
Chi-square test: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$, where $f_o$ is the observed frequency, $f_e$ is the expected frequency, and $n$ is the number of samples
Power of discrimination: $PD = 1 - \sum_i P_i^2$, where $P_i$ is the frequency of the $i$th genotype in a population of $n$ samples
Heterozygosity: $H = 1 - \sum_i P_i^2$, where $P_i$ is the frequency of the $i$th allele in a population of $n$ samples
Power of exclusion: $PE = H^2\left(1 - (1 - H)H^2\right)$, where $H$ is the heterozygosity
A chi-square test was used to evaluate whether a significant difference existed between the observed proportions of structural polymorphisms in African Americans and Caucasians. The power of discrimination was used to calculate the probability that two people chosen at random will have different genotypes. Heterozygosity was used to calculate the proportion of heterozygous individuals in the population. The power of exclusion was used to calculate the proportion of cases in which a falsely accused man will be excluded from paternity. Equations from Butler 2005.

RESULTS
D21S11 Allele 30 Structural Polymorphism Distribution
A total of 186 samples were sequenced and analyzed at allele 30 of the D21S11 locus. Forward and reverse sequences were obtained in all cases, for a total of 372 sequences.
Two hundred and eighteen of these sequences were derived from unrelated African Americans and 154 from unrelated Caucasian Americans. The four previously described structural polymorphisms (30a, 30b, 30c, and 30d) as well as three novel polymorphisms (30e, 30f, and 30g) were observed (Table 9). The novel structural variants differ from the known variants in the number of repeats in the variable regions; all have the same constant region as the previously reported structural polymorphisms.

Table 9 - Structural polymorphisms observed at allele 30 of the D21S11 locus
Structural Polymorphism | Repeat Structure, [TCTA]n[TCTG]n{[TCTA]3 TA [TCTA]3 TCA [TCTA]2 TCCATA}[TCTA]n | Reference
30a | [TCTA]4[TCTG]6{43 bp}[TCTA]12 | Schwartz et al. (1996)
30b | [TCTA]5[TCTG]6{43 bp}[TCTA]11 | Zhou et al. (1997)
30c | [TCTA]6[TCTG]5{43 bp}[TCTA]11 | Griffiths et al. (1998)
30d | [TCTA]6[TCTG]6{43 bp}[TCTA]10 | Brinkmann et al. (1996b)
30e | [TCTA]4[TCTG]7{43 bp}[TCTA]11 | Novel allele
30f | [TCTA]5[TCTG]5{43 bp}[TCTA]12 | Novel allele
30g | [TCTA]5[TCTG]7{43 bp}[TCTA]10 | Novel allele
Seven structural polymorphisms were observed in this study, three of which were novel. All seven have the same length but differ in the number of repeats in each variable region; all share the same constant sequence.

The observed frequencies of the structural variants are shown in Table 10. Six structural polymorphisms were found in African Americans (30a, 30b, 30c, 30d, 30e, and 30f), while five were observed in Caucasians (30a, 30b, 30c, 30d, and 30g). Structural polymorphism 30b was the most frequent in African Americans, appearing 56.88% of the time, compared with 12.99% in Caucasians. In contrast, structural variant 30c appeared 53.25% of the time in Caucasians, compared with 11.93% in African Americans. Novel variant 30e was observed three times (2.75%) and 30f once (0.92%) in African Americans, and neither was found in Caucasians. Novel structural polymorphism 30g was observed once (1.30%) in Caucasians and was not observed in African Americans.
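The allele frequency column of Table 10 follows directly from the observed counts: each variant's share of the allele 30 samples is multiplied by the published allele 30 frequency, as described under Statistical Analysis. A minimal Python sketch using the counts reported in Table 10 (the function name is hypothetical):

```python
# Allele 30 structural variant counts (Table 10) and the published D21S11 allele 30
# frequencies (Einum and Scarpetta 2004).
COUNTS = {
    "African American": {"30a": 19, "30b": 62, "30c": 13, "30d": 11, "30e": 3, "30f": 1},
    "Caucasian": {"30a": 19, "30b": 10, "30c": 41, "30d": 6, "30g": 1},
}
ALLELE_30_FREQ = {"African American": 0.187, "Caucasian": 0.258}


def variant_allele_frequencies(counts, allele_freq):
    """Split one allele's frequency among its variants in proportion to their observed counts."""
    n = sum(counts.values())
    return {variant: (count / n) * allele_freq for variant, count in counts.items()}


for population, counts in COUNTS.items():
    freqs = variant_allele_frequencies(counts, ALLELE_30_FREQ[population])
    rounded = {variant: round(f, 4) for variant, f in freqs.items()}
    print(population, rounded, round(sum(freqs.values()), 3))
```

Rounded to four decimal places, this reproduces the allele frequency column of Table 10, except for 30g. That variant comes out as 0.0034 here against 0.0033 in the table, a difference of one unit in the last decimal place. By construction, the variant frequencies in each population sum back to the published allele 30 frequency.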
Table 10 - Frequency of structural polymorphisms for allele 30 at D21S11
Structural Polymorphism | African American: n | Observed Frequency | Allele Frequency | Caucasian: n | Observed Frequency | Allele Frequency
30a | 19 | 0.1743 | 0.0326 | 19 | 0.2468 | 0.0637
30b | 62 | 0.5688 | 0.1064 | 10 | 0.1299 | 0.0335
30c | 13 | 0.1193 | 0.0223 | 41 | 0.5325 | 0.1374
30d | 11 | 0.1009 | 0.0189 | 6 | 0.0779 | 0.0201
30e | 3 | 0.0275 | 0.0051 | - | - | -
30f | 1 | 0.0092 | 0.0017 | - | - | -
30g | - | - | - | 1 | 0.0130 | 0.0033
Total | 109 | 1.0000 | 0.1870 | 77 | 1.0000 | 0.2580
A total of 186 structural variants were observed, 109 in African Americans and 77 in Caucasians. The D21S11 allele 30 frequencies are 0.187 for African Americans and 0.258 for Caucasians, and the structural polymorphism allele frequencies sum to these published allele frequencies (Einum and Scarpetta 2004). Each structural polymorphism's allele frequency was obtained by multiplying its observed frequency by the respective D21S11 allele 30 frequency. n = number of occurrences; - indicates no occurrences.

Distribution Analysis of Variance
A chi-square statistic below the critical value means that the hypothesis of similar structural polymorphism distributions in African Americans and Caucasians cannot be rejected, while a greater value indicates that the distributions differ. The chi-square value was 50.9869, greater than the critical value at the 0.995 level (a significance level of 0.005). Thus, the distribution of structural polymorphisms in African Americans differed significantly from that in Caucasians, which is also apparent from a different variant being the most frequent in each population.

Power of Discrimination Analysis Incorporating Structural Variant Frequencies
The power of discrimination was calculated with and without the structural polymorphism allele frequencies (Table 11).
The established power of discrimination at the D21S11 locus is 0.961 in African Americans and 0.955 in Caucasians (Einum and Scarpetta 2004). After accounting for the structural polymorphism frequencies, the power of discrimination increased by 0.011 (1.1%) to 0.972 in African Americans and by 0.020 (2.0%) to 0.975 in Caucasians (Appendix B).

Heterozygosity Analysis Incorporating Structural Variant Frequencies
A heterozygosity analysis was performed incorporating the structural polymorphisms (Table 11). The expected heterozygosity at the D21S11 locus is 0.850 for African Americans and 0.839 for Caucasians (Einum and Scarpetta 2004). After the structural polymorphism allele frequencies were included in the calculation, it increased by 0.022 (2.6%) to 0.872 in African Americans and by 0.042 (5.0%) to 0.881 in Caucasians.

Power of Exclusion Analysis Incorporating Structural Variant Frequencies
The power of exclusion was recalculated including the structural polymorphism data (Table 11). The power of exclusion at D21S11 is 0.706 in African Americans and 0.685 in Caucasians (Einum and Scarpetta 2004). After incorporating the structural polymorphism allele frequencies, it increased by 0.045 (6.4%) to 0.751 in African Americans and by 0.079 (11.5%) to 0.764 in Caucasians.

Table 11 - Comparison of descriptive statistics with and without structural polymorphisms
Statistic | African American, Without Structural Polymorphisms | African American, With Structural Polymorphisms | Caucasian, Without Structural Polymorphisms | Caucasian, With Structural Polymorphisms
PD | 0.961 | 0.972 | 0.955 | 0.975
He | 0.850 | 0.872 | 0.839 | 0.881
PE | 0.706 | 0.751 | 0.685 | 0.764
Power of discrimination (PD), expected heterozygosity (He), and power of exclusion (PE) with and without the structural polymorphism allele frequencies. All descriptive statistics increased when structural polymorphisms were included.

Hypothetical Statistical Analysis of Power of Discrimination, Power of Exclusion, and Heterozygosity
Table 12 compares the power of discrimination, heterozygosity, and power of exclusion without structural variant frequencies, with only the allele 30 structural polymorphism frequencies, and with hypothetical structural variant frequencies for all alleles at the D21S11 locus. The descriptive statistics for D21S11 showed greater discriminatory capability with the inclusion of allele 30 structural polymorphism data. Compared with the published power of discrimination, the hypothetical result increased by 0.027 (2.8%) to 0.988 when the hypothetical structural polymorphism data were incorporated (Table 12, Appendix C). With the hypothetical structural polymorphism frequencies, heterozygosity increased by 0.067 (7.8%) to 0.917, and the power of exclusion increased by 0.131 (18.5%) to 0.837.

Table 12 - Comparison of statistical analysis with and without structural variants (African American)
Statistic | Without Structural Polymorphisms | With Structural Polymorphisms from Allele 30 at D21S11 | With Hypothetical D21S11 Structural Polymorphisms
PD | 0.961 | 0.972 | 0.988
He | 0.850 | 0.872 | 0.917
PE | 0.706 | 0.751 | 0.837
Results for power of discrimination (PD), expected heterozygosity (He), and power of exclusion (PE) with and without structural variant allele frequencies in African Americans. Published African American allele frequencies were used for all D21S11 alleles (Einum and Scarpetta 2004). Hypothetical values were assigned to each structural variant, using the observed subdivision of allele 30 as a guide and assuming that all structural polymorphisms display diverse distributions. Where only two structural polymorphisms are known, frequencies of 75% and 25% were used; where three are known, 60%, 30%, and 10%; and where four are known, 57%, 17%, 14%, and 12%. All descriptive statistics increased when the hypothetical structural polymorphism frequencies were included.
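The Table 8 formulas are short enough to sketch directly. The Python below is illustrative, not the software used for this thesis. `power_of_discrimination` assumes Hardy-Weinberg genotype frequencies built from allele frequencies, and `heterozygosity_gain` is a hypothetical helper that relies on the fact that splitting one allele changes only that allele's own term in H. Starting from the published He values and the Table 10 counts, it reproduces the heterozygosity figures in Table 11. PD and PE would need the full D21S11 allele frequency table, which is not reproduced here.

```python
from itertools import combinations_with_replacement


def heterozygosity(allele_freqs):
    """Expected heterozygosity, H = 1 - sum of squared allele frequencies."""
    return 1 - sum(p * p for p in allele_freqs)


def power_of_discrimination(allele_freqs):
    """PD = 1 - sum of squared genotype frequencies, assuming Hardy-Weinberg proportions."""
    total = 0.0
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        p, q = allele_freqs[i], allele_freqs[j]
        genotype = p * p if i == j else 2 * p * q
        total += genotype * genotype
    return 1 - total


def chi_square(table):
    """Pearson chi-square statistic for a contingency table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand
            stat += (observed - expected) ** 2 / expected
    return stat


def heterozygosity_gain(allele_freq, counts):
    """Rise in H when an allele of frequency p is split according to its variants' counts.

    Only that allele's term changes, from p^2 to sum((p * s_k)^2), so the gain is
    p^2 * (1 - sum(s_k^2)), where s_k are the variants' observed proportions.
    """
    n = sum(counts)
    return allele_freq ** 2 * (1 - sum((c / n) ** 2 for c in counts))


# Allele 30 variant counts from Table 10 (columns 30a-30g; zeros where not observed).
aa_counts = [19, 62, 13, 11, 3, 1, 0]
ca_counts = [19, 10, 41, 6, 0, 0, 1]

aa_gain = heterozygosity_gain(0.187, aa_counts)
ca_gain = heterozygosity_gain(0.258, ca_counts)
print(round(0.850 + aa_gain, 3), round(0.839 + ca_gain, 3))  # 0.872 0.881
print(round(chi_square([aa_counts, ca_counts]), 2))
```

The chi-square function is the standard Pearson statistic. The thesis does not state exactly how its contingency table was arranged (for example, whether the single-observation variants were kept as separate columns), so the value printed by this sketch should not be expected to equal the reported 50.9869.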
DISCUSSION
Structural polymorphisms have been identified at four CODIS loci (Barber et al. 1996, Griffiths et al. 1998, Brinkmann et al. 1996a, Szibor et al. 1998, Mornhinweg et al. 1998, Schwartz et al. 1996, Zhou et al. 1997, Brinkmann et al. 1996b). Because these studies were not specifically designed to discover all variants, it is likely that more exist, as is also evident from research on structural polymorphisms at non-CODIS loci (Urquhart et al. 1993, Möller and Brinkmann 1994, Rolf et al. 1997). Structural polymorphisms are not used for human identification because length-based STR typing provides enough information to resolve most cases of human identity, but they could assist in cases that are not conclusively resolved, particularly those in which only a few STRs can be obtained. The aim of this study was to examine structural polymorphisms and assess their ability to assist in casework that cannot be resolved by STR length analysis alone. The research was designed to catalogue the structural variants of allele 30 at locus D21S11, to determine whether additional variants exist beyond the four already characterized, and to determine whether a distinct racial distribution is present. The results also allowed relative frequencies to be established for the structural polymorphisms and their effect on probative statistical parameters in human identification to be examined.

The Origins of Structural Polymorphism
The human species began in Africa and expanded throughout the world (Cann et al. 1987). Therefore, the ancestral D21S11 allele, the allele from which all other alleles were derived, would be found in the oldest and most diverse population (Wiegand et al. 2000). This can be seen in the allelic variability in African Americans: the most common allele at the D21S11 locus is 28, with a frequency of 24%, and it is most likely the progenitor allele. Furthermore, if the most common allele is the ancestral allele, then its most common structural variant is likely the progenitor sequence.
Two structural variants are known at allele 28 in African Americans, and the more frequent one has the greatest chance to mutate and produce other sequences, particularly the most common sequence variants of other alleles. For the allele 30 studied here, structural polymorphism 30b is the most common in African Americans and could have been created from the progenitor sequence variant at allele 28. Two insertion mutations are necessary for allele 28 to become allele 30, the first creating allele 29 and the second producing allele 30. Table 13a shows an insertion of TCTA in the first variable region (VRI) of 28a, which creates an as-yet unknown sequence variant of allele 29, followed by an insertion of TCTA in the third variable region (VRIII) to produce 30b. Conversely, if the mutation occurred first in VRIII and then in VRI, allele 28a would become 29a and then 30b (Table 13b). Additionally, two separate insertions of TCTA in VRIII of 28b would generate 30b (Table 13c), again passing through an unknown allele 29 variant. The likely ancestral sequence can be determined more conclusively once sequence variant frequencies for allele 28 are known. As structural polymorphism allele frequencies are determined for other D21S11 alleles, other mutational pathways could be proposed.

Table 13 - Mutational pathways from the ancestral allele
a) Mutation Pathway 1: 28a (TCTA)4 (TCTG)6 {43bp} (TCTA)10 → insertion of TCTA (VRI) → 29(?) (TCTA)5 (TCTG)6 {43bp} (TCTA)10 → insertion of TCTA (VRIII) → 30b (TCTA)5 (TCTG)6 {43bp} (TCTA)11
b) Mutation Pathway 2: 28a (TCTA)4 (TCTG)6 {43bp} (TCTA)10 → insertion of TCTA (VRIII) → 29a (TCTA)4 (TCTG)6 {43bp} (TCTA)11 → insertion of TCTA (VRI) → 30b (TCTA)5 (TCTG)6 {43bp} (TCTA)11
c) Mutation Pathway 3: 28b (TCTA)5 (TCTG)6 {43bp} (TCTA)9 → insertion of TCTA (VRIII) → 29(?) (TCTA)5 (TCTG)6 {43bp} (TCTA)10 → insertion of TCTA (VRIII) → 30b (TCTA)5 (TCTG)6 {43bp} (TCTA)11
30b is the most common structural variant and is most likely closely associated with the ancestral allele, allele 28. There are three mutational pathways from structural variants 28a and 28b to 30b. a) An insertion of TCTA in VRI of 28a followed by an insertion of TCTA in VRIII would produce 30b. b) An insertion of TCTA in VRIII of 28a followed by an insertion of TCTA in VRI would produce 30b. c) Two successive insertions of TCTA in VRIII of 28b would produce 30b. Note that mutational pathways 1 and 3 pass through an allele 29 variant that is unknown.

The variability of alleles and their sequences started from mutational events on the ancestral allele, from which more alleles and their sequence variants were created. The variability seen in allele 30 was likely created by mutations in the allele 30 structural variant sequences or in structural polymorphisms of other alleles. This study revealed seven allele 30 structural polymorphisms, of which three were novel, all differing in the composition of repeats in the variable regions. It is feasible that some allele 30 structural polymorphisms were derived from other allele 30 structural variants. A single A → G substitution in the last base of the last tetranucleotide repeat of VRI, which converts that TCTA into TCTG, can transform 30b, 30c, 30d, and 30f into 30e, 30b, 30g, and 30a, respectively (Table 14). Alternatively, a single G → A substitution in the last base of the first tetranucleotide repeat of VRII, which converts that TCTG into TCTA, can transform 30a, 30b, 30e, and 30g into 30f, 30c, 30b, and 30d, respectively (Table 15).
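Because each allele 30 variant in Table 9 is fully described by its three repeat counts, the two substitutions described above amount to moving one repeat unit across the VRI/VRII boundary. The Python sketch below uses hypothetical helper names and assumes, as in the D21S11 repeat structure, that the last VRI repeat is immediately followed by the first VRII repeat. It reproduces the pairs listed in Tables 14 and 15.

```python
# Each allele 30 variant as (VRI TCTA repeats, VRII TCTG repeats, VRIII TCTA repeats), from Table 9.
VARIANTS = {
    "30a": (4, 6, 12),
    "30b": (5, 6, 11),
    "30c": (6, 5, 11),
    "30d": (6, 6, 10),
    "30e": (4, 7, 11),
    "30f": (5, 5, 12),
    "30g": (5, 7, 10),
}
NAME = {structure: name for name, structure in VARIANTS.items()}


def a_to_g(structure):
    """A->G at the last base of the last VRI repeat: that TCTA becomes a TCTG of VRII."""
    vr1, vr2, vr3 = structure
    return (vr1 - 1, vr2 + 1, vr3)


def g_to_a(structure):
    """G->A at the last base of the first VRII repeat: that TCTG becomes a TCTA of VRI."""
    vr1, vr2, vr3 = structure
    return (vr1 + 1, vr2 - 1, vr3)


for label, mutate in (("A->G", a_to_g), ("G->A", g_to_a)):
    for name, structure in VARIANTS.items():
        product = NAME.get(mutate(structure))  # None if the product is not an observed variant
        if product:
            print(f"{name} --{label}--> {product}")
```

The substitutions that are not printed, for example A → G in 30a, lead to allele 30 structures that were not observed in this study.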
Table 14 - Single base substitution (A → G) creating structural variants
Structural Polymorphism | Substitution | Resulting Structural Polymorphism
30b [TCTA]5[TCTG]6{43bp}[TCTA]11 | A → G | 30e [TCTA]4[TCTG]7{43bp}[TCTA]11
30c [TCTA]6[TCTG]5{43bp}[TCTA]11 | A → G | 30b [TCTA]5[TCTG]6{43bp}[TCTA]11
30d [TCTA]6[TCTG]6{43bp}[TCTA]10 | A → G | 30g [TCTA]5[TCTG]7{43bp}[TCTA]10
30f [TCTA]5[TCTG]5{43bp}[TCTA]12 | A → G | 30a [TCTA]4[TCTG]6{43bp}[TCTA]12
A single A → G substitution in the last base pair of the last tetranucleotide repeat of VRI in some allele 30 structural variants can produce other allele 30 structural polymorphisms. Each row shows an allele 30 variant created from a different allele 30 polymorphism.

Table 15 - Single base substitution (G → A) creating structural variants
Structural Polymorphism | Substitution | Resulting Structural Polymorphism
30a [TCTA]4[TCTG]6{43bp}[TCTA]12 | G → A | 30f [TCTA]5[TCTG]5{43bp}[TCTA]12
30b [TCTA]5[TCTG]6{43bp}[TCTA]11 | G → A | 30c [TCTA]6[TCTG]5{43bp}[TCTA]11
30e [TCTA]4[TCTG]7{43bp}[TCTA]11 | G → A | 30b [TCTA]5[TCTG]6{43bp}[TCTA]11
30g [TCTA]5[TCTG]7{43bp}[TCTA]10 | G → A | 30d [TCTA]6[TCTG]6{43bp}[TCTA]10
A single G → A substitution in the last base pair of the first tetranucleotide repeat of VRII in some allele 30 structural variants can produce other allele 30 structural polymorphisms. Each row shows an allele 30 variant created from a different allele 30 polymorphism.

Following the theory that slippage mutations create the genetic variability seen today, a single slippage mutation on allele 29 or 31 would be enough to derive most of the allele 30 variants. An insertion of a TCTA tetranucleotide in VRI of 29a produces 30b (Table 16). An insertion of TCTG in VRII of 29a and 29b would create 30e and 30d, respectively. Insertions of TCTA in VRIII of 29a and 29b create 30a and 30c.
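The slippage pathways can be enumerated in the same way: a slippage event adds or removes one repeat unit in a single variable region. The Python sketch below is illustrative only, with hypothetical helper names. It assumes that the allele number counts the repeats in the three variable regions plus the eight complete TCTA units of the constant region, which is consistent with every structure in Table 9 summing to 30. With that assumption, the same helper covers both the insertions from allele 29 variants described above and the deletions from allele 31 variants listed in Table 17.

```python
# (VRI, VRII, VRIII) repeat counts for the allele 30 variants of Table 9.
VARIANTS_30 = {
    "30a": (4, 6, 12), "30b": (5, 6, 11), "30c": (6, 5, 11), "30d": (6, 6, 10),
    "30e": (4, 7, 11), "30f": (5, 5, 12), "30g": (5, 7, 10),
}
NAME_30 = {structure: name for name, structure in VARIANTS_30.items()}

# Allele 29 and 31 variants used in Tables 16 and 17.
OTHER = {
    "29a": (4, 6, 11), "29b": (6, 5, 10),
    "31a": (5, 6, 12), "31b": (6, 5, 12), "31c": (6, 6, 11), "31d": (7, 5, 11),
}
REGIONS = ("VRI (TCTA)", "VRII (TCTG)", "VRIII (TCTA)")


def designation(structure):
    """Allele number: VRI + VRII + VRIII repeats plus the 8 complete TCTA units in the constant region."""
    return sum(structure) + 8


def slippage(structure, region, step):
    """Add (step=+1) or remove (step=-1) one repeat unit in variable region 0, 1, or 2."""
    counts = list(structure)
    counts[region] += step
    return tuple(counts)


for name, structure in OTHER.items():
    step = 1 if designation(structure) == 29 else -1  # insertion from 29, deletion from 31
    kind = "insertion" if step == 1 else "deletion"
    for region, label in enumerate(REGIONS):
        product = NAME_30.get(slippage(structure, region, step))
        if product:
            print(f"{name}: {kind} in {label} -> {product}")
```

Applying `slippage` twice also reproduces the two-step routes of Table 13. For example, two VRIII insertions starting from 28b (5, 6, 9) give (5, 6, 11), which is 30b.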
Slippage mutations resulting in a deletion of TCTA in VRI of 31a, 31b, 31c, and 31d would produce 30a, 30f, 30b, and 30c, respectively (Table 17). Deletion of TCTG in VRII of 31a and 31c creates 30f and 30c. A deletion of TCTA in VRIII of 31a, 31b, and 31c produces 30b, 30c, and 30d.

Table 16 - Insertion mutation in structural variants of allele 29
Structural polymorphism | Insertion (VR) | Structural polymorphism
29a [TCTA]4[TCTG]6{43bp}[TCTA]11 | TCTA (VRI) | 30b [TCTA]5[TCTG]6{43bp}[TCTA]11
29a [TCTA]4[TCTG]6{43bp}[TCTA]11 | TCTG (VRII) | 30e [TCTA]4[TCTG]7{43bp}[TCTA]11
29a [TCTA]4[TCTG]6{43bp}[TCTA]11 | TCTA (VRIII) | 30a [TCTA]4[TCTG]6{43bp}[TCTA]12
29b [TCTA]6[TCTG]5{43bp}[TCTA]10 | TCTG (VRII) | 30d [TCTA]6[TCTG]6{43bp}[TCTA]10
29b [TCTA]6[TCTG]5{43bp}[TCTA]10 | TCTA (VRIII) | 30c [TCTA]6[TCTG]5{43bp}[TCTA]11
Each row shows a separate tetranucleotide insertion from a slippage mutation. The first column indicates the starting allele 29 sequence variant. The second column shows the tetranucleotide sequence that was inserted and the variable region (VR) in which it occurred. The third column shows the resulting allele 30 variant.

Table 17 - Deletion mutation in structural variants of allele 31
Structural polymorphism | Deletion (VR) | Structural polymorphism
31a [TCTA]5[TCTG]6{43bp}[TCTA]12 | TCTA (VRI) | 30a [TCTA]4[TCTG]6{43bp}[TCTA]12
31a [TCTA]5[TCTG]6{43bp}[TCTA]12 | TCTG (VRII) | 30f [TCTA]5[TCTG]5{43bp}[TCTA]12
31a [TCTA]5[TCTG]6{43bp}[TCTA]12 | TCTA (VRIII) | 30b [TCTA]5[TCTG]6{43bp}[TCTA]11
31b [TCTA]6[TCTG]5{43bp}[TCTA]12 | TCTA (VRI) | 30f [TCTA]5[TCTG]5{43bp}[TCTA]12
31b [TCTA]6[TCTG]5{43bp}[TCTA]12 | TCTA (VRIII) | 30c [TCTA]6[TCTG]5{43bp}[TCTA]11
31c [TCTA]6[TCTG]6{43bp}[TCTA]11 | TCTA (VRI) | 30b [TCTA]5[TCTG]6{43bp}[TCTA]11
31c [TCTA]6[TCTG]6{43bp}[TCTA]11 | TCTG (VRII) | 30c [TCTA]6[TCTG]5{43bp}[TCTA]11
31c [TCTA]6[TCTG]6{43bp}[TCTA]11 | TCTA (VRIII) | 30d [TCTA]6[TCTG]6{43bp}[TCTA]10
31d [TCTA]7[TCTG]5{43bp}[TCTA]11 | TCTA (VRI) | 30c [TCTA]6[TCTG]5{43bp}[TCTA]11
Each row shows a separate tetranucleotide deletion from a slippage mutation. The first column indicates the starting allele 31 sequence variant. The second column shows the tetranucleotide sequence that was deleted and the variable region (VR) in which it occurred. The third column shows the resulting allele 30 variant.

Statistical Analysis of Power of Discrimination, Power of Exclusion, and Heterozygosity
No previous research has been designed to investigate the racial distribution of structural variants, calculate their frequencies, and assess their potential application toward human identification. The observed frequencies of the structural polymorphisms at allele 30 show that a distinct distribution exists. This distribution can be used, together with additional STRs and/or structural variants, to distinguish individuals in human identification, providing additional discriminatory power. The descriptive statistical parameters used in human identification were recalculated incorporating this study's D21S11 allele 30 data. The power of discrimination, heterozygosity, and power of exclusion values all increased when the structural polymorphism allele frequencies were taken into account. Replacing the established allele 30 frequency with the structural variants' frequencies essentially adds more categories (alleles) to the D21S11 locus, thereby strengthening these statistical parameters. However, applying structural polymorphism allele frequencies from only one allele at an STR locus gave only a minor improvement of 1% to 2% in discriminatory power. This was expected, as the structural variants are a subdivision of a single allele. If structural polymorphism frequencies for all alleles at a locus were known, then the minor increments found for one allele could add up to a large improvement in the discriminatory power for that particular locus.
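Why subdividing an allele raises these statistics can be seen with textbook definitions. The sketch below is an assumption-laden illustration, not the calculation used in this study (the study's own values are in the appendices): it uses expected heterozygosity, Hardy-Weinberg genotype frequencies for the power of discrimination, and a mother-child-alleged father power of exclusion obtained by direct enumeration. The allele frequencies are invented for the example.

```python
import math
from itertools import combinations


def check_frequencies(freqs):
    """Allele frequencies at one locus must sum to 1."""
    if not math.isclose(sum(freqs.values()), 1.0, abs_tol=1e-9):
        raise ValueError("allele frequencies must sum to 1")


def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2) under Hardy-Weinberg equilibrium."""
    check_frequencies(freqs)
    return 1.0 - sum(p * p for p in freqs.values())


def power_of_discrimination(freqs):
    """PD = 1 - sum over genotypes of (genotype frequency)^2, with HWE genotype frequencies."""
    check_frequencies(freqs)
    ps = list(freqs.values())
    homozygotes = sum((p * p) ** 2 for p in ps)
    heterozygotes = sum((2 * p * q) ** 2 for p, q in combinations(ps, 2))
    return 1.0 - homozygotes - heterozygotes


def power_of_exclusion(freqs):
    """Probability that a random man is excluded in a mother-child-alleged father trio."""
    check_frequencies(freqs)
    alleles = list(freqs)
    total = 0.0
    for i, a in enumerate(alleles):
        for b in alleles[i:]:                      # mother genotype a/b
            p_mother = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
            for maternal in (a, b):                # each maternal allele passed on w.p. 1/2
                for paternal, p_f in freqs.items():
                    # Child alleles that could be paternal: the paternal one, plus the
                    # maternal one if the child's other allele could also be maternal.
                    candidates = {paternal}
                    if paternal in (a, b):
                        candidates.add(maternal)
                    # A random man is excluded if he carries none of the candidates.
                    p_excluded = (1.0 - sum(freqs[x] for x in candidates)) ** 2
                    total += p_mother * 0.5 * p_f * p_excluded
    return total


# Hypothetical frequencies for illustration only (not this study's data).
length_only = {"29": 0.20, "30": 0.30, "31": 0.50}
with_variants = {"29": 0.20, "30a": 0.10, "30b": 0.20, "31": 0.50}  # allele 30 split in two

for label, freqs in (("length only", length_only), ("with variants", with_variants)):
    print(label,
          round(power_of_discrimination(freqs), 4),
          round(expected_heterozygosity(freqs), 4),
          round(power_of_exclusion(freqs), 4))
```

Splitting one allele into two categories can only add genotype classes and exclusion opportunities, so all three values rise, but only modestly when a single allele is subdivided, which matches the small gains reported above.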
To estimate whether the minor increments add up to a large improvement, a hypothetical statistical analysis of power of discrimination, heterozygosity, and power of exclusion was performed on the D21S11 locus using all alleles known to have variants. As expected, all the results increased, with the largest impact being on the power of exclusion. Adding structural variants to the power of discrimination calculation adds more possibilities to discriminate between individuals; in the case of allele 30, seven additional ways. Meanwhile, incorporating all structural polymorphisms in the power of exclusion calculation adds 21 more combinations by which to exclude an alleged father at allele 30 (the number of distinct pairs among the seven variants), which provides a larger improvement than that seen in the power of discrimination (Table 12). Therefore, adding more categories (alleles), especially among the more common alleles, allows for a substantial improvement in the descriptive statistics. Identifying structural variants and determining their frequencies would increase the discriminatory power of an individual STR locus and of a battery of STRs.

Assisting Human Identification STR Testing Applications with Structural Polymorphisms
A problem facing the forensic community is obtaining viable STR information from degraded samples (Butler et al. 2003), which are often associated with cold cases. Only a few STR loci may be typed from a degraded sample, which gives a relatively low power of discrimination and a weak statistical match. Analyzing structural polymorphisms could strengthen these types of cases by supplying a second level of discrimination. If structural polymorphisms exist at an allele, sequence analysis could be performed to determine which variant is present and whether the sequences of the suspect and evidentiary samples match. Non-matching sequences can eliminate suspects, while matching sequences would increase the power of discrimination, thereby decreasing the random match probability.
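The effect on a partial profile can be illustrated with the product rule. This is a minimal sketch under stated assumptions: Hardy-Weinberg genotype frequencies, independent loci, no subpopulation (theta) correction, and invented allele frequencies for a two-locus partial profile. It shows only the direction of the change when allele 30 is resolved to a less common variant.

```python
import math


def genotype_frequency(allele1, allele2, freqs):
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote, 2pq for a heterozygote."""
    p, q = freqs[allele1], freqs[allele2]
    return p * p if allele1 == allele2 else 2 * p * q


def random_match_probability(profile, freq_tables):
    """Product rule across loci, assuming independence and no theta correction."""
    return math.prod(
        genotype_frequency(a1, a2, freq_tables[locus])
        for locus, (a1, a2) in profile.items()
    )


# Hypothetical, partial frequency tables (only the alleles used below).
FREQS = {
    "D21S11": {"29": 0.20, "30": 0.25, "30b": 0.08},
    "vWA": {"16": 0.20, "17": 0.25},
}

profile_length = {"D21S11": ("29", "30"), "vWA": ("16", "17")}    # size analysis only
profile_variant = {"D21S11": ("29", "30b"), "vWA": ("16", "17")}  # allele 30 sequenced

print(random_match_probability(profile_length, FREQS))
print(random_match_probability(profile_variant, FREQS))
```

Because a structural variant is always rarer than the length allele it subdivides, resolving it can only lower the genotype frequency at that locus, and therefore the random match probability of the profile.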
In parentage testing, it is an accepted standard that two genetic inconsistencies between a child and the tested parent are required to conclusively exclude an individual from being the biological parent (AABB Standards, 7th edition). However, the majority of paternity laboratories are conservative and require three exclusions before issuing a conclusive report (Scarpetta, personal communication). Paternity cases with only two exclusions at 13 or more STR loci are frequently observed, especially in cases where the mother is not tested (AABB Standards, 7th edition). If a known structural polymorphism exists at the paternal obligate allele, sequence analysis could be performed to determine which structural variant is present, possibly providing a third conclusive exclusion.

Occasionally, paternity laboratories have to work with poor quality DNA samples in which only a few STR loci can be typed. As a result, a low power of exclusion and combined paternity index are obtained in inclusion cases, or an insufficient number of exclusions may be seen. If structural polymorphisms exist at a paternal obligate allele, sequence analysis could be performed. This would increase the power of exclusion and combined paternity index, or provide a sufficient number of exclusions, which could convert an inconclusive report into a conclusive one.

Samples from mass disasters are usually degraded by fire, water, and other environmental conditions. To identify individuals in these events, techniques from both forensic and paternity testing are employed. A known DNA sample can be obtained from a toothbrush, hair brush, or razor belonging to an individual suspected to have been caught in the disaster. These reference profiles can be compared to the DNA profiles of unknown samples found at the scene. If a known sample cannot be obtained, samples from relatives can be collected for a paternity, grandparent, or family study test.
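For the paternity scenarios described above, the way a resolved variant can add an exclusion may be sketched as follows. This is a simplified, hypothetical model: it ignores mutation and null alleles, treats a locus as excluding when the alleged father carries none of the possible paternal alleles, and uses an invented three-locus case. The threshold of three exclusions reflects the conservative laboratory practice mentioned above; all function names are illustrative.

```python
import string


def length_allele(label):
    """Reduce a structural-variant label ('30b', '30.2a') to its length allele ('30', '30.2')."""
    return label.rstrip(string.ascii_lowercase)


def possible_paternal_alleles(child, mother=None):
    """Child alleles that could have come from the father (both, if the mother is untested)."""
    c1, c2 = child
    if mother is None:
        return {c1, c2}
    candidates = set()
    if c1 in mother:
        candidates.add(c2)
    if c2 in mother:
        candidates.add(c1)
    if not candidates:
        raise ValueError("child shares no allele with the tested mother")
    return candidates


def is_excluded(child, alleged_father, mother=None):
    """One-locus exclusion: the alleged father carries none of the possible paternal alleles."""
    return not (possible_paternal_alleles(child, mother) & set(alleged_father))


def count_exclusions(case, resolve_variants=True):
    """Count excluding loci; with resolve_variants=False all labels are reduced to length alleles."""
    def view(genotype):
        if genotype is None or resolve_variants:
            return genotype
        return tuple(length_allele(a) for a in genotype)

    return sum(
        is_excluded(view(child), view(father), view(mother))
        for child, father, mother in case.values()
    )


# Hypothetical case: locus -> (child, alleged father, mother).
CASE = {
    "D3S1358": (("14", "17"), ("15", "16"), ("14", "18")),
    "TH01": (("6", "9.3"), ("7", "8"), ("6", "7")),
    "D21S11": (("29", "30b"), ("30c", "32"), ("29", "31")),
}
LAB_THRESHOLD = 3  # conservative practice: three exclusions before a conclusive report

for resolve in (False, True):
    n = count_exclusions(CASE, resolve_variants=resolve)
    print("variants resolved" if resolve else "size only", n, n >= LAB_THRESHOLD)
```

By size alone the alleged father shares allele 30 with the child's obligate paternal allele, so only two loci exclude; once the child's paternal allele is sequenced as 30b and the alleged father's as 30c, D21S11 supplies the third exclusion.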
Structural variants can assist in identifying individuals in mass disasters by employing the same methods described above for degraded samples in forensic and paternity testing.

Proposed Technique for Analysis of Structural Polymorphisms
Structural polymorphism analysis has some shortcomings. Sequencing the variants is more labor intensive than size analysis and is not currently amenable to high-throughput processing. The allele subjected to analysis needs to be separated from the other allele at the same locus; otherwise the sequencing reaction would produce multiple sequences, making the results uninterpretable. Furthermore, structural polymorphism analysis of homozygotes would be problematic because both alleles migrate at the same rate and thus cannot be separated by size with any electrophoretic technique. It would still be beneficial to isolate and sequence homozygotes: if the two alleles were identical, a clean sequence would be obtained, whereas if they carried different structural polymorphisms, the sequence results would be unreadable due to overlapping sequences from each variant.

A solution to the drawbacks discussed above is to use a technique that distinguishes alleles by their sequence rather than by their size, as electrophoresis does. Microchip technologies for analysis of STR alleles and single nucleotide polymorphisms (SNPs) (Radtkey et al. 2000, Tillib and Mirzabekov 2001, Sobrino et al. 2005) have this capability. Microchip assays are similar to the reverse dot blot technique, in which oligonucleotide probes are developed for each STR allele. After hybridization of the amplicon, visualization occurs through excitation of the fluorescent probe (Tillib and Mirzabekov 2001). Microchip arrays lend themselves to highly multiplexed testing (Radtkey et al. 2000), can detect 1250 or more alleles on one microarray slide, and are comparable in processing time to STR analysis (Jiang et al. 2006).
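Whether sequencing an apparent homozygote yields overlapping signals, and where, depends on the positions at which the two same-length variants differ. The rough sketch below rebuilds each repeat region from its repeat counts plus the 43 bp block given in the Appendix A header ([TCTA]3 TA [TCTA]3 TCA [TCTA]2 TCCATA), ignores the primer-flanking sequence, and lists the positions where two variants carry different bases. It is an illustration under those assumptions, not a model of real sequencing data.

```python
FLANK_43 = "TCTA" * 3 + "TA" + "TCTA" * 3 + "TCA" + "TCTA" * 2 + "TCCATA"


def repeat_region(vr1, vr2, vr3):
    """[TCTA]vr1 [TCTG]vr2 {43 bp} [TCTA]vr3, following the Appendix A structure."""
    return "TCTA" * vr1 + "TCTG" * vr2 + FLANK_43 + "TCTA" * vr3


def mixed_positions(seq1, seq2):
    """0-based positions where two equal-length alleles carry different bases."""
    if len(seq1) != len(seq2):
        raise ValueError("alleles differ in length and could be separated by size")
    return [i for i, (x, y) in enumerate(zip(seq1, seq2)) if x != y]


v30a = repeat_region(4, 6, 12)
v30b = repeat_region(5, 6, 11)
v30c = repeat_region(6, 5, 11)

print(len(mixed_positions(v30b, v30c)), len(mixed_positions(v30a, v30c)))
```

Under these assumptions, 30b and 30c differ at a single base (the G/A at the VRI/VRII boundary), whereas 30a and 30c differ at many positions because their VRII/43 bp boundary falls at different places and every downstream base is offset. How readable a mixed homozygote sequence is would therefore vary with the particular pair of variants.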
Table 18 shows four probes that could be created to identify all the D21S11 allele 30 structural polymorphisms. Probe one will bind to the last five repeats of VRI and the first two repeats of VRII. Probe two will bind to the last six repeats of VRI and the first repeat of VRII. Probes three and four hybridize to VRII, with probe three also binding to the last tetranucleotide repeat of VRI. Each probe will hybridize to multiple allele 30 structural variants, and the pattern of hybridization would allow the identification of each variant (Table 19), as was the case for the PolyMarker and DQα typing kits.

Table 18 - Probe sequences that can distinguish all seven structural polymorphisms of D21S11 allele 30
Probe | Sequence | Tm
1 | 3' - AGAT AGAT AGAT AGAT AGAT AGAC AGAC - 5' | 54°C
2 | 3' - AGAT AGAT AGAT AGAT AGAT AGAT AGAC - 5' | 53°C
3 | 3' - AGAT AGAC AGAC AGAC AGAC AGAC AGAC - 5' | 60°C
4 | 3' - AGAC AGAC AGAC AGAC AGAC AGAC AGAC - 5' | 61°C
All of the probes will hybridize to multiple sequence polymorphisms, and the resulting patterns would allow each variant to be identified. Tm - melting temperature.

This proposed scheme has limitations. For example, it cannot always resolve an apparent allele 30 homozygote that carries two different structural polymorphisms, because the combined pattern of hybridization for certain pairs of variants is the same as for other pairs: a sample carrying 30a and 30c would produce the same pattern (probes 1, 2, and 3) as one carrying 30b and 30c. The allele 30 product would also have to be isolated from any other D21S11 product, to prevent the other amplicon from hybridizing to these probes. Furthermore, since the four probes have different melting temperatures (Tm), four separate hybridizations will need to be performed.
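The hybridization patterns in Table 19 follow directly from the probe sequences in Table 18. The sketch below assumes that a probe hybridizes only where its exact complement occurs in the repeat region (no mismatch tolerance, no Tm effects, primer flanks ignored) and reuses the 43 bp block from the Appendix A header. Under these assumptions it reproduces the Table 19 patterns and lists the variant pairs whose combined patterns coincide.

```python
from itertools import combinations_with_replacement

FLANK_43 = "TCTA" * 3 + "TA" + "TCTA" * 3 + "TCA" + "TCTA" * 2 + "TCCATA"

VARIANTS_30 = {
    "30a": (4, 6, 12), "30b": (5, 6, 11), "30c": (6, 5, 11), "30d": (6, 6, 10),
    "30e": (4, 7, 11), "30f": (5, 5, 12), "30g": (5, 7, 10),
}

# Table 18 probes, written 3'->5' as in the table (spaces removed).
PROBES_3_TO_5 = {
    1: "AGAT" * 5 + "AGAC" * 2,
    2: "AGAT" * 6 + "AGAC",
    3: "AGAT" + "AGAC" * 6,
    4: "AGAC" * 7,
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def repeat_region(vr1, vr2, vr3):
    return "TCTA" * vr1 + "TCTG" * vr2 + FLANK_43 + "TCTA" * vr3


def target_site(probe_3_to_5):
    """Base-pairing partner of a 3'->5' probe, read 5'->3' like the allele sequence."""
    return probe_3_to_5.translate(COMPLEMENT)


def pattern(sequence):
    """Set of probes whose exact target site occurs in the sequence."""
    return frozenset(n for n, p in PROBES_3_TO_5.items() if target_site(p) in sequence)


PATTERNS = {name: pattern(repeat_region(*c)) for name, c in VARIANTS_30.items()}


def indistinguishable_pairs():
    """Group unordered variant pairs by combined pattern; keep signals shared by >1 pair."""
    by_signal = {}
    for v1, v2 in combinations_with_replacement(sorted(PATTERNS), 2):
        by_signal.setdefault(PATTERNS[v1] | PATTERNS[v2], []).append((v1, v2))
    return {sig: pairs for sig, pairs in by_signal.items() if len(pairs) > 1}


for name in sorted(PATTERNS):
    print(name, sorted(PATTERNS[name]))
```

Each single variant gives a unique pattern, but many two-variant combinations collapse onto the same signal, which is why isolating the allele 30 band and handling apparent homozygotes remain necessary with this four-probe design.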
Table 19 - Pattern of hybridization for D21S11 allele 30 probes
Probe | 30a | 30b | 30c | 30d | 30e | 30f | 30g
1 | - | X | X | X | - | X | X
2 | - | - | X | X | - | - | -
3 | X | X | - | X | X | - | X
4 | - | - | - | - | X | - | X
All of the probes will hybridize to multiple sequence polymorphisms, and the resulting patterns would allow each variant to be identified. For example, 30a would only hybridize to probe 3, while 30d will hybridize to probes 1, 2, and 3. X indicates hybridization.

Despite these limitations, structural polymorphism analysis is a logical next step in human identification. It may be possible to develop probes to identify all D21S11 alleles and their sequence variants (Appendix D). However, as more D21S11 structural polymorphisms are identified, the development of suitable probes may become more difficult. Furthermore, the alleles would still need to be isolated with gel electrophoresis and excised before hybridization is performed, because the pattern of hybridization produced by two D21S11 alleles may be the same as that produced by other pairs of alleles.

Structural polymorphism analysis will not be truly effective until all structural variants for each allele at every locus have been identified. Once this happens, a technology will need to be developed, or a current one adapted, to analyze both the STR alleles and their structural variants without the need to excise bands from gels. For example, a microchip array could be developed that has exactly one target hybridization site for each structural variant. Currently there are 161 alleles across the thirteen CODIS loci (Applied Biosystems 2006a, Applied Biosystems 2006b), not including microvariant or off-ladder alleles. Twenty-three of these have sixty known structural variants (Butler 2005), including the variants observed in this study. Therefore, at least 198 sequence-specific probes would need to be created: one for each of the 138 alleles without known variants and one for each of the 60 variants. Some probes will have to be long, perhaps the entire length of the target sequence, so specific temperature-dependent hybridization would not be possible with current technologies. It would be difficult to develop all the probes with the same Tm. Consequently, the analysis would necessitate serial hybridizations, in which the temperature across the array is reduced and hybridization is detected for a given probe when its specific temperature is reached. If these hurdles can be overcome, a microchip array could hold all the necessary probes on one plate, detect alleles and their respective structural variants at the same time, and ultimately eliminate the need to separate the STR alleles from each other. Furthermore, it would be able to distinguish homozygous alleles with two different structural polymorphisms, due to the specificity of the probes (Figure 4).

Structural Polymorphism in the Courtroom
Currently, the vast majority of DNA evidence admitted in court proceedings is based on STR allele size analysis. Structural polymorphisms have not been introduced in the courtroom in forensic or paternity cases, but they could potentially be used by the prosecution or defense in an attempt to strengthen STR analysis. Structural variants have the potential to be used in courtroom proceedings once the technology and methodology are established in the scientific community.

[Figure 4: proposed probe layout for loci D3S1358 and vWA, with one spot per allele or structural variant]
Figure 4 - Proposed view of a microchip array. Shown is a partial illustration of a proposed microarray for simultaneous analysis of STR alleles and structural polymorphisms. In this display the genotype for the sample at D3S1358 is alleles 14 and 17, with structural variant 17b at allele 17. The genotype at vWA is 16a and 16b, which by size analysis would be a homozygote 16.

Some lawyers are already aware that structural polymorphisms exist at certain STR alleles (Scarpetta, personal communication) and may use this information in an attempt to discredit STR evidence, since the sequences of alleles found in the evidence may differ from those of their client(s). Even though structural variants have only been observed at four CODIS loci, they may exist at all thirteen. Omitting STR loci with known structural polymorphisms from a test report is not an option, as forensic scientists must report all the STR results obtained during testing; otherwise, it may be perceived that evidence that could possibly eliminate a suspect is being withheld. As a result, analysts may need to compare and confirm the sequence of each allele in the evidentiary sample against the suspect's sample to ensure they match. Once sufficient research has been completed to identify all structural variants at every allele used in human identification, it will only be necessary to determine the sequence of alleles with known structural polymorphisms. In addition, when all variant frequencies are calculated, the statistical gains afforded by structural polymorphisms would strengthen the power of DNA evidence in the courtroom.
APPENDIX A
Known structural polymorphisms at D21S11 locus
General repeat structure: [TCTA]n[TCTG]n[TCTA]3 TA [TCTA]3 TCA [TCTA]2 TCCATA [TCTA]n
Structural polymorphism | Repeat structure | Reference
27a | (TCTA)4 (TCTG)6 {43bp} (TCTA)9 | Mollier et al. 1994
27b | (TCTA)6 (TCTG)5 {43bp} (TCTA)8 | Schwartz et al. 1996
27c | (TCTA)5 (TCTG)5 {43bp} (TCTA)9 | Griffiths et al. 1998
28a | (TCTA)4 (TCTG)6 {43bp} (TCTA)10 | Mollier et al. 1994
28b | (TCTA)5 (TCTG)6 {43bp} (TCTA)9 | Zhou et al. 1997
29a | (TCTA)4 (TCTG)6 {43bp} (TCTA)11 | Griffiths et al. 1998
29b | (TCTA)6 (TCTG)5 {43bp} (TCTA)10 | Zhou et al. 1997
30a | (TCTA)4 (TCTG)6 {43bp} (TCTA)12 | Schwartz et al. 1996
30b | (TCTA)5 (TCTG)6 {43bp} (TCTA)11 | Zhou et al. 1997
30c | (TCTA)6 (TCTG)5 {43bp} (TCTA)11 | Griffiths et al. 1998
30d | (TCTA)6 (TCTG)6 {43bp} (TCTA)10 | Brinkmann et al. 1996b
30.2a | (TCTA)5 (TCTG)6 {43bp} (TCTA)10 TA TCTA | Griffiths et al. 1998
30.2b | (TCTA)5 (TCTG)5 {43bp} (TCTA)11 TA TCTA | Schwartz et al. 1996
31a | (TCTA)5 (TCTG)6 {43bp} (TCTA)12 | Griffiths et al. 1998
31b | (TCTA)6 (TCTG)5 {43bp} (TCTA)12 | Mollier et al. 1994
31c | (TCTA)6 (TCTG)6 {43bp} (TCTA)11 | Zhou et al. 1997
31d | (TCTA)7 (TCTG)5 {43bp} (TCTA)11 | Schwartz et al. 1996
32a | (TCTA)6 (TCTG)5 {43bp} (TCTA)13 | Griffiths et al. 1998
32b | (TCTA)5 (TCTG)6 {43bp} (TCTA)13 | Zhou et al. 1997
32.2a | (TCTA)5 (TCTG)6 {43bp} (TCTA)12 TA TCTA | Griffiths et al. 1998
32.2b | (TCTA)4 (TCTG)6 {43bp} (TCTA)13 TA TCTA | Brinkmann et al. 1996b
32.2c | (TCTA)5 (TCTG)6 {[TCTA]3 TA [TCTA]2 TCA [TCTA]2 TCCATA} (TCTA)13 TA TCTA | Brinkmann et al. 1996b
33.2a | (TCTA)5 (TCTG)6 {43bp} (TCTA)13 TA TCTA | Griffiths et al. 1998
33.2b | (TCTA)6 (TCTG)5 {43bp} (TCTA)13 TA TCTA | Brinkmann et al. 1996b
33.2c | (TCTA)6 (TCTG)6 {43bp} (TCTA)12 TA TCTA | Brinkmann et al. 1996b
34a | (TCTA)5 (TCTG)6 {43bp} (TCTA)15 | Zhou et al. 1997
34b | (TCTA)10 (TCTG)5 {43bp} (TCTA)11 | Brinkmann et al. 1996b
35a | (TCTA)10 (TCTG)5 {43bp} (TCTA)12 | Griffiths et al. 1998
35b | (TCTA)11 (TCTG)5 {43bp} (TCTA)11 | Brinkmann et al. 1996b
36a | (TCTA)11 (TCTG)5 {43bp} (TCTA)12 | Griffiths et al. 1998
36b | (TCTA)10 (TCTG)5 {43bp} (TCTA)13 | Brinkmann et al. 1996b
36c | (TCTA)10 (TCTG)6 {43bp} (TCTA)12 | Brinkmann et al. 1996b
37.2a | (TCTA)7 (TCTG)14 {[TCTA]3 TCA [TCTA]2 TCCATA} (TCTA)12 | Walsh et al. 2003
37.2b | (TCTA)9 (TCTG)12 {[TCTA]3 TCA [TCTA]2 TCCATA} (TCTA)12 | Walsh et al. 2003
37.2c | (TCTA)9 (TCTG)13 {[TCTA]3 TCA [TCTA]2 TCCATA} (TCTA)11 | Walsh et al. 2003
37.2d | (TCTA)10 (TCTG)11 {[TCTA]3 TCA [TCTA]2 TCCATA} (TCTA)12 | Walsh et al. 2003
37.2e | (TCTA)11 (TCTG)11 {[TCTA]3 TCA [TCTA]2 TCCATA} (TCTA)11 | Walsh et al. 2003
38.2a | (TCTA)9 (TCTG)13 {[TCTA]3 TCA [TCTA]2 TCCATA} (TCTA)12 | Walsh et al. 2003
38.2b | (TCTA)10 (TCTG)11 {[TCTA]3 TCA [TCTA]2 TCCATA} (TCTA)13 | Walsh et al. 2003
39.2a | (TCTA)10 (TCTG)13 {[TCTA]3 TCA [TCTA]2 TCCATA} (TCTA)12 | Bagdonavicius et al. 2002
39.2b | (TCTA)11 (TCTG)12 {[TCTA]3 TCA [TCTA]2 TCCATA} (TCTA)12 | Bagdonavicius et al. 2002

APPENDIX B
[Tables: D21S11 allele frequency data]
APPENDIX C
[Tables: D21S11 allele frequency data]

APPENDIX D
Table contains the probe sequences that can distinguish all alleles and their structural polymorphisms at D21S11. All of the probes will hybridize to multiple alleles and sequence polymorphisms, and the resulting patterns would allow each variant to be identified.
Probe   Sequence
1       5’-GCCT (TCTA)4 TCTG-3’
2       5’-GCCT (TCTA)5 TCTG-3’
3       5’-GCCT (TCTA)6 TCTG-3’
4       5’-GCCT (TCTA)7 TCTG-3’
5       5’-TCTA (TCTG)5 TCTA-3’
6       5’-TCTA (TCTG)6 TCTA-3’
7       5’-TCTA (TCTG)7 TCTA-3’
8       5’-CATA (TCTA)3 TCTG-3’
9       5’-CATA (TCTA)9 TCTG-3’
10      5’-CATA (TCTA)10 TCTG-3’
11      5’-CATA (TCTA)11 TCTG-3’
12      5’-CATA (TCTA)12 TCTG-3’
13      5’-CATA (TCTA)13 TCTG-3’
14      5’-CATA (TCTA)14 TCTG-3’
15      5’-TCTG (TCTA)3 TA-3’
16      5’-TA TCTA TCTG-3’
17      5’-GCCT (TCTA)9 TCTG-3’
18      5’-GCCT (TCTA)10 TCTG-3’
19      5’-GCCT (TCTA)11 TCTG-3’
20      5’-TCTA (TCTG)11 TCTA-3’
21      5’-TCTA (TCTG)12 TCTA-3’
22      5’-TCTA (TCTG)13 TCTA-3’

[Hybridization pattern of the probes above against the D21S11 alleles and structural variants; the matrix is not legible in the scanned copy.]
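The idea that probes which each hybridize to several alleles can still identify a variant by their combined pattern can be illustrated with a small sketch. It assumes that a perfect same-strand substring match stands in for hybridization. Mismatch tolerance, the complementary strand, and hybridization conditions are all ignored. Only probes 1 to 7 from the table are used. The two target fragments (`TARGET_A`, `TARGET_B`) and the helper names `expand` and `hybridization_signature` are hypothetical and exist only for illustration. The fragments are not published D21S11 allele structures.

```python
from __future__ import annotations

import re

# Probes 1-7 transcribed from the Appendix D table (written 5'->3').
# Using only this subset is an illustrative choice.
PROBES = {
    1: "GCCT (TCTA)4 TCTG",
    2: "GCCT (TCTA)5 TCTG",
    3: "GCCT (TCTA)6 TCTG",
    4: "GCCT (TCTA)7 TCTG",
    5: "TCTA (TCTG)5 TCTA",
    6: "TCTA (TCTG)6 TCTA",
    7: "TCTA (TCTG)7 TCTA",
}

# A token is either "(UNIT)count" or a literal run of bases; spaces are skipped.
TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)|([ACGT]+)")


def expand(notation: str) -> str:
    """Expand compact notation such as 'GCCT (TCTA)4 TCTG' into a plain sequence."""
    parts = []
    for unit, count, literal in TOKEN.findall(notation):
        parts.append(unit * int(count) if unit else literal)
    return "".join(parts)


def hybridization_signature(target: str, probes: dict[int, str]) -> frozenset[int]:
    """Return the IDs of probes whose full sequence occurs in the target.

    Assumption (hypothetical model): an exact same-strand substring match
    stands in for hybridization.
    """
    seq = expand(target)
    return frozenset(pid for pid, probe in probes.items() if expand(probe) in seq)


# Hypothetical fragments with equal length but different repeat structure.
TARGET_A = "GCCT (TCTA)4 (TCTG)6 (TCTA)3"
TARGET_B = "GCCT (TCTA)5 (TCTG)5 (TCTA)3"

if __name__ == "__main__":
    for name, target in (("A", TARGET_A), ("B", TARGET_B)):
        sig = sorted(hybridization_signature(target, PROBES))
        print(name, len(expand(target)), sig)
```

Both hypothetical fragments are 56 bases long, so length-based typing would report them as the same allele. Under this toy model they give different probe signatures, {1, 6} and {2, 5}. The flanking bases on each probe (for example GCCT ... TCTG) make a probe match only one exact repeat count, which is why the pattern, and not any single probe, identifies the variant.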
BIBLIOGRAPHY

AABB, Parentage Testing Program Unit (2005) Annual report summary for testing in 2002. Bethesda, MD.
AABB (2006a) Standards for Relationship Testing Laboratories, 7th Edition. Bethesda, MD.
AABB (2006b) Guidance for Standards for Relationship Testing Laboratories, 7th Edition. Bethesda, MD.
Applied Biosystems (2006a) AmpFℓSTR® Profiler Plus® User’s Manual. Foster City, CA.
Applied Biosystems (2006b) AmpFℓSTR® COfiler® User’s Manual. Foster City, CA.
Bagdonavicius A, Turbett GR, Buckleton JS, Walsh SJ (2002) Western Australian sub-population data for the thirteen AmpFℓSTR Profiler Plus and COfiler STR loci. Journal of Forensic Sciences. 47: 1149-1153.
Barber MD, McKeown BJ, Parkin BH (1996) Structural variation in the alleles of a short tandem repeat system at the human alpha fibrinogen locus. International Journal of Legal Medicine. 108: 180-185.
Bienvenue JM, Duncalf N, Marchiarullo D, Ferrance JP, Landers JP (2006) Microchip-based cell lysis and DNA extraction from sperm cells for application to forensic analysis. Journal of Forensic Sciences. 51: 266-273.
Brinkmann B, Sajantila A, Goedde HW, Matsumoto H, Nishi K, Wiegand P (1996a) Population genetic comparison among eight populations using allele frequency and sequence data from three microsatellite loci. European Journal of Human Genetics. 4: 175-182.
Brinkmann B, Meyer E, Junge A (1996b) Complex mutational events at the HumD21S11 locus. Human Genetics. 98: 60-64.
Butler JM, Shen Y, McCord BR (2003) The development of reduced size STR amplicons as tools for analysis of degraded DNA. Journal of Forensic Sciences. 48: 1054-1064.
Butler JM (2005) Forensic DNA Typing: Biology, Technology, and Genetics of STR Markers. Second Edition. Boston, MA: Elsevier Academic Press.
Chambers GK, MacAvoy ES (2000) Microsatellites: consensus and controversy. Comparative Biochemistry and Physiology Part B. 126: 455-476.
Coble MD, Butler JM (2005) Characterization of new miniSTR loci to aid analysis of degraded DNA. Journal of Forensic Sciences. 50: 43-53.
Conner BJ, Reyes AA, Morin C, Itakura K, Teplitz RL, Wallace RB (1983) Detection of sickle cell βS-globin allele by hybridization with synthetic oligonucleotides. Proceedings of the National Academy of Sciences. 80: 278-282.
Craig J, Fowler S, Burgoyne LA, Scott AC, Harding HW (1988) Repetitive deoxyribonucleic acid (DNA) and human genome variation. A concise review relevant to forensic biology. Journal of Forensic Sciences. 33: 1111-1126.
Crow JF (1993) Francis Galton: Count and measure, measure and count. Genetics. 135: 1-4.
Devor EJ, Behlke MA (2005) Oligonucleotide yield, resuspension, and storage. Integrated DNA Technologies.
Edwards A, Hammond HA, Jin L, Caskey CT, Chakraborty R (1992) Genetic variation at five trimeric and tetrameric tandem repeat loci in four human population groups. Genomics. 12: 241-253.
Einum DD, Scarpetta MA (2004) Genetic analysis of large data sets of North American Black, Caucasian, and Hispanic populations at 13 CODIS STR loci. Journal of Forensic Sciences. 49: 1381-1385.
Evett IW, Gill PD, Scranage JK, Weir BS (1996) Establishing the robustness of short-tandem repeat statistics for forensic applications. American Journal of Human Genetics. 58: 398-407.
Fairbanks DJ, Anderson WR (1999) Genetics: The Continuity of Life. Brooks/Cole, California.
Federal Bureau of Investigation (2007) www.fbi.gov/hq/lab/codis/index1.htm. Federal Bureau of Investigation, Quantico, Virginia.
Fildes N, Reynolds R (1995) Consistency and reproducibility of AmpliType® PM results between seven laboratories: field trial results. Journal of Forensic Sciences. 40: 279-286.
Fisher RA (1951) Standard calculations for evaluating a blood group system. Heredity. 5: 95-102.
Gloor PA (1980) Bertillon's method and anthropological research: a new use for old anthropometric files. Journal of the Forensic Science Society. 20: 99-101.
Griffiths RA, Barber MD, Johnson PE, Gillbard SM, Haywood MD, Smith CD, Arnold J, Burke T, Urquhart AJ, Gill P (1998) New reference allelic ladder to improve allelic designation in a multiplex STR system. International Journal of Legal Medicine. 111: 267-272.
Hall TA (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series. 41: 95-98.
Jeffreys AJ, Wilson V, Thein SL (1985a) Hypervariable ‘minisatellite’ regions in human DNA. Nature. 314: 67-73.
Jeffreys AJ, Wilson V, Thein SL (1985b) Individual-specific ‘fingerprints’ of human DNA. Nature. 316: 76-79.
Klintschar M, Dauber EM, Ricci U, Cerri N, Immel UD, Kleiber M, Mayr WR (2004) Haplotype studies support slippage as the mechanism of germline mutation in short tandem repeats. Electrophoresis. 25: 3344-3348.
Levinson G, Gutman GA (1987) Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Molecular Biology and Evolution. 4: 203-221.
Möller A, Brinkmann B (1994) Locus ACTBP2 (SE33): sequencing data reveal considerable polymorphism. International Journal of Legal Medicine. 106: 262-267.
Möller A, Meyer E, Brinkmann B (1994) Different types of structural variation in STRs: HumFES/FPS, HumVWA and HumD21S11. International Journal of Legal Medicine. 106: 319-323.
Mornhinweg E, Luckenbach C, Fimmers R, Ritter H (1998) Sequence analysis and gene frequency in a German population. Forensic Science International. 95: 173-178.
Mullis K, Faloona F, Scharf S, Saiki R, Horn G, Erlich H (1986) Specific enzymatic amplification of DNA in vitro: the polymerase chain reaction. Cold Spring Harbor Symposia on Quantitative Biology. 51: 263-273.
Radtkey R, Feng L, Muralhidar M, Duhon M, Canter D, DiPierro D, Fallon S, Tu F, McElfresh K, Nerenberg M, Sosnowski R (2000) Rapid, high fidelity analysis of simple sequence repeats on an electronically active DNA microchip. Nucleic Acids Research. 28: e17.
Rolf B, Schürenkamp M, Junge A, Brinkmann B (1997) Sequence polymorphism at the tetranucleotide repeat of the human beta-actin related pseudogene H-beta-Ac-psi-2 (ACTBP2) locus. International Journal of Legal Medicine. 110: 69-72.
Saiki RK, Walsh PS, Levenson CH, Erlich HA (1989) Genetic analysis of amplified DNA with immobilized sequence-specific oligonucleotide probes. Proceedings of the National Academy of Sciences. 86: 230-234.
Saferstein R (1993) “DNA Analysis in Biological Evidence: Applications of the Polymerase Chain Reaction.” Forensic Science Handbook, Volume III. New Jersey: Prentice Hall.
Saferstein R (1998) Criminalistics: An Introduction to Forensic Science. Sixth Edition. New Jersey: Prentice Hall.
Schwartz DWN, Dauber EM, Glock B, Mayr WR (1996) AMPFLP-typing of the D21S11 microsatellite polymorphism: allele frequencies and sequencing data in the Austrian population. Advances in Forensic Haemogenetics. 5: 622-625.
Singh G, Johns MM, Paul G (1982) Paternity testing: analysis of six blood groups and HLA markers with particular reference to comparison of races. American Journal of Clinical Pathology. 78: 748-752.
Smithies O (1955) Zone electrophoresis in starch gels: group variations in the serum proteins of normal human adults. The Biochemical Journal. 61: 629-641.
Sobrino B, Brión M, Carracedo A (2005) SNPs in forensic genetics: a review on SNP typing methodologies. Forensic Science International. 154: 181-194.
Stigler SM (1995) Galton and identification by fingerprints. Genetics. 140: 857-860.
Szibor R, Lautsch S, Plate I, Bender K, Krause D (1998) Population genetic data of the STR HumD3S1358 in two regions of Germany. Journal of Legal Medicine. 111: 160-161.
Tillib SV, Mirzabekov AD (2001) Advances in the analysis of DNA sequence variations using oligonucleotide microchip technology. Current Opinion in Biotechnology. 12: 53-58.
Urquhart A, Kimpton CP, Gill P (1993) Sequence variability of the tetranucleotide repeat of the human beta-actin related pseudogene H-beta-Ac-psi-2 (ACTBP2) locus. Human Genetics. 92: 637-638.
Urquhart A, Kimpton CP, Downes TJ, Gill P (1994) Variation in short tandem repeat sequences: a survey of twelve microsatellite loci for use as forensic identification markers. International Journal of Legal Medicine. 107: 13-20.
Walsh SJ, Robinson SL, Turbett GR, Davies NP, Wilton AN (2003) Characterisation of variant alleles at the HumD21S11 locus implies unique Australian genotypes and re-classification of nomenclature guidelines. Forensic Science International. 135: 35-41.
Wiegand P, Meyer E, Brinkmann B (2000) Microsatellite structures in the context of human evolution. Electrophoresis. 21: 889-895.
Zhou HG, Sato K, Nishimaki Y, Fang L, Hasekura H (1997) The HumD21S11 system of short tandem repeat DNA polymorphism in Japanese and Chinese. Forensic Science International. 86: 109-188.