THE ACCURACY AND RELIABILITY OF POLICE POLYGRAPHIC ("LIE DETECTOR") EXAMINERS' JUDGMENTS OF TRUTH AND DECEPTION: THE EFFECT OF SELECTED VARIABLES

Dissertation for the Degree of Ph.D.

MICHIGAN STATE UNIVERSITY

FRANK S. HORVATH

1974

This is to certify that the thesis entitled THE ACCURACY AND RELIABILITY OF POLICE POLYGRAPHIC ("LIE DETECTOR") EXAMINERS' JUDGMENTS OF TRUTH AND DECEPTION: THE EFFECT OF SELECTED VARIABLES presented by Frank S. Horvath has been accepted towards fulfillment of the requirements for the Ph.D. degree in Social Science.

Date: November 14, 1974

ABSTRACT

THE ACCURACY AND RELIABILITY OF POLICE POLYGRAPHIC ("LIE DETECTOR") EXAMINERS' JUDGMENTS OF TRUTH AND DECEPTION: THE EFFECT OF SELECTED VARIABLES

By

Frank S. Horvath

The purpose of this study was to determine the accuracy and reliability of judgments of police polygraphic (lie-detector) examiners in blind analysis of polygraphic recordings obtained in field settings, and to determine whether the accuracy of and confidence in such judgments, and the ease with which physiological data were interpreted, varied according to the particular category from which recordings were drawn and the experience of the examiner.

Method

A stratified random sample of the polygraphic recordings of 112 subjects involved in criminal investigations was drawn from the files of a police agency. Recordings were cross-categorized as verified or unverified, as pertaining to subjects considered truthful or deceptive, and as involving crimes against a person or property crimes.

Ten polygraphic examiners, five with less than three years of experience in lie-detection and five with more, all employed by a law enforcement agency, were recruited to serve as evaluators. Each evaluator independently reviewed the recordings "blind" and indicated: (1) whether the subject from whom they were obtained was truthful or deceptive, or whether the recordings were inconclusive; (2) his degree of confidence in each truth/deception judgment; and (3) the ease of interpretability of each of three physiological indices: respiratory, electrodermal (GSR), and cardiovascular.

Analysis

Hypothesis-testing procedures were carried out using analysis of variance in a 2 x 2 x 2 x 2 split-plot design. The four factors were: Experience (high/low); Verification (verified/unverified); Truthfulness (truthful/deceptive); and Crime-type (person/property). Dependent variables, treated separately, were accuracy scores, the percentage of correct judgments; confidence scores, the sum of confidence ratings; and total ease-of-interpretability scores, the sum of the "ease" ratings for the three physiological indices.

Results

Overall, the evaluators made 63.1% correct judgments (p < .001). Contrary to expectations, high-experience evaluators were neither more accurate (p > .10) nor more confident (p > .10) in their judgments, nor did they consider recordings easier to interpret than did low-experience evaluators (p > .10).
Predicted main effects for the Verification, Truthfulness, and Crime-type factors for all three dependent variables were complicated by interactions. In essence, analysis of these interactions indicated that recordings in the "deceptive/crime against a person" categories were judged more accurately, and those in the "truthful/crime against a person" categories less accurately, than all others across levels of verification; and that recordings of deceptive subjects were judged with greater confidence and were easier to interpret than those of truthful subjects, irrespective of the nature of verification.

Intra-class correlation coefficients calculated separately for evaluators' judgments of verified and unverified recordings indicated that the judgments in both of these conditions were highly reliable, .89 and .85, respectively.

Both confidence ratings and total "ease" ratings were higher in correct than in incorrect judgments (p < .002 and p < .001, respectively). Further analysis of the ease-of-interpretability ratings indicated that evaluators rated respiration, cardiovascular activity, and GSR easier to interpret, in that order; ratings were higher in correct than in incorrect judgments for respiration (p < .001) and cardiovascular activity (p < .001), but not for GSR (p > .10).

Other issues investigated showed that accuracy increased as the number of evaluators in agreement increased, and that accuracy was higher (p < .001) when evaluators' judgments were based on recordings with less rather than more polygraphic data.

The results of a numerical scoring scheme, as carried out by evaluators on a sub-sample of recordings, indicated that GSR scores were more accurate than were those of the other two indices if inconclusive scores were eliminated, and that GSR was scored more consistently than either respiration or cardiovascular activity.

Methodological differences between this and other research on the same topic are presented to account for some of the differences in results. Further, it is suggested that differences between polygraphic recordings, due to the nature of lie-detection in the field, account for some of the observed interaction effects.

THE ACCURACY AND RELIABILITY OF POLICE POLYGRAPHIC ("LIE DETECTOR") EXAMINERS' JUDGMENTS OF TRUTH AND DECEPTION: THE EFFECT OF SELECTED VARIABLES

By

Frank S. Horvath

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

College of Social Science

1974

© Copyright by FRANK S. HORVATH 1974

Dedicated to Jan and Juliann

ACKNOWLEDGMENTS

The writer was requested not to identify either the police agency or the polygraphic examiners who provided for and took part in this study. In spite of their anonymity, however, the writer gratefully acknowledges their interest, cooperation, and assistance; a sincere thanks is extended to all concerned.

In addition the writer is indebted to the following persons and organizations: Dr. Steven Olejnik, Office of Research Consultation, Michigan State University; Mr. James Mullin, Office of Applications Programming, Michigan State University; and Dr. Charles Hanley, Assistant Dean for Graduate Education, College of Social Science, Michigan State University, for their advice and assistance concerning the statistical treatment and evaluation of the data; Dr. Victor G. Strecher, Dr. Robert C. Trojanowicz, Professor Ralph F. Turner, Dr. Lawrence I. O'Kelly, Dr. Hiram Fitzgerald, and Dr. Peter K.
Manning, members of my Ph.D. committee, for their interest and advice; and to the U.S. Department of Justice, for its financial support through the L.E.A.A. Graduate Research Fellowship Program.

TABLE OF CONTENTS

LIST OF TABLES . . . vii
LIST OF FIGURES . . . ix
LIST OF APPENDICES . . . xi

Chapter
I. INTRODUCTION
    Purpose of The Study
    Need for the Study
II. REVIEW OF THE LITERATURE . . . 15
    Introduction . . . 15
    Historical Evaluation . . . 16
    Field Lie Detection: Procedures . . . 19
    Relevant-Irrelevant Technique . . . 19
    Control-Question Technique . . . 23
    Peak of Tension Testing . . . 32
    Evaluation of Polygraphic Records . . . 35
    Discussion and Summary of Field Procedures . . . 39
    Laboratory Lie-Detection: Procedures . . . 40
    The Validity of Lie-Detection . . . 42
    Field Procedures . . . 42
    Laboratory Procedures . . . 50
    Comparison of the Validity of Field to Laboratory Lie Detection . . . 54
    Deception Indices . . . 54
    Level of Subject Affect . . . 56
    Lie-Detection Equipment . . . 59
    Use of Control Questions . . . 60
    The Role of Lying . . . 63
    Scoring Response Data . . . 64
    The Reliability of Lie-Detection . . . 65
    Laboratory Studies . . . 66
    Field Studies . . . 71
    Discussion . . . 74
    Summary . . . 81
III. METHOD . . . 83
    General Considerations . . . 83
    Source of Polygraphic Data . . . 83
    Examination Procedure . . . 84
    Sampling Considerations . . . 91
    Population . . . 91
    Procedure . . . 92
    Sample . . . 95
    Criteria for Record Sets . . . 96
    Characteristics of Subjects . . . 97
    Characteristics of Record Sets . . . 99
    Procedure . . . 99
    The Polygraphic Record Sets . . . 99
    The Evaluators . . . 104
    Operational Measures . . . 108
    Hypotheses . . . 113
    Accuracy Scores . . . 113
    Confidence Scores . . . 114
    Ease of Interpretability Scores . . . 116
    Design and Analysis . . . 117
IV. RESULTS . . . 121
    Accuracy of Judgments . . . 121
    Hypotheses . . . 123
    Collective Accuracy . . . 128
    Effect of Additional Physiological Data . . . 129
    Reliability of Judgments . . . 132
    Confidence in Judgments . . . 134
    Hypotheses . . . 135
    Confidence Ratings and Accuracy of Judgments . . . 139
    Ease-of-Interpretability of Record Sets . . . 141
    Hypotheses . . . 141
    Total Ease-of-Interpretability Ratings and Accuracy . . . 143
    Ease-of-Interpretability of Individual Physiological Components . . . 145
    Numerical Evaluation . . . 154
    Accuracy . . . 154
    Reliability . . . 158
V. DISCUSSION . . . 161
    Accuracy of Judgments . . . 161
    Reliability of Judgments . . . 166
    Confidence in Judgments . . . 173
    Ease of Interpretability of Record Sets . . . 175
    Numerical Evaluation . . . 180
    Summary . . . 183
APPENDICES . . . 184
BIBLIOGRAPHY . . . 209

LIST OF TABLES

Table
3.1--Background Characteristics of Subjects . . . 98
3.2--Characteristics of Record Sets . . . 100
3.3--Background Characteristics of Evaluators . . . 107
4.1--Accuracy Scores of Individual Evaluators . . . 122
4.2--Accuracy on Record Sets in Verified and Unverified Categories . . . 124
4.3--Accuracy on Record Sets in Truthful and Deceptive Categories . . . 125
4.4--Accuracy on Record Sets Classified by Type of Crime . . . 126
4.5--Accuracy of Collective Judgments of Evaluators . . . 129
4.6--Accuracy of Judgments Based on Number of Respiration Components Recorded . . . 131
4.7--Accuracy of Judgments Based on Number of Control Question Tests in Record Sets . . . 131
4.8--Percentage of Agreements in Paired Judgments of Evaluators . . . 133
4.9--Mean Confidence Scores on Verified and Unverified Record Sets . . . 136
4.10--Mean Confidence Scores on Record Sets Classified as Truthful and Deceptive . . . 136
4.11--Mean Confidence Scores on Record Sets Classified by Type of Crime . . . 137
4.12--Mean Confidence Ratings of Evaluators' Judgments . . . 139
4.13--Analysis of Variance Table for Mean Confidence Ratings on Correct and Incorrect Judgments . . . 140
4.14--Mean Total Ease-of-Interpretability Ratings of Evaluators' Judgments . . . 144
4.15--Analysis of Variance Table for Mean Total Ease-of-Interpretability Ratings on Correct and Incorrect Judgments . . . 146
4.16--Mean Ease-of-Interpretability Ratings of the Three Physiological Components on All Record Sets . . . 146
4.17--Mean Ease-of-Interpretability Ratings of the Three Physiological Components on Correct and Incorrect Judgments . . . 147
4.18--Average Percent Accuracy of Evaluators' Judgments Based on Numerical Scores . . . 156
4.19--Percent Accuracy of Individual Evaluators Based on Numerical Scores (Excluding Inconclusives) . . . 156
4.20--Average Percent Accuracy on Verified and Unverified Record Sets Based on Numerical Scores (Excluding Inconclusives) . . . 157
4.21--Correlations of Combined Scores . . . 159
4.22--Correlations of Respiration Scores . . . 159
4.23--Correlations of GSR Scores . . . 159
4.24--Correlations of Cardio Scores . . . 160
4.25--Comparison of Mean Correlations of Numerical Scores of Verified to Unverified Record Sets . . . 160

LIST OF FIGURES

Figure
3.1--Stratification Matrix . . . 93
3.2--Dummy Data Matrix: 2 x 2 x 2 x 2 Split-plot . . . 118
3.3--Dummy Data Matrix: 2 x 2 Split-plot . . . 119
4.1--Mean percent correct judgments on record sets in the truthful and deceptive categories in the two crime classifications . . . 127
4.2--Mean percent correct judgments on deceptive and truthful crime against a person and property crime record sets for the verified and unverified conditions . . . 128
4.3--Mean confidence scores on record sets in the deceptive and truthful categories for the verified and unverified conditions . . . 138
4.4--Mean total ease-of-interpretability scores for record sets in the truthful and deceptive categories for the verified and unverified conditions . . . 144
4.5--Mean respiration ease-of-interpretability scores for record sets in both crime classifications for the verified and unverified conditions . . . 149
4.6--Mean respiration ease-of-interpretability scores on record sets in the truthful and deceptive categories for the verified and unverified conditions . . . 150
4.7--Mean respiration ease-of-interpretability scores for record sets in the two crime classifications in the truthful and deceptive categories . . . 151
4.8--Mean respiration ease-of-interpretability scores for high and low experience evaluators on record sets in the verified and unverified conditions . . . 151
4.9--Mean GSR ease-of-interpretability scores on record sets in the truthful and deceptive categories for the verified and unverified conditions . . . 152
4.10--Mean GSR ease-of-interpretability scores on deceptive and truthful crime against a person and property crime record sets for the verified and unverified conditions . . . 153

LIST OF APPENDICES

Appendix
A--Number of Folders Assigned to Stratification Levels . . . 185
B--Instructions to Evaluators . . . 187
C--Specimen Copies of Evaluator Answer Sheets . . . 191
D--Results of Individual Evaluators' Judgments . . . 194
E--Analysis of Variance Tables . . . 198
F--Correlations of Evaluators' Numerical Scores on Verified and Unverified Record Sets . . . 205

CHAPTER I

INTRODUCTION

It has long been known that under certain conditions lying is accompanied by changes in heart rate, blood pressure, breathing, and electrical conductivity of the skin.1 For the most part these responses are under the control of the autonomic nervous system and, to a lesser degree, somewhat under voluntary control. Although the responses associated with lying are also characteristic of arousal, anxiety, stress, etc., it is possible that discernible patterns of physiological response to appropriately framed questions within a structured setting do make possible discrimination between persons telling the truth and persons lying. Such discrimination based upon recorded physiological data forms the basis for the procedure popularly known as "lie detection."

1V. Benussi, "Die Atmungssymptome der Lüge" ("On the Effects of Lying on Changes in Respiration"), Arch. für die Gesamte Psychologie, 31 (1914), 244-273; H. Burtt, "The Inspiration-Expiration Ratio During Truth and Falsehood," J. Exp. Psych., 4 (1921), 1-23; N. Chappell and N. Matthew, "Blood Pressure Changes in Deception," Arch. Psych., 17 (1929), 1-39; F. Peterson and C. Jung, "Psychophysical Investigations with the Galvanometer and Pneumograph in Normal and Insane Individuals," Brain, 30 (1907), 153-218; W. Marston, "Systolic Blood Pressure Symptoms of Deception," J. Exp. Psych., 2 (1917), 117-163; H. Munsterberg, On the Witness Stand (New York: Doubleday, 1908), 118-133.

Actually, lie detection in one form or another has been used to determine the truthfulness of criminal suspects for at least the past fifty years.2 And, while within recent years there has been a marked proliferation of persons who practice it for both law enforcement and commercial purposes,3 surprisingly little is known about the validity of the procedure or the reliability of decisions made on the basis of it.

There are several reasons for this lack of information. First, polygraph examiners themselves have not been particularly prone to offer proof of the efficacy of their work.4 Second, research concerning the validity and reliability of real-life (field) lie detection has been hampered by the lack of an acceptable "ground truth" criterion for

2P. Trovillo, "A History of Lie Detection," J. Crim. Law, Crim., 29 (1939), 848-881 and 30 (1939), 104-119; W. Marston, The Lie Detector Test (New York: Richard R. Smith, 1938); J. Larson, Lying and Its Detection (Chicago: Univ. of Chicago Press, 1932.
Reprinted, Montclair, N.J.: Patterson Smith, 1969).

3See: N. Ansley (Ed.), "Actions of the Board of Directors, January 18-20," American Polygraph Association Newsletter, No. 1 (Dec.-Jan., 1974), 10; R. Paterson, "The Future of Polygraph in Industrial Security," American Polygraph Association Newsletter, No. 8 (Sept. 1972), 1-3.

4Much of the research reported by examiners has been criticized on methodological and other grounds. See: R. Sternbach, L. Gustafson and R. Colier, "Don't Trust the Lie Detector," Harv. Bus. Rev., 40 (1962), 127-134; J. Orlansky, An Assessment of Lie Detection Capability (Declassified Version), Tech. Rep. 62-16 (Arlington, Va.: Inst. for Defense Analyses, Res. and Eng. Support Div., July, 1964), 6-18.

validity studies, and the lack of standardized testing procedures for reliability studies.5 Third, the bulk of research in lie-detection has been done in the laboratory, where adequate control over data collection, ground-truth criteria, etc. is possible, but where results do not necessarily pertain to conditions outside the laboratory.6 Finally, field lie-detection is essentially an empirically developed procedure with a minimal theoretical foundation; it is an art, not a science.

Within recent years research in lie-detection has received considerable attention from field practitioners and, within the scientific community, psychologists and psychophysiologists. The major thrust of field research has been toward validation and improvement of current practices; that of scientific research has been to uncover the precise physiological, and particularly psychological, mechanisms which make lie-detection feasible. In spite of this split in direction of research, there is wide agreement that lie-detection works.7 Exactly how well it works in the field, how valid and how reliable its indications of truth and deception are, these are questions provocative of a

5M. Orne, R. Thackray and D. Paskewitz, "On the Detection of Deception: A Model for the Study of the Physiological Effects of Psychological Stimuli," Handbook of Psychophysiology, N. Greenfield and R. Sternbach (Eds.) (New York: Holt, Rinehart and Winston, 1972), 743-785.

6M. Orne, "Implications of Laboratory Research for the Detection of Deception," Polygraph, 2 (1973), 169-199.

7Ibid., 177.

healthy skepticism among field examiners and researchers. Can the judgment of a polygraphic examiner be an accurate reflection of a person's truthfulness or deception? And will two examiners, or the same examiner at two different times, interpret the same set of polygraphic recordings in the same way?

Purpose of The Study

The primary purpose of this study was to determine the "accuracy" and reliability of judgments made by trained polygraphic examiners; the technique used was blind analysis of polygraphic recordings obtained in field settings. In blind analysis, judgments of truth-telling and lying are made on the basis of polygraphic records exclusively; not considered are such aspects as behavioral cues of a person undergoing examination, investigators' reports and opinions, consideration of age, sex, race and other personal characteristics, or the examiner's intuitive response to the person being examined. Such sources of information are commonly believed to contribute to the validity and reliability of lie-detection.8 However, as recent research suggests,9 current testing procedures, which include individually distinct response patterns to control questions, make

8See: J. Reid and R. Arther, "Behavior Symptoms of Lie Detector Subjects," J. Crim. Law, Crim., and Pol. Sci., 44 (1953), 104-108; F. Horvath, "Verbal and Nonverbal Clues to Truth and Deception During Polygraph Examinations," J. Pol. Sci. and Adm., 1 (1973), 138-152.

9This research is discussed in detail in the next chapter.

lie-detection relatively independent of outside sources of information. That is, control-question testing is believed to standardize lie-detection so that judgments made by an examiner in actual testing and by trained, independent evaluators of the polygraphic recordings thus obtained are in substantial agreement.

This study incorporated several design characteristics which distinguish it from previous research. First, it dealt exclusively with judgments made by polygraphic examiners (evaluators) employed by law-enforcement agencies. The only prior research having some bearing on this issue was reported by Holmes, who, unfortunately, did not report his data in sufficient detail to allow for valid generalizations.10 Other research was concerned with the judgments of polygraphic examiners employed by a commercial agency. These examiners received more initial training in lie-detection theory and practice and had higher educational attainment than most examiners employed for law enforcement purposes;11 it is likely that their training and education limit generalization of their results to examiners having similar backgrounds.

10W. Holmes, "The Degree of Objectivity in Chart Interpretation," Academy Lectures on Lie Detection, Vol. II, V. Leonard (Ed.) (Springfield, Ill.: C. C. Thomas, 1958), 62-70.

11F. Horvath and J. Reid, "The Reliability of Polygraph Examiner Diagnosis of Truth and Deception," J. Crim. Law, Crim. and Pol. Sci., 63 (1971), 276-281; F. Hunter and P. Ash, "The Accuracy and Consistency of Polygraph Examiners' Diagnoses," J. Pol. Sci. and Adm., 1 (1973), 370-375.
Such sources of information are commonly believed to contribute to the validity and relia- bility of lie-detection.8 However, as recent research suggestsg, current testing procedures which include indivi- dually distinct response patterns tc control questions, make 8See: J. Reid and R. Arther, "Behavior Symptoms of Lie Detector Subjects," J. Crim. Law, Crim., and Pol. Sci., 44 (1953), 104-108; F. Harvath, "Verbal and Nonverbal Clues to Truth and Deception During Polygraph Examinations," J. Pol. Sci. and Adm., l (1973), 138—152. 9This research discussed in detail in the next chapter. ‘— info: to S‘. eras: 93311 '3 St lie-detection relatively independent of outside sources of information. That is, control-question testing is believed to standardize lie-detection so that judgments made by an examiner in actual testing and by trained, independent evaluators of the polygraphic recordings thus obtained, are in substantial agreement. This study incorporated several design character- istics which distinguish it from previous research. First, it dealt exclusively with judgments made by polygraphic examiners (evaluators) employed by law-enforcement agencies. The only prior research having some bearing on this issue was reported by Holmes, who, unfortunately, did not report his data in sufficient detail to allow for valid generali- 10 Other research was concerned with the judgments zations. of polygraphic examiners employed by a commercial agency. These examiners received more initial training in lie- detection theory and practice and had higher educational attainment than most examiners employed for law enforcement purposes;11 it is likely that their training and education limit generalization of their results to examiners having similar backgrounds. loW. Holmes, "The Degree of Objectivity in Chart Interpretation," Academy Lectures on Lie Detection, Vol. II, V. Leonard (Ed.) ISpringfIE1d, 111.: C.C Thomas, 1958), 62-70. 11F. Horvath and J. Reid, "The Reliability of Poly- graph Examiner Diagnosis of Truth and Deception," J. Crim. Law, Crim. and Pol. Sci., 63 (1971), 276-281; F. Hunter and P. Ash,“Th§_Ac53§acy—End Consistency of Polygraph Examiner's Diagnoses," J. Pol. Sci. and_§§m., 1 (1973), 370-375. I T 4d l s .0: th :ezinl ices n ralié' ent ‘11: IF!‘ Jed» 6° C'Il L 5 e Second, judgments were made by evaluators of poly- graphic recordings drawn from both verified and unverified investigations. Previous research utilized recordings drawn only from verified investigations, using corroborated con- fessions as the criteria of verification; accuracy of judg- ments was then assessed in terms of agreement with the criteria. In this study, however, while accuracy was Similarly defined for judgments made on verified recordings, for those made on unverified recordings it was defined as agreement with the testing examiner's judgment. Such a definition, of course, has serious disadvantages since it does not allow for any conclusions to be drawn about the validity of judgments, but it is a useful definition for estimating the contribution which the polygraphic recordings themselves make to lie detection. It is clear that the use of only verified records may considerably bias research. 
For instance, it has been suggested that persons presumed by examiners to be liars (prior to testing) may undergo examinations somewhat differ- ent from those presumed to be truth-tellers.12 Using similar reasoning one could conclude that persons involved in investigations which are eventually "verified" by con- fession might undergo examinations differing from those not so verified; factual information, behavioral characteristics, etc. might provide more, or "better" clues in the verified lerne, "Implications of Laboratory Research for the Detection of Deception," op. cit., 176. im’e st oven . VSCTVN “'vv- d CARA» 'VMC“I TL 9) L. 0) IL, (u rfi (‘ > Ht '3 x ‘l N.~ H* s w.»- ‘s TA‘ I‘le investigations; or, perhaps, the resulting polygraphic records, for some reason, might be of a better quality to the advantage of independent evaluation. Furthermore, the need for evalua- ting the judgments made of both verified and unverified records is readily apparent when one considers the fact that only a small prOportion of all polygraphic examinations are 13 Findings based only upon verified by any means at all. verified records are not necessarily applicable to the unverified situations. Third, the nature of the investigation from which recordings were drawn was incorporated in the design of the study. That is, recordings were drawn from investigations concerning crimes against a person and property crimes.14 It is apparent when considering these two categories of crimes that an examiner usually has access to more detailed factual information in the former; a victim of an armed robbery, for instance, is usually capable of relating precise details of the crime, and in some cases, of identifying a suSpect. Such detailed information prov;des a firmer basis for formulating appropriate test questions which, as field l3F. Inbau and J. Reid, Lie Detection and Criminal Interrogation (Baltimore: Williams and Wilkifis, 1953), 110- 113. 14The criterion used for classification of crimes was the presumed nature of involvement of the victim; direct in- volvement, such as in rape, murder, armed robbery, assault, and indecent (sexual) liberties, led to classification as "crimes against a person." On the other hand, crimes such as breaking and entering (burglary) arson, larceny, malicious destruction of prOperty, and embezzlement, when victim in- volvement is less apparent, were classified "property crimes". examine physiol Consequ investi accurat than t} that e\ lilygre accural “'10 ha: iiffere that t} examiners are well aware, are important determinants of physiological responsiveness during polygraphic examinations.15 Consequently, it can be suggested that recordings drawn from investigations involving crimes against a person may be more accurately judged than those involving prOperty crimes. Fourth, although previous research suggests that the ability to interpret polygraphic records is a function of experience, it is not clear if such a finding would pertain if experience were defined in a manner somewhat different than that reported. For instance, Horvath and Reid found .that evaluators with less than six months of experience (in polygraph testing), and still undergoing training, were less accurate and consistent in their judgments than evaluators who had completed their training.16 Certainly, such a difference is reasonable since one would not anticipate that the untrained evaluators would do as well as the other group. 
Hence, in this study experience levels were defined in a more meaningful manner, although it was anticipated that evaluators with more experience would be more accurate than those with less experience. 15See: J. Reid and F. Inbau, Truth and Deception, The Polygraph ("Lie Detector") Technique (Baltimore: Williams and W1lk1ns, 1966), 1621; R. Arther, "Crime Question Wording," J. Polygraph Studies, 4 (Sept.-Oct., 1969), 1-4. l6Horvath and Reid, "The Reliability of Polygraph Examiner Diagnosis of Truth and Deception," 9p. cit., 278-279. lawmatafi-i- a. .4?! fl «loath—gm I 004 Fifth, the recordings used in this study constituted a random sample of a pre-defined population. This is in contrast to previous research dealing only with recordings chosen in accordance with some arbitrary criterion and which, moveover, substantially controlled the nature of the interaction between the examiner and examinee (subject). For instance, Horvath and Reid reported results obtained when evaluators judged recordings selected because they were believed to require sufficient skill to interpret. Moreover, the recordings used by Horvath and Reid were obtained from subjects who were tested by only one examiner. The use of such recordings at least partially controls for the nature of the interaction between the examiner and subject, inter- action believed to have an affect on the nature of the re- cordings obtained.17 It is not known if, when such interaction is not controlled, judgments of independent evaluators would be accurate and in substantial agreement with the testing examiner. But it is clear that prOponents of control- question testing maintain that such would be the case.18 A second purpose of this study was to employ several devices used in experimental "lie detection" studies but l7Orne, "Implications of Laboratory Research for the Detection of Deception." 9p. cit., 175-177. 18Horvath and Reid, "The Reliability of Polygraph Examiner Diagnosis of Truth and Deception," pp. 913., 281; Hunter and Ash, "The Accuracy and Consistency of Polygraph Examiner's Diagnosis," pp. 313., 375. '01: U .l irgs « Q he cl 5 U the r H‘- " I. ’1 5“ l; ’2" r w ~.~ . s i V... a .. QC «3 u .. Y. e I l t .Alu «C .II :1 & hrUt -‘ l R Q\ A .- . a e r E 10 not used at all in studies dealing with polygraphic record- ings obtained from field settings. First, evaluators rated the degree of confidence in their judgments. Second, evaluators indicated the "ease-of—interpretability" of each of the three basic physiological measures used in field lie-detection. And, finally, evaluators judged a sub-sample of recordings in accordance with a numerical scoring system, the reliability of which, although developed by a field examiner, has not been reported in the literature dealing with evaluations of field-derived recordings. The confidence scale used in this study was similar to that employed by Kubis19 and Moroney,20 both of whom reported similar results: independent evaluators had "greater confidence in those decisions ultimately verified as correct 21 The scale than they did in those which were incorrect." in the present study was used to determine if confidence ratings were higher for experienced evaluators than for inexperienced; if such ratings varied depending upon the particular category from which polygraph recordings were 19J. Kubis, Studies ip Lie Detection: Computer Feasi- bility Considerations, Tech. 
Report 62-205 (Arlington, Va.: Armed Services Technical Information Agency, June, 1962), prepared for Air Force Systems Command, Contract No. AF 30 (602)-22700, Project No. 8834, Fordham University, 1962, 146.

20W. Moroney, "The Detection of Deception as a Function of PGR Methodology" (unpublished Ph.D. dissertation, St. John's Univ., 1968; Ann Arbor, Mich.: Univ. Microfilms, 1969, No. 69-5).

21Kubis, Studies in Lie Detection: Computer Feasibility Considerations, op. cit., 68.

drawn; and if, as Kubis and Moroney found, greater confidence would be indicated in correct than in incorrect judgments.

The scale used in this study dealing with the "ease-of-interpretability" of the various physiological measures was similar to that reported by Kubis.22 The purpose of the scale was to determine if more experienced evaluators judged recordings easier to interpret than less experienced; if ease-of-interpretability ratings varied depending upon the particular category from which recordings were drawn; and if, as Kubis found, records on which correct judgments were made were easier to interpret than those judged incorrectly.23

Kubis reported that the psychogalvanic response (GSR) was judged easier to interpret than either respiratory or cardiovascular measures. It is difficult to predict that such a result would pertain in evaluations of recordings obtained from field settings, although such an expectation seems reasonable, primarily because of the simple wave-form of the GSR. However, most field examiners disclaim the value of GSR and give precedence to respiratory and cardiovascular activity;24 hence, it is possible that either of

22Ibid., 146.

23Ibid., 71.

24Throughout this paper terms of convenience are used to identify the physiological parameters recorded by the polygraph instrument. Cardiovascular activity or "cardio" refers to what is commonly termed the "blood-pressure-pulse rate," primarily a measure of complex interaction between blood pressure and volumetric changes; respiration refers to changes in breathing rate and volume; galvanic skin response (GSR) and electrodermal activity are used interchangeably, typically measures of the skin resistance response.

the two latter measures would be judged easier to interpret than GSR because of the particular training and orientation of field examiners.

Evaluators in this study analyzed a sub-sample of polygraphic recordings in accordance with a numerical scoring system, the reliability of which has been reported in only one study.25 Such a system, however, has not been used in any reported study dealing with polygraphic recordings obtained from field settings. Hence, it was of interest in this research to explore the overall reliability of the numerical scoring system and to determine which of the various physiological measures was most reliably evaluated by trained field examiners.

Need for the Study

Orlansky, in his assessment of the state of the "art" of lie-detection, reported that:

Except for Kubis (1962) no one has explored the possibility that two examiners working independently might make different interpretations of the same record. Reliability of the polygraph in the sense of consistency of measurement, i.e., agreement among examiners, is an unknown quantity.26

Since Orlansky's report there have been only two studies conducted to determine the accuracy and reliability

25G.
Barland, "The Reliability of Polygraph Chart Evaluations" (paper presented at The American Polygraph lfissociation Seminar, August 15, 1972, Chicago, Ill.). . 26Orlansky, Ag Assessment 9f Lie Detection Capa- Eility, 9p. cit., 8. of set‘ 131'“ IO‘J‘ 83’ 1' 6X3: 13 of "blind" judgments made on data obtained from field settings. Although there have been other such studies involving experimental lie detection, none of them can be routinely generalized as pertinent to the field situation. Both of the field studies were based on judgments made by examiners trained in the same manner and not employed by a public law enforcement agency. Hence, in spite of these studies we still do not know if polygraphic examiners, trained in a somewhat different manner and engaged in lie- detection specifically for police purposes, can achieve high reliability in their decisions. An answer to this question would not only extend our knowledge about lie- detection, but bare implications for our Criminal Justice System, particularly the courts, as well. During the past fifty years only one of the reported federal and state court decisions considering the question admitted unstipulated polygraphic examination results as evidence. The reasons for this exclusionary policy were essentially that the polygraphic technique lacked reliability and a "general acceptancelin the particular field in which 27 Recently, however, there have been several it belongs. court decisions indicating a trend to wider judicial accep- tance of the technique. Altarescu has published an excellent discussion of these decisions and the problems remaining for 27The "general acceptance" test concerning polygraph admissibility was set out in Frye v. United States, 293 F. 1013 (D.C. Cir. 1923). the p! In -u t. I A L gout" ‘ QIO 11"] C A?" C Q ’1’. :.3V 14 the polygraphic field itself if the trend continues.28 Predictably, one of these problems concerns the reliability of the technique, especially in regard to examiners who vary in experience, qualifications, and particular technique employed. It is hoped that this study will ‘rovide a firmer ground for answers to questions concernin) the polygraphic technique which have for so long troubled our courts. More- over, as the study deals directly with reliability of poly- graphic examiners employed by police agencies, the results should have a more direct impact on the judiciary than previous studies. 28H. Altarescu, "Problems Remaining for the 'Gener- ally Accepted' Polygraph," reprinted from: Boston Univ. Law Review, 53 (March, 1973), 375-405. CHAPTER II REVIEW OF THE LITEPXTURE Introduction Essentially the literature dealing with lie—detection can be identified as that written by field practitioners and that written by laboratory researchers. Literature in the former category usually consists of descriptions of proce- dures, instrumentation and some research bearing on the efficacy of these items. On the other hand, reports of laboratory researchers most often are concerned with deter- mining how well and under what conditions lie-detection is possible; that is, what precise physiological and psycho- logical mechanisms contribute most to the detection of deception. Because both goals and methods of these two approaches differ, the literatures will be dealt with separately, considering first procedural differences. 
The relatively detailed discussion of field procedures will not only provide a more thorough base for assessment of laboratory procedures, but will also clarify points to be made in discussion of the validity and reliability of lie-detection. But, first, a historical review of lie-detection is in order.

Historical Evaluation

There is no need to discuss in depth the early history of lie-detection procedures and the development of the polygraph instrument, as there are already available excellent accounts dealing with this topic.1 The purpose of the following brief review of this area is simply to put this chapter into perspective.

Historically, the most dramatic attempts at lie-detection relied upon "ordeals," such as hot irons on the tongue of suspects to be protected by their innocence or burned by their guilt. Also described in the literature are relatively objective procedures, such as careful observation of a suspect's behavioral characteristics or changes in pulse rate when under interrogation. It was not until about 1895, however, when Cesare Lombroso, an Italian physiologist, and his student, Mosso, used the hydrosphygmograph and the "scientific cradle," that objective measurement of physiological changes became associated with the detection of deception.2

Following Lombroso and Mosso, other investigators took note of physiological changes associated with deception. In 1908 Munsterberg made reference to the effect of lying on

1See: P. Trovillo, "A History of Lie Detection," J. Crim. Law and Crim., 29 (1939), 848-881 and 30 (1939), 104-119; J. Larson, Lying and Its Detection (Chicago: Univ. Chicago Press, 1932, reprinted, Montclair, N.J.: Patterson Smith, 1969); C. Lee, The Instrumental Detection of Deception (Springfield, Ill.: C. C. Thomas, 1953).

2Trovillo, "A History of Lie Detection," op. cit., 858.

breathing, cardiovascular activity, involuntary movements, and the galvanic skin response (GSR).3 In 1914, Benussi conducted a series of experiments in which he found a relationship between the inspiration-expiration ratio in breathing and deception.4 His findings were later confirmed by Burtt, who added that systolic blood pressure was yet more indicative of deception than respiration.5 Marston's findings agreed with Burtt's that discontinuous measures of systolic blood pressure were superior to either respiration or GSR for detecting deception.6 Larson modified Marston's blood pressure test and developed an instrument and procedure for making continuous recordings of both blood pressure-pulse rate and respiration.7 Keeler, generally credited with developing the prototype of the polygraph instrument now used in most field settings, further refined Larson's apparatus, to which he added a device for measuring electrodermal activity.8

3H. Munsterberg, On The Witness Stand (New York: Doubleday, 1908), 118-133.

4V. Benussi, "Die Atmungssymptome der Lüge" ("On The Effects of Lying on Changes in Respiration"), Arch. für die Gesamte Psychologie, 31 (1914), 244-273, cited by Trovillo, "A History of Lie Detection," op. cit., 870.

5H. Burtt, "The Inspiration-Expiration Ratio During Truth and Falsehood," J. Exp. Psych., 4 (1921), 1-23; see also, H.
Burtt, "FurthEr Technique For Inspiration-Expiration Ratios," J. Exp. Psych., 4 (1921), 106-110. 6W. Marston, "Systolic Blood Pressure Symptoms of Deception," J. Exp. Psych., 2 (1917), 117-163. 7J. Larson, "Modification of The Marston Deception Test," J. Amer. Inst. Crim. Law and Crim., 12 (1921), 390-399. 8L. Keeler, "A Method For Detecting Deception," Amer. J. P01. SCio' l (1930), 38-52. 18 The discussion up to this point should not be taken as an indication that respiration, cardiovascular activity, and GSR are the only physiological processes which have been associated with deception. Limited success at detecting deception has also been accomplished by measurement of other physiological activity, such as: hand tremors,9 electro- 12 encephalic activity,lo pupil dilation,ll oculomotor activity, voice modulation,13 oxygenation of the vascular system,14 and 9A. Luria, "The Union of the Motor Method and the Investigation of the Affective Reaction," State Inst. of Exp. Psych. (Moscos, 1928); "Die Methode der Abbildenden Motorik und ihre Anwendung an die Affekt-Psychologie, Psychol-Forschung, Band 12, 1929; Examination and Psychical Reactions (1930); The Nature of Human Conflicts, Horsley Gannt (Trans. and Ed.), —I_ 1932, ciEed by Trovillo, "A History of Lie Detection," 9p. pip., 114, note 124. 10C. Oberman, "The Effect on the Berger Rhythm of Mild Affective States,” J. Abn. and Soc. Psych., 34 (1939), 84-95. 11F. Berrien and G. Huntington, "An Exploratory Study of Pupillary Responses During Deception," J. Exp. Psych., 32 (1943), 443-449. 12F. Berrien, "Ocular Stability in Deception," J. App. Psych., 26 (1942), 55-63; F. Berrien, "Possibilities in The Use of The Opthalmograph as a Supplement to Existing Indices of Deception," Psych. Bulletin, 37 (1940), 507; D. Ellson, R. Davis, I. Saltzman and C. Burke, A Report pf Research pp Detection pf Deception (Tech. Report prepared for Office of Naval Research, Contract N6onr-18011, Indiana Univ., 1952). l3M. Alpert, R. Kurtzberg, and A. Friedhoff, "Trans- ient Voice Changes Associated with Emotional Stimuli," Arch. Gen. Psych., 8 (1963), 362-365; P. Fay and W. Middleton, "The Ability to Judge Truth-Telling or Lying From the Voice Trans- mitted over a Public Address System," J. Gen. Psych., 24 (1941), 211-215. 14H. Dana, "It is Time to Improve the Polygraph: A Progress Report on Polygraph Research and Development," Academy Lectures on Lie Detection, II, V. Leonard (Ed.), (Springfield, 1117? C.C Thomas, 1957), 84-90; H. Dana and C. Barnett, "The Emotional Stress Meter," Academy Lectures pp :1 l on scu St! .311: WW.“ .urJ 0 5V 5; i f? .d‘ :bFf-x-a ELVJG NF. 1 . to 5.. x. ...«.4 ‘t \a Vi“ «U 19 covert muscular movements.15 But what is now fairly well agreed upon by field examiners is that any attempt at detecting decep- tion mustlxamade witheniinstrument that records both cardio- 16 It is infact illegalin vascular and respiratory activity . some states for a "detection of deception" examiner to use an instrumentrmn:capab1e ofrecording these two parameters, al- though others, particularly electrodermal activitymare also commonly recorded in conjunction with them.17 Field Lie Detection: Procedures There are two major field lie-detection procedures in use today, the relevant-irrelevant (R-I) and the control- question (CQ) techniques. In this section a discussion of these techniques will be made in some detail, to aid in an understanding of the literature concerning the validity and reliability of lie-detection. 
Relevant-Irrelevant Technique It is clear from the literature on field lie-detection that many of the early practitioners considered the primary Lie Detection (Springfield, Ill.:C.C Thomas, 1957), 73-83; R. Thackray andM. Orne, "A Comparison of Physiological Indices in Detection of Deception," PsychOphysiology, 4(1968),329-339. 15J. Reid, "Simulated Blood Pressure Responses in Lie Detector Tests and a Method for Their Detection," J. Crim. Law and Crim., 36 (1945), 201-214. 16N. Ansley (Ed.), "Inquiry Regarding Dektor PSE-l," American Polygraph Association Newsletter, Number 3 (March, 1972), 18. 17C. Romig, "The Status of Polygraph Legislation of the Fifty States," Part III, Police, 16 (1971), 58. 20 benefit of polygraphic testing to be that it enhanced their own ability to obtain confessions of guilt or admissions of 18 It is not surprising then lying from criminal suspects. that polygraphic testing and "interrogation" (intensive or accusatory questioning designed to secure a confession) were often considered identical, and perhaps inseparable, processes; that is, the two processes were blended or combin- ed in such a way that the psychological effect of the poly- graphic instrument and the consequent physiological record- ings could be maximized to secure confessions of guilt. The complete blending of interrogation and polygraphic testing characterizes the R-I technique.19 Pre—Test interview.--Simply stated, the R-I Technique is relatively unstructured, consisting of an interview, or per— haps intensive questioning, followed by or combined with poly- graphic testing. During the interview the examiner discusses with the subject background information relative to the in- vestigation at hand and exploits any hesitancy or uncertainty in the subject's answers to questions, he also observes the l8See: F. Inbau, Lie Detection and Criminal Inter- rogation (Baltimore: Williams and Wilkins, 1942), 54. 19The R-I Technique is considered outmoded by some leading examiners: See: C. Backster, "Lie Detection Comes of Age," Law and Order (undated, unpaginated reprint supplied by author); C. Backster, "Methods of Strengthening our Poly- graph Technique," Police, 6 (1962), 61-68. I‘Il th “v- be if; R. 21 subject's behavior in order to locate “sensitive areas" which may be useful in the testing. The examiner also explains the purpose of the testing and the nature of the polygraphic instrument, implying that it is futile for the subject to harbor any thoughts of "beating" the test. It is also the examiner's purpose during the interview to establish rapport with the subject and to become familiar with his language and personal history in order to assure that the test questions, which may or may not be reviewed prior to testing, will be effectively worded. The length of the interview is determined by the examiner according to his impression of the subject's emotional accessibility. A high-strung subject generally requires a lengthier interview in order to prepare him for testing; a relatively passive subject must be "aroused", and so forth. Polygraphic testing.--Polygrapnic testing in the R-I Technique generally consists of asking a series of questions relevant to the crime and interspersed between irrelevant, or non-critical questions; other types of questions such as those exposing a guilt complex may be asked at the discretion of the examiner. The precise nature, wording, and ordering of the test-questions is determined by the examiner as testing progresses, as is the length of any one test. 
Generally, however, generalized questions precede specific questions, an order believed helpful because it recapitulates the steps in commission of an offense.

The length of any given test, the asking of the relevant and irrelevant questions at least once in a series, is determined by the examiner and is dependent primarily upon the subject's ability to withstand the effects of the apparatus used for recording cardiovascular activity. Within any given polygraphic examination, two R-I tests may be conducted before a determination of deception (or truthfulness) is made, although proponents of the method feel that in most cases such a determination can be made following one test.

Proponents of the R-I technique assume that truthful people will not differentially react to relevant and irrelevant questions, while people lying will. In other words, determinations of truth-telling and lying depend upon perceptible differences in physiological response to the stimulus of non-critical and critical items. Moreover, during any given test or between any two tests such differential reactions constitute cause for intensive questioning of the subject by the examiner. Proponents of this technique believe that "interrogation" for the purpose of securing a confession or admission of lying at any time during the pre-test interview or the testing is justified if, in the examiner's judgment, it seems warranted.

Within the R-I tests, of course, there is usually no actual "control" against which responses to the relevant questions can be compared, at least no control similar to that advocated by proponents of the CQ technique. The lack of such a control is believed to make the R-I technique an "interrogation" capitalizing on the psychological effect of the polygraphic instrument and recordings; R-I tests, then, for reasons to be further explained here, are usually considered by proponents of the CQ technique inadequate for making decisions regarding a person's truthfulness or deception based upon the polygraphic recordings exclusively.20

Control-Question Technique

Many leading polygraph examiners today distinguish between "interrogation" and polygraphic testing. The major impetus of this change in approach was the "control question" as developed by John E. Reid in 1947.21 Since Reid's first publication on this topic he and other practitioners have so refined the use of control questions and the procedure used for giving polygraphic tests that it is now believed that polygraphic testing and interrogation must be considered separately. That is, most proponents of the CQ technique believe that polygraphic testing provides a substantially accurate means of determining a person's truthfulness or deception independent of "interrogation"; in fact, interrogation before or during the testing proper is believed detrimental to testing.22

20The discussion concerning the R-I Technique was condensed from: L. Harrelson, Keeler Polygraph Institute Training Guide (Chicago: Keeler Polygraph Institute, 1964).

21J. Reid, "A Revised Questioning Technique in Lie Detection Tests," J. Crim. Law and Crim., 37 (1947), 542-547.

22J. Reid and F. Inbau, Truth and Deception: The Polygraph ("Lie Detector") Technique (Baltimore: Williams and Wilkins, 1966), 177.
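Before turning to the mechanics of the CQ procedure, the R-I decision rule described above can be restated in schematic form. The following sketch is purely illustrative and is not drawn from this study or from any examiner's actual practice: the response magnitudes, the ratio measure, and the cutoff values are hypothetical stand-ins for the visual comparison of tracings that a field examiner actually makes.

    # Illustrative sketch (not an examiner's actual procedure): the R-I
    # assumption is that a truthful subject shows no perceptible differential
    # reaction to relevant versus irrelevant questions, while a lying subject
    # reacts more strongly to the relevant questions.
    def r_i_judgment(relevant_responses, irrelevant_responses,
                     deception_cutoff=1.5, truthful_cutoff=1.1):
        """Return a rough call from averaged response magnitudes (arbitrary units)."""
        mean_relevant = sum(relevant_responses) / len(relevant_responses)
        mean_irrelevant = sum(irrelevant_responses) / len(irrelevant_responses)
        ratio = mean_relevant / mean_irrelevant  # size of the differential reaction
        if ratio >= deception_cutoff:
            return "deception indicated"
        if ratio <= truthful_cutoff:
            return "no differential reaction; consistent with truthfulness"
        return "inconclusive"

    # Hypothetical response magnitudes for two subjects:
    print(r_i_judgment([8.0, 7.5, 9.0], [3.0, 2.5, 3.5]))  # deception indicated
    print(r_i_judgment([3.0, 3.2, 2.8], [3.1, 2.9, 3.0]))  # consistent with truthfulness

The CQ technique, discussed next, in effect replaces the irrelevant-question baseline in such a comparison with responses to control questions.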
Although some examiners maintain that post-test interrogation is a third com- ponent,23 such a contention seems out of line with the notion that interrogation and polygraphic testing are separate phenonena. Pre-test Interview.--The pre-test interview as used by proponents of the CQ technique occurs prior to testing, when the examiner discusses with the subject the purpose of the examination, the nature of the polygraphic instrument, and, in general, seeks to prepare the subject for the test- ing. Unlike the interview used in the R-I technique, however, there is no intensive questioning on the issue at hand. More- over, during the interview the examiner makes it a point to review with the subject the exact test questions which will be asked, and the subject himself participates in the formu- lation of these questions. Such participation is considered essential to the functioning of the testing procedure, par- ticularly with respect to the control-questions. There are, of course, variations among examiners in the way a pre-test interview is conducted. Some examiners conduct a lengthy interview and acquire detailed background information, e.g., medical history, etc., while others do not.~ Some use specialized interview techniques to become 23G. Barland and D. Raskin, "The Use of Electrodermal Activity in the Detection of Deception," Prepublication c0py to appear in: W. Prokasy and D. Raskin (Eds.), Electrodermal Activity ip Psychological Research (New York: Academic Press, in press). .1 -h\..l L .14 I 4 I I 1 \ XI M O r .C {a o h a . f. u... a _ A: a. r 2 PH. LN“ e We ...u w: . new 5.... t z a u v.“ .r“ .31 "L .l. n c at .um r.” .n u .C L . r1 . a a... I c z. a. n... I ., .9... A: 5L 3. r... fit. p. 5 Lu 5. a u . . . . u. . PJ 3.‘ H Mia U; \fi V P. \ F. (1.- Pu.» a... 25 familiar with behavioral characteristics which may be helpful in making a diagnosis of truthfulness or deception. Some examiners spend a considerable amount of time explaining the nature of the polygraphic instrument, the way in which autonomic reSponses are used to detect deception, and the futility of trying to beat the test. More detailed informa- tion concerning variations in the pre-test interview can be 26 found in Reid and Inbau,24 Horvath,25 or Barland and Raskin. Polygraphic testing.--While there are differences between pre-test interviews in the R-I and CQ procedures, the essential difference between them lies in the nature of the questions asked during polygraphic testing and the manner in which response data are evaluated. During the CQ testing, three basic types of questions are asked: irrelevant, rele- vant, and control questions, although, as in the R-I technique, other question types may also be used.27 Irrelevant questions are those used for establishing "normal" or truth-telling patterns; they will deal with such matters as: "Do they call you Joe?" and, "Are you over 21 years of age?" Relevant 24Reid and Inbau, Truth and Deception: The Poly- graph ("Lie Detector") Technique, pp. cit., 10-16. 25F. Horvath, "Verbal and Nonverbal Clues to Truth and Deception During Polygraph Examinations," J. Pol. Sci. and Adm., 1 (1973), 138-152. 26Barland and Raskin, "The Use of Electrodermal Activity in the Detection of Deception," pp. cit., 5-8. 27Reid and Inbau, Truth and Deception: The Polygraph ("Lie Detector") Technique, pp. cit., 18; R. Arther, "The Guilt Complex Quest1on, J. Polygraph Studies, 4 (1969), 1-4. 26 questions are those which pertain to the matter under investi- gation, such as "Did you shoot John Doe?" 
and, "Did you fire the shots that killed John Doe?" Control questions are those growing out of interaction between the examiner and the sub- ject; in general they deal with matters similar to, but of presumed lesser significance than, the offense being investi- gated. While the interaction between the subject and the examiner determines the exact nature of these questions, an example in burglary-investigation might be: "Did you ever steal anything?" or, "Except for what you have already told me about, did you ever steal anything else?" The examiner seeks to frame these questions in such a way that the subject will answer "no" but will, in all probability, be lying or at least will have some doubt or concern about the truth— fulness or accuracy of his answer. After the formulation of all test questions and at the completion of the pre-test interview, polygraphic testing is conducted. In the polygraphic testing, the examiner asks the subject the previously reviewed irrelevant, relevant and control questions in a series of polygraphic tests. Each test generally consists of about ten or eleven questions, four irrelevant, two control, and four or five relevant questions, and will usually last about three minutes. All questions are asked once during one test, and at about twenty- second intervals. A complete examination consists of the repetition of several of these tests. It is generally agreed 27 that for an examiner to ascertain with any degree of accur- acy the deception or truthfulness of the subject's answer to a relevant test question, that question should be asked at least once on each of two separate tests; sometimes, four or five separate tests may be conducted before a determina- tion of deception is made.28 It might be helpful at this point to describe the testing sequence used by many of the proponents of the CQ procedure. Generally, immediately following the pre-test interview, the examiner conducts the first CQ test of 10 or 11 questions, previously reviewed. After this first test, a card (or "numbers") test, or some variation of such a test, is administered. The nature of the card test being fully explained elsewhere,29 its ostensible purpose is to demon- strate to the subject the efficacy of the "lie-detector"; actually, it is more prOperly considered one of the many "stimulation" devices or strategies used by examiners employ- ing the CQ procedure. Such strategies will be discussed later. Following the "card test" the examiner leaves the examination room for a short period, before doing so usually requesting the subject to think carefully about the test- questions while he is out of the room. Upon his return, he asks the subject if there are any questions which concern 28Reid and Inbau, Truth and Deception: The Poly- graph ("Lie Detector") Technique, pp. cit., 26-33. 29 Ibid., 27-28. . . '1' .0 I C7” '1 171 th. in 3L. 28 him more than others, or if there are any which the subject feels should be re-worded. If not, the examiner then tells the subject that another test will be conducted using the same questions asked in the first test, and in the same order; in other words, the third test is a replicate of the first. Upon completion of this third test, the examiner briefly reviews the accrued polygraphic recordings and de- cides if further testing is necessary. 
It is usually claimed that in some instances, response data contained in the first two control question tests are sufficient to indicate the subject's truthfulness or deception.30 In the majority of instances, however, further testing is indicated and conducted via one or more of the specialized tests discussed below.

Specialized tests.--1) Mixed Question Test. In most instances of additional testing the first test will be a "mixed question test." In this test the subject is asked the questions of the first two control-question tests but in a different order. The ordering of the questions is flexible, usually based upon the examiner's knowledge of the response-data observed in the prior tests.31

2) Silent Answer Test. A specialized test which some examiners have recently incorporated as the fourth test in the series (usually in the position where the mixed question test is placed) has been termed the "silent answer test". Its usefulness has been adequately described elsewhere.32

30Ibid., 30-37.

31Ibid., 30-32.

3) The "Yes" or Affirmation Test. The "yes" or affirmation test is one in which the subject is instructed by the examiner to answer "yes" to all of the test questions (which, of course, are the same questions already asked on previous tests), including the relevant questions to which he had answered "no" before. The purpose of the "yes" test is to ascertain whether or not the subject is engaging in deliberate attempts to distort his polygraphic recordings. Ordinarily the tracings (response data) obtained during the "yes" test are not interpreted in the same manner or for the same purpose as they are in the tests mentioned previously. The purpose and method of interpretation of the "yes test" is thoroughly discussed in Reid and Inbau.33

Stimulation procedures.--Proponents of the CQ procedure have developed various strategies to clarify response data; that is, these strategies are used not only to augment responsiveness to testing but, more importantly, to direct the subject's attention (or psychological set) to those test

32F. Horvath and J. Reid, "The Polygraph Silent Answer Test," J. Crim. Law, Crim., and Pol. Sci., 63 (1972), 285-293.

33Reid and Inbau, Truth and Deception: The Polygraph ("Lie Detector") Technique, op. cit., 32.

questions which constitute the greatest threat to his well-being; presumably, for persons telling the truth these strategies augment responses to control questions; for those lying, to relevant questions. Such strategies may take the form of specialized tests, e.g., the "card test", "silent answer test", etc., or may consist of various forms of examiner-subject interaction. Regardless of which form they take, however, these strategies are considered to be much less direct than ordinary interrogational devices. For instance, when compared to direct questioning, implications by either verbal or nonverbal communication concerning the subject's polygraphic records are considered to be much more effective and less apt to adversely affect polygraphic recordings, i.e., cause a person to respond beyond the normal to relevant test questions when he is telling the truth to them. Perhaps an example would clarify this point.
Assume that an examiner has conducted a series of three tests with a subject (CQ-Test One, a card test, and CQ Test Three -- a repetition of Test One) and feels that the responses are too ambiguous to permit accurate appraisal of the subject's truthfulness in answer to the relevant questions -- the responses to the control questions cannot be clearly differentiated from those to the relevant questions. In such an instance, the examiner may feel that a mixed-question test is warranted. Before conducting such a test he may ask the subject if any particular test questions concern him more than the others; while doing so he implies that the testing is not "clear" at this point. Further he may tell the subject that he would like to conduct another test but that before he does so, he wants to be certain that the subject clearly understands all of the test questions so far asked and is certain that he has answered all of them truthfully. The examiner may then carefully re-read all of the test questions, requesting answers as he does so. He then asks the subject something like: "Are you certain that you understand all of these questions?" "Is there any answer you have given that may not be the complete truth?" When the subject acknowledges he has answered the questions truthfully and that he understands all of them, the examiner explains how the next test is to be conducted, i.e., the same questions will be asked in a different order than they were asked on prior tests, and then proceeds with the testing.

The various strategies used by examiners to "stimulate" subjects are too numerous to detail here. It should be noted, however, that the strategies are rather indirect in nature; they are not accusatory and do not usually make reference to particular test-questions, and most importantly, they presumably make a significant contribution to the functioning of the CQ procedure.34

34J. Reid, "Stimulation Technique Outline," undated, unpublished manuscript supplied by J.E. Reid and Associates, Chicago; C. Klump, "Principles of Controlled Stimulation" (paper presented at American Academy of Polygraph Examiners, Eighth Annual Seminar, Washington, D.C., Sept., 1961).

While the general testing procedure outlined above is representative of that used by many field examiners employing the CQ procedure, there are other specialized tests and other variations of the procedures. Some of these variations concern the number of individual tests which will be conducted during an examination, the organization of the tests, the order of questions within tests, and the procedure followed by the examiner during the break between tests. For a more thorough discussion of these variations see Reid and Inbau,35 Barland and Raskin,36 or Backster.37

Regardless of the various administrations of the CQ test, its proponents argue that control questions imbedded within the series provide a better tool for assessment of a person's truthfulness or deception to relevant issues than does the R-I procedure. The variations do not imply unstructured procedure, however, each variation being controlled by its particular rules for conducting examinations. Presumably, once informed of each other's rules, examiners using the different procedures of examination can evaluate each other's results.

Peak of Tension Testing

A type of testing infrequently encountered in field settings is the POT (peak of tension) test. Although the

35Reid and Inbau, Truth and Deception: The Polygraph ("Lie Detector") Technique, op. cit., 10-36.
36Barland and Raskin, "The Use of Electrodermal Activity in the Detection of Deception," op. cit., 13-17.

37C. Backster, Standardized Polygraph Notepack and Technique Guide (New York: Backster Research Foundation, 1969).

principle behind this test is often relied on by proponents of both the R-I and CQ procedures, especially in the ordering of questions in the test series, the POT is not a standard part of either of these procedures. Arther has termed the two general forms of the POT test the "searching" test and the "known-solution" test.38 The searching POT consists in the asking of a series of similar questions, usually with specific focus, such as to locate a murder weapon, etc. For example, a subject tested by control-question type testing may give the examiner reason to think that he is in fact implicated in a certain murder and further has hidden or discarded the murder weapon. Under these circumstances the searching POT test would include a series of questions such as: "Do you know if the gun used to kill John Jones is under water?", "Do you know if the gun used to kill John Jones is buried in the ground?", etc., such questions being asked throughout a number of individual tests until the examiner feels he has determined the location of the murder weapon.39

On the other hand, the known-solution POT test, while similar to the searching test in consisting of a series of about seven questions, presupposes that the examiner is aware of particular details of a crime of which the subject denies any knowledge. For example, the examiner may know that in a

38R. Arther, "Peak of Tension: Basic Information," J. Polygraph Studies, 1 (Jan.-Feb., 1967), 4.

39See: Reid and Inbau, Truth and Deception: The Polygraph ("Lie Detector") Technique, op. cit., 37-40; R. Arther, "Peak of Tension: Examination Procedures," J. Polygraph Studies, 5 (July-Aug., 1970), 1-4.

certain burglary two hundred dollars in quarters has been stolen. The subject is then asked a series of questions such as: "Do you know if dimes were stolen in X burglary?", "Do you know if nickels were stolen in X burglary?", etc., the critical question, in this case the one about the quarters, usually placed in the fourth position in the series.

Regardless of the type of POT test employed, interpretation of the polygraphic records thus obtained is standard. It is assumed that if a subject is in fact familiar with the critical item in the series, the polygraphic recordings (especially the "cardio" and GSR tracings) will appear to "peak" at the critical item or will show a reaction of the greatest magnitude at the "critical" item. Further ramifications of the POT test and its interpretation, as well as necessary precautions in its use, are recorded in the literature.40 For the purposes of this study it should be noted that in the POT test examiners rely heavily on reactions in electrodermal activity as indications of deception.41

Contrary to some writings,42 the POT test is not a lie-detection "technique" in the sense that the control

40R. Arther, "Peak of Tension: Dangers," J. Polygraph Studies, 2 (March-April, 1968), 1-4; Reid and Inbau, Truth and Deception: The Polygraph ("Lie Detector") Technique, op. cit., 37-40.

41Reid and Inbau, Truth and Deception: The Polygraph ("Lie Detector") Technique, op. cit., 219-225.

42M. Orne, R. Thackray and D.
Paskewitz, "On the Detection of Deception: A Model of the Study of the Physio- logical Effects of Psychological Stimili," N. Greenfield and R. Sternbach (Eds.), Handbook of Psychophysiology (New York: Holt, Rinehart and Winston, 1972), 743-780. I]: Q r!- 3v {'17 L. U). 35 question and relevant-irrelevant procedures are techniques. Rather, the POT is merely a specialized type of polygraphic test normally used only after testing by either the control- question or relevant-irrelevant procedures; the POT test is used to determine if a given person has "guilty knowledge" of speCific details of a particular offense.43 Hence, its use is limited to those types of offenses where such details are evident. On the other hand, the CQ and R-I procedures are diagnostic techniques not predicated on awareness of particular details of an offense. Generally, these CQ or R-I techniques can be administered in a variety of ways, the examiner having at his disposal the Specialized "card test", "mixed question test", "yes test", "Silent answer test",44 and "yes-no test",45 and others, all of which can be used within the framework of either the CQ or R-I technique. Evaluation of Polygraphic Recofds Visual insPection technique.--Field examiners rarely, if ever, employ strictly objective measurements in interpret- ing the significance of response-data, changes in cardio- vascular, respiratory, or GSR tracings recorded polygraphically. 43R. Arther, "Peak of Tension: Basic Information," pp. pip., 4. 44 See page 28. 45 R. Golden, "The Yes-No Technique" (paper presented at American Polygraph Association Seminar, August, 1969, Houston, Texas). 111E dbl 1n 36 Rather, visual inSpection techniques, progressing from a general appraisal of all records (tests) down to particular analysis of reactions to particular test questions, are usually performed. Generally, changes - extent and duration -of cardiovascular, respiratory, or GSR response - in any of the recorded parameters are evaluated according to Specifi- able criteria for each parameter as set forth in texts,46 or in training manuals.47 Such criteria, however, serve only as guidelines, since the "deception-reSponses" of one person may not be those of another. In other words, field examiners do not claim that any particular reSponse, or pattern of responses is pathognomic of lying, only that changes from the "normal" for any given person may indicate deception.48 Some writers have over-generalized the evaluation of field-derived polygraphic records to the point where any change from pre-stimulus levels is said to be indicative of deception. While it is true that polygraphic records indi- cate any changes from pre-stimulus levels, such changes must be considered both quantitatively and qualitatively, they 46Reid and Inbau, Truth and Deception: The Poly- graph ("Lie Detector") Technique, pp. cit., 41-50. 47C. Backster, Tri-Zone Polygraph (New York: Backster Research Foundation, 1969). 48See: C.N. Joseph, "Analysis of Compensatory Responses and Irregularities in Polygraph Chart Interpretation," Academy Lectures on Lie Detection, V. Leonard (Ed.) (Springfield, 111.: C. C ThomSS, 1957), 93-99; P. Trovillo, "Deception Test Cri- teria," J. Crim. Law, Crim. and P01. Spi., 33 (1942), 338-358; J. Reid,-"Interpretation of Truth and Deception in Polygraph Test Records," undated, unpublished manuscript supplied by author. Elia: 37 cannot be summarily assumed indications of deception. Con- sider record-evaluation in the control-question technique, for example. 
Simply stated, responses in the polygraphic parameters which occur more consistently over a series of tests and which are of a greater intensity to control-questions than to relevant questions indicate truthfulness to the relevant questions. Conversely, responses of a consistently greater intensity to the relevant question than to the control questions suggest deceptiveness regarding the relevant questions. The key points in this vastly over-simplified description are that any changes have little significance unless they occur consistently, and even then they are not significant until compared with other changes.

Numerical evaluation technique.--One of the noteworthy variations in evaluation of polygraphic recordings is a numerical scoring system developed by Backster, a well-known field examiner.49 In this system examiners assign a number ranging from -3 to +3 to reflect the perceived difference between responses to control and relevant question pairings for each of the physiological parameters recorded; the magnitude and direction of the numbers assigned to such comparisons forms the basis for decision-making. For example, the examiner pairs relevant and control

49Backster, Tri-Zone Polygraph, op. cit., 14.

questions and then observes whether or not a particular question in each pair provokes outstanding response. If the response is greater to the relevant question, a number from -1 to -3, depending upon the extent of the difference, is assigned. On the other hand, if the control-question response is greater, a number from +1 to +3 is assigned; if there is no difference between the paired responses, a 0 is assigned. Such a procedure is carried out separately for each control/relevant-question pair for each physiological parameter of all the tests administered. The numbers assigned are then added; a positive total greater than +5 and a negative total less than -5 usually are established as "cut off" points to indicate truthfulness and deception, respectively. Total scores ranging between +5 and -5 are usually considered inconclusive.

There are some disadvantages apparent in the numerical scoring system: (1) It is possible that scoring data in such a way filters out recorded trends which might be useful in evaluation. (2) It assumes that response-data are the only indices of deception. In actuality, deception is sometimes indicated not so much by specific response as by generally abnormal or erratic recordings. (3) It makes no provision for artifacts deliberately produced by some subjects.50 Within its limits, however, the numerical-scoring

50See: Reid and Inbau, Truth and Deception: The Polygraph ("Lie Detector") Technique, op. cit., for specific examples of these three phenomena, 53-124, 185-218.

system appears to be highly reliable and an especially useful research tool.51

Discussion and Summary of Field Procedures

It should be evident from this discussion of the major procedures used in the field that it is extremely difficult to separate the polygraphic testing or the polygraphic records themselves from the procedure used in obtaining them. That is, the examiner-subject interaction before and during polygraphic testing is an integral part of the procedure; one must view field "lie-detection" as a diagnostic technique whether or not R-I or CQ procedures are considered.
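Returning to the numerical scoring scheme described above, its scoring and cut-off logic can be sketched as follows. This is a minimal illustration only, not Backster's published worksheet; in particular, the mapping from a perceived response difference to a magnitude of 1, 2, or 3 is an assumption made here for concreteness, since in practice that judgment rests with the evaluator.

def score_pair(control_response, relevant_response, scale=3):
    """Assign -3..+3 for one control/relevant pairing in one physiological channel.

    Positive scores mean the control question drew the stronger response
    (suggesting truthfulness); negative scores mean the relevant question did.
    """
    difference = control_response - relevant_response
    if difference == 0:
        return 0
    magnitude = min(scale, max(1, round(abs(difference))))
    return magnitude if difference > 0 else -magnitude

def classify(pair_scores, cutoff=5):
    """Sum the pair scores and apply the +5/-5 cut-off points described in the text."""
    total = sum(pair_scores)
    if total > cutoff:
        return total, "truthful"
    if total < -cutoff:
        return total, "deceptive"
    return total, "inconclusive"

# Hypothetical scores: one per control/relevant pairing per recorded parameter
# (respiration, GSR, cardiovascular) over several tests.
scores = [+2, +1, 0, +2, +1, +1]
print(classify(scores))        # (7, 'truthful')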
The most prominent distinction between these procedures seems to be that if one were to place these two "lie-detection" procedures on a subjective-objective continuum, proponents of the CQ procedure would place themselves more to the right, or towards the objective extreme, of the continuum. It is clear that they believe the use of control questions a necessary basis for objectivity, and that the polygraphic recordings themselves are highly valid and reliable indicators of a person's truthfulness or deception.

51G. Barland, "The Reliability of Polygraph Chart Evaluations" (paper presented at American Polygraph Association Seminar, Aug. 15, 1972, Chicago, Ill.).

Laboratory Lie-Detection: Procedures

Laboratory studies of lie-detection usually involve either a guilty-person or a guilty-information paradigm, the two not mutually exclusive.52 Following the guilty-person paradigm, a mock crime is contrived; the task of the examiner is to employ lie-detection apparatus to determine which of a given group of subjects committed the crime, which were accomplices, and which were free of any complicity. This testing is closely akin to the relevant-irrelevant tests used in field settings; control-question testing, somewhat similar to that used by field examiners, is recorded in only one laboratory study.53 In the guilty-information paradigm the subject is instructed to lie about a card, number, or some other item he selects from a group of such items; the examiner's task is to determine which item was selected; hence, the process can be generally viewed as a "peak-of-tension" test.

One of the noteworthy variations of the two laboratory paradigms is termed the "guilty-knowledge technique", originally reported by Lykken.54 Using this technique, subjects are assigned to groups who may have committed one or more, or no, mock crimes; test items pertaining to those crimes are interspersed among irrelevant, or non-critical

52Orne, et al., "On the Detection of Deception: A Model for the Study of the Physiological Effects of Psychological Stimuli," op. cit., 775.

53G. Barland, "An Experimental Study of Field Techniques in Lie Detection" (unpublished M.A. Thesis, University of Utah, 1972).

54D. Lykken, "The GSR in The Detection of Guilt," J. Appl. Psych., 43 (1959), 385-388.
Moreovery it is clear that laboratory research approaches lie-detection 55D. Lykken, Psychology and The Lie Detector Industry (Minneapolis: Department of Psychiatry, Univ. of Minnesota, Report No. PR-74-l, January 25, 1974), 14. .c . . . . e t -h we” -rl. cw. m .15 hf A.» in Q 'A r”, t rL L.» «D 2» .~« nlu~ :L A} ~13 an a u .. r. .9 rllh in 42 in a manner quite different from that in the field; examiner- subject interaction seldom has a very dramatic impact. The Validity of Lie-Detection Field Procedures The validity of field lie-detection procedures, i.e., the accuracy with which lie-detection can discriminate between truthful and lying persons, has been a constant source of debate between field practitioners, laboratory researchers and others concerned with this problem and its social implications.56 Because there are already available excellent discussions of this tOpic,57 the presentation here will be relatively brief, only the most prominent research results and related problems discussed. As noted previously, many of the early lie-detection practitioners used procedures and instrumentation which by today's standards appear unSOphisticated. In spite of this deficiency, however, there are numerous reports of impres- sive validity. Bennussi, for instance, claimed that he was 56See, for example: U.S. Congress, House, Subcommittee of the Committee on Government Operations, Use of Polygraphs as "Lie Detectors" by the Federal Government, HEErings, 88th CEngress, 2nd Sess.7_and 89tHICongress, 1st Sess., Parts 1-6 (Washington, D.C.: U.S. Government Printing Office, 1964-1966). 57See: 8. Abrams, "Polygraph Validity and Reliability: A Review," J. Forensic Sciences, 18 (1973), 313-326; Barland and Raskin, "The Use of Electrodermal Activity in the Detec- tion of Deception," op. cit., 1-62; J. Orlansky, Ap Assessment of Lie Detection CapEEiliEy (Declassified Version), Tech. Rep. E2-16 (Arlington, VA: Inst. for Defense Analyses, Res. and Eng. Support Div., July 1964), 6-17; Orne, et al.,"On the Detection of Deception: A Model for the Study of—EhE—Physiological Effects on Psychological Stimuli," pp. pip., 743-780. .t at ' tim A F» 43 able to successfully detect liars by evaluating the reSpira- tion-inSpiration-expiration ratio; the ratio was greater before truth-telling than after, and greater after lying than before.58 Marston claimed greater success with dis- continuous systolic-blood pressure as a test of deception, and reportedly could discriminate between truth-tellers and liars with an accuracy of 96 percent.59 In contrast, Summers rejected the value of both respiration and blood pressure and relied on a measure of electrodermal activity. He claimed 98 percent success in discriminating between truth-tellers and liars in the laboratory and 100 percent success when dealing with actual criminal suspects.6O Benussi, Marston, and Summers, of course, did not use a polygraph -- but a single-channel recorder. Larson and Keeler, using polygraphic recording equipment, claimed to have accuracy rates varying between 90 and 100 percent.61 Inbau and Reid claimed an accuracy of 95.6 percent in their initial report on this tOpic.62 Likewise, Arther, estimating 8 - u . . . Benuss1, On the Effects of Ly1ng on Changes in ReSp1ration," cited tw'Trovillo, "A History of Lie Detec- t1on," pp. cit., 870. 59 n . . Marston, Systol1c Blood Pressure Symptoms of De- cept1on," pp. cit., 123 0 . - u . . . C1ted by"Trov1llo, A H1story of L1e Detection," pp. 913°! 108. 61 . Larson, Ly1ng and Its Detection, op. 
cit., 405-416; Keeler, "A Method For Detecting Deception," op. cit., 38-52.

62F. Inbau and J. Reid, Lie Detection and Criminal Interrogation (Baltimore: Williams and Wilkins, 1953), 110-113.

from the results of a five-year study, reported an accuracy of over 96 percent with a 3 percent margin of inconclusive determinations and a 1 percent margin of maximum error; he reported that his known error was actually less than .0005.63

In view of such favorable reports of the accuracy of lie-detection in the field setting, it is logical to question how well such reports stand up in objective assessment. Inbau and Reid's early claim of 95.6 percent accuracy had been arrived at by adding instances in which examiners made judgments of lying (31.1 percent) or truth-telling (64.5 percent) in a number of cases. The remaining 4.4 percent of the judgments were inconclusive and the reported error was 0.0007 percent, which was later pointed out to be an arithmetical error and was corrected to 0.07 percent.64 The verification of the Inbau and Reid data rested on confessions made by the persons tested. However, only 486 out of 1334 (36.4 percent) persons who were judged to be liars actually confessed, and only 11.7 percent of the judgments made on the truth-tellers could be verified. Thus, Inbau and Reid defined accuracy as the percentage of cases in which the examiner made a determination of either lying or truth-telling irrespective of actual verification. This

63R. Arther and R. Caputo, Interrogation For Investigators (New York: W.C. Copp, 1959), 214.

64Orlansky, An Assessment of Lie Detection Capability, op. cit.

was an unusual interpretation of "accuracy" and has since been strongly criticized.65 Many other field examiners have interpreted their accuracy in the same manner and are thus subject to the same criticism.

Field practitioners have also reported studies approaching the question of validity in a more acceptable manner. It is unfortunate that the majority of these studies are quite old and either did not employ polygraphic instrumentation66 or did not use procedures commonly used today.67 Moreover, many field reports of the accuracy of the polygraph rely on anecdotal evidence which, while interesting, is not an acceptable method of determining validity. Larson, for instance, reported an investigation in which he gave polygraphic tests to a number of girls living together in a large hall in order to determine which of them was responsible for a series of thefts amounting to about $600.00. He reportedly was able to "clear" all but one of the girls, who subsequently confessed; thus, an accuracy of 100 percent

65Orlansky, An Assessment of Lie Detection Capability, op. cit., 11; R. Sternbach, L. Gustafson, and R. Colier, "Don't Trust the Lie Detector," Harv. Bus. Rev., 40 (1962), 130.

66W. Summers, "Science can get the Confession," Fordham Law Rev., 8 (1939), 334-354; R. MacNitt, "In Defense of the Electrodermal Response and Cardiac Amplitude as Measures of Deception," J. Crim. Law and Crim., 33 (1942), 266-275.

67V. Lyon, "Deception Tests with Juvenile Delinquents," J. Gen. Psych., 48 (1936), 494-497.

was claimed. The problem with such an "accuracy", of course, is that the group of girls tested contained only one guilty person; the likelihood of being innocent or guilty was not 50 percent.
Moreover, as Larson points out, the factual information available was sufficient to enable him to determine in advance of the testing that certain of the girls were more likely to have been "guilty" than others; such information could easily have influenced the polygraph testing.68

The most enlightening validity-study reported to date is by Bersh; he drew a random sample of cases from a pool of criminal investigations carried out by the military services and submitted complete dossiers of all evidence in the cases, except for any reference to polygraphic examinations, to a panel of four military lawyers. All evidence was reviewed independently by the lawyers and determinations of guilt or innocence were made irrespective of legal technicalities; these determinations were then used as the criteria for comparison with the examiners' judgments. In those instances in which all four lawyers agreed on a subject's guilt or innocence, the judgments of the polygraphic examiners were in agreement with the lawyers 92.4 percent of the time. When a majority determination by the lawyers was used as the criterion of guilt or innocence, agreement with the polygraphic

68Larson, "Modification of the Marston Deception Test," op. cit., 395-396.
Moreover, Holmesiknnxithat errors more often favored the lying persons, liars, more often judged to be truth—tellers than vice versa.71 Unfortunately, Holmes did not report details concerning the testing procedure used in obtaining the poly- graphic records or the experience levels and the nature of the training of the examiners who evaluated the records; these variables could have significantly affected the re- sults.72 71W. Holmes, "The Degree of Objectivity in Chart Interpretation," Academy Lectures pp Lie Detection, II, V. Leonard (Ed.) (Springfield, Ill.: C.C Thomas, 1958), 62-70. 72F. Horvath and J. Reid, "The Reliability of Poly- graph Examiner Diagnosis of Truth and Deception," J. Crim. Law and Crim., and Pol. Sci., 62 (1972), 276-281; F. Hunter and P. Ash, "The Accuracy and Consistency of Polygraph Examiner's Diagnoses," J. Pol. Sci. and App., 1 (1973), 370- 375; A. Suzuki, "An AnaIySIS—of_R51aEiVe Effectiness (sic) of the Physical Indices and the Influence of Polygraph Examiner's Experience Upon Judgment of Polygraph Records in Detection of Deception," Japanese Journal, Title unknown, reprint supplied by author, 21 (1968), 51-59. 49 Reported in the literature are several other studies utilizing a design somewhat similar to that used by Holmes. However, for reasons which will be discussed at a later point, these studies are more appropriately viewed as reliability rather than validity studies. While the validity of field lie-detection procedures is a crucial concern, it is clear that as yet the evidence supporting extremely high accuracy in the field is incon- clusive. The major reason for the lack of supporting evidence, of course, is that there is no completely adequate ground-truth criterion with which examiners' judgments can be compared. The criteria which have been or can be used, such as confessions, independent evaluations of extrapoly- graphic information, and the outcome of judicial proceedings, do not establish with certainty a person's actual truthful- ness or deception.73 And, since procedures used in giving polygraphic examinations are, in essence, diagnostic pro- cedures, it is difficult to separate the influence of the examiner's interaction with the subject from the polygraphic recordings themselves; that is, the recordings are not necessarily independent of the examiner's attitudes, be- havior, and information concerning the subject's involvement 73For a discussion of the problems associated with the use of confessions as a ground-truth criterion see: H. Dearman and B. Smith, "Unconcious Motivation and the Poly- graph Test," Amer. J. Psych., 119 (1963), 1017-1021; R. Ferguson, The ScienEific Informer (Springfield, Ill.: C. C Thomas, 1971). will othe Such ‘ :82 Labc in t has .1... LM. E. e :u . _ E \Q S a i «is all; r . an“. 4 P h c. s \Je \Ji F t ~hli 50 in the offense under investigation. For this reason it has been argued that the prOper approach to validity is one which compares the validity of the various aspects of the polygraphic technique separately and collectively against other methods of determining truthfulness or deception.74 Such an approach is quite reasonable but as yet has not been reported in the literature. Laboratory Procedures Because laboratory research typically uses electro- dermal activity to indicate deception, the discussion here will be restricted to the validity of this phenomenon. 
It is well established that during the early 1900's electrodermal activity was known to be associated with "psychic phenomena" such as lying.75 However, attempts at detecting deception with electrodermal activity probably did not receive full impetus until the 1930's. At that time many investigators reported substantial success with the method. Ruckmick, using the guilty-information paradigm, reported that

74M. Orne, "Implications of Laboratory Research for the Detection of Deception," Polygraph, 2 (1973), 169-199.

75See: C. Landis, "Electrical Phenomenon of the Skin," Psych. Bull., 29 (1932), 693-752; C. Landis and H. DeWick, "The Electrical Phenomenon of the Skin (Psychogalvanic Reflex)," Psych. Bull., 26 (1929), 64-119; J. Larson, "The Cardio-Pneumo Psychogram and Its Use in the Study of Emotions, with Practical Applications," J. Exp. Psych., 5 (1922), 323-328; F. Peterson and C. Jung, "Psycho-Physical Investigations with the Galvanometer and Pneumograph in Normal and Insane Individuals," Brain, 30 (1907), 153-218.

a 66 percent detection rate was achieved with numbered cards, and using the same paradigm with a series of three-letter words, achieved 78 percent correct judgments. Moreover, he found that if the scores of an inexperienced evaluator were eliminated, an 83 percent accuracy was achieved for the three-letter words.76 Geldreich, also using the guilty-information paradigm with decks of cards, claimed that by "fatigue-adapting" a group of subjects to non-critical cards he could improve detection rates from 74 percent for a non-adapted group to 100 percent for an adapted group.77

Fatigue-adapting, Geldreich concluded, shunted extraneous stimuli to non-critical items, although there is no indication that he also controlled for differential response-capabilities between groups prior to his experiment.

Summers, in what is perhaps the earliest attempt to utilize the guilty-person paradigm, claimed to have improved the galvanometer and the technique used for scoring responses. With his Fordham Pathometer he reported that he was able to correctly detect "guilty", "innocent" and "accomplices" in mock crimes 98 percent of the time.78 He apparently

76C. Ruckmick, "The Truth About the Lie Detector," J. App. Psych., 22 (1938), 50-58.

77E. Geldreich, "Studies of the Galvanic Skin Response as a Deception Indicator," Trans. Kans. Acad. Sci., 44 (1941), 346-351.

78Summers, "Science can get the Confession," op. cit., 334-354.
Initially, they were concerned with the accuracy of GSR reSponses in detecting guilty-information and the effect of repetition on accuracy. Their results indicated an 80 percent accuracy-rating for mere detection of information; this figure dropped slightly to 70 percent in one repetition of the experiment. When they repeated their experiment to test for the effect of the 79Cit3d by: Trovillo, "A History of Lie Detection," pp. cit., 108. 80MacNitt, "In Defense of the Electrodermal Re- sponse and Cardiac Amplitude as Measures of Deception," pp; cit., 266-275. 1’85 II, al‘ a L... Du \ 53 subject's knowledge of successful lying on a first trial compared to a second trial, they found that by combining the results of their two experiments an accuracy of 79 percent was achieved against a chance-expectancy of 17 percent.81 Other studies have substantially confirmed the findings of Ellson pp 3A., in both the guilty-information82 and guilty- person paradigm.83 Using the guilty-knowledge technique and by estab- lishing an arbitrary cutoff point for objective analysis of GSR reactions, Lykken was able to correctly classify subjects by group 89.9 percent of the time and to identify the guilty and the innocent 93.9 percent of the time.84 In a follow-up study to assess the effects of faking the guilty-knowledge technique, Lykken achieved 100 percent correct classification of subjects who concealed items of 85 personal information. Studies by other investigators have 81Ellson, David, Saltzman and Burke, A Report pp Research 92 Detection pp Deception, pp. cit., 11. 82D. Van Buskirk and F. Marcuse, "The Nature of Errors in Experimental Lie Detection," J. Exp. Psych., 47 (1954), 187- 190. 83Barland, "An Experimental Study of Field Techniques in Lie Detection," op. cit.; L. Gustafson and M. Orne, "The Effects of Task and—Metfiad of Stimulus Presentation on the Detection of Deception," J. App. Psych., 48 (1964), 383-387; J. Kubis, "Experimental afid SEEtistical Factors in the Diag- nosis of Conciously Suppressed Affective Experiences," J. Clin. Psych., 6 (1950), 12-16. 84Lykken, "The GSR in the Detection of Guilt," pp. cit., 385-388. 850. Lykken, "The Validity of the Guilty Knowledge Technique: The Effects of Faking," J. Ap . Psych., 44 (1960), 258-262. al 911 EKE am the is 54 also reported varying degrees of success using GSR in the guilty-knowledge technique.86 Comparison Of The Validity Of Field To Laboratory Lie Detection There is general agreement that lie—detection, whether in the field or laboratory, is a valid procedure. The question, is whether or not it is as valid as field- examiners claim. As yet, the evidence is not conclusive, and it may never be. But field-practitioners often claim that given the conditions of their situation, lie-detection is more valid than it is in the laboratory. Several major reasons have been offered for the dissimilarity between laboratory findings and claims of field-examiners. Deception Indices In Spite of the typically high accuracy of electro- dermal measures in the laboratory, examiners who work in field settings almost universally agree that for their purposes cardiovascular and reSpiratory measurements are far more effective.87 Early accounts of the accuracy of lie-detection using cardiovascular activity reported fairly high accuracy-rates 86G. Ben Shakhar, I. Lieblich and S. Kugelmass, "Guilty-Knowledge Technique: Application of Signal Detection Measures," J. App. Psych., 54 (1970), 409-413; P. 
Davidson, "Validity of the Guilty Knowledge Technique: The Effects of Motivation," J. App. Psych., 52 (1968), 62-65. 87Reid and Inbau, Truth and Deception: The Polygraph ("Lie Detector") Techniqpe, pp. cit., 40. EVE COI tel liar "/ -.1 (I) H~ H (‘D (_.‘.’ ti] ,1 a r.- E: H L—‘ U) 31.4 l_' (D I4 :3 H / n) J“: 55 even in mock crimes.88 Chappell and Matthew, claimed a correct discrimination rate of 87 percent between subjects telling the truth and lying about details of a mock crime.89 Marston reported a 94 percent correct classification of liars and truth-tellers.90 Recent investigators have not reported results as outstanding as these; in fact, recent evidence seems to indicate that for laboratory purposes at least, cardio- vascular activity is inferior to electrodermal measures.91 88N. Chappell and N. Matthew, "Blood Pressure Changes in Deception," Arch. Psych., 17 (1929), 1- 39; C. Landis and R.Gu11ette, "Studies of Emotional Reactions," J. Comp. Psych., 5 (1925), 221- 253; C. Landis and L. Wiley, "Changes of Blood Pressure and Respiration During Deception," J. Comp. Psych., 6 (1926), 1-19; W. Marston, "Systolic Blood Pressure Symptoms of Deception," pp. pip., 117- 163. Chappell and Matthew, "Blood Pressure Changes in Deception," pp. cit. 90W. Marston, "Psychological Possibilities in the Deception Test," J. Amer. Inst. pp Crim. Law and Crim., 11 (1921), 551-570. 91J. Kubis, Studies in Lie Detection: Computer Feasi- bili§y_Considerations, Tech. Report 62- 205 (Arlington, Va.: Armed Services Technical Information Agency, June, 1962), prepared for Air Force Systems Command, contract No. AF 30 (602)-2270, Project No. 5534, Fordham University, 1962; S. Kugelmass, Effects p: Three Levels of Realistic Stress pp Differential Psychological ReactiviEies, Tech. Report 63-61 (report prepared for Air Force Office of Scientific Research, European Office, Aerospace Research, U.S. Air Force, Hebrew University of Jerusalem, Isreal, Aug. 1963); S. Kugelmass, I. Lieblich, A. Ben-Ishai, A. Opatowski and M. Kaplan, "Experi- mental Evaluation of Galvanic Skin Response and Blood Pres- sure Change Indices During Criminal Interrogation," J. Crim. Law, Crim., and P01. Sci., 59 (1968), 632-635; S. Kugelmass I. Lieblich,_wEffEEts—5i Realistic Stress and Procedural In- terference in Experimental Lie Detection," J. App. Psych., 50 (1966), 211-216; R. Thackray and M. Orne, "A_Comparison of Physiological Indices in Detection of Deception," Psycho- ppysiology, 4 (1968), 329- 339, R. Violante and S. Ross, Research on Interrogation Procedures (Interim Report, pre- pared for_ U. 3. Navy, Office of Naval Research, Contract Nonr. 4129(00), Stanford Research Institute, Menlo Park, California, Nov. 1964). IS in i'h to 11'. -. - . mo. _ ~ c 1 “nay-W. DU 3. A 0 v5. 
In spite of the fact that early investigators disagreed on the relative values of either cardiovascular activity or respiration as indicators of deception, most of them did find that respiratory measures were fairly good indicators of deception.92 This is a particularly interesting point since almost all recent investigations have found respiratory measurement to have little, if any, significance in the detection of deception in the laboratory, at least when compared to other physiological parameters.93

Level of Subject Affect

One of the reasons that cardiovascular and respiratory activity may be less effective in indicating deception in the laboratory than is electrodermal activity, is that in such settings the level of affect is lower than in real-life. In order to investigate this possibility many laboratory investigators have employed stress and motivational devices

92Benussi, "On the Effects of Lying on Changes in Respiration," op. cit.; Burtt, "The Inspiration-Expiration Ratio During Truth and Falsehood," op. cit.; Burtt, "Further Technique for Inspiration-Expiration Ratios," op. cit.; C. Landis and R. Gullette, "Studies of Emotional Reactions," J. Comp. Psych., 5 (1925), 221-253; Larson, "Modification of the Marston Deception Test," op. cit.

93Loc. cit., Note #91.

such as electric shock,94 rewards,95 loss of self esteem,96 personally relevant material,97 and awareness of the testing situation.98 While many of these devices have apparently increased motivation to deceive, there is little evidence that the level of affect approaches that in real-life. Of course, it is also possible that no artificial device used in the laboratory can make the consequences of deception as real as those encountered in life. In other words, laboratory motivational-devices are ipso facto rewards for successful deception; the subject loses nothing for failing to deceive. On the other hand, real-life subjects may lose something very consequential if they fail to deceive; the liar may be subject to criminal prosecution, lose a job, etc. Likewise, the truthful person in real-life fears the consequences of being erroneously found to be a "liar"; he is highly motivated not to deceive and to do all he can to succeed in "passing" his test.

94Lykken, "The GSR in The Detection of Guilt," op. cit.

95Davidson, "Validity of the Guilty Knowledge Technique: The Effects of Motivation," op. cit., 62-65; Lykken, "The Validity of the Guilty Knowledge Technique: The Effects of Faking," op. cit.; Barland, "An Experimental Study of Field Techniques in Lie Detection," op. cit.

96L. Gustafson and M. Orne, "Effects of Heightened Motivation on the Detection of Deception," J. App. Psych., 47 (1963), 408-411.

97R. Thackray and M. Orne, "Effects of the Type of Stimulus Employed and the Level of Subject Awareness on the Detection of Deception," J. App. Psych., 52, 3 (1968), 234-239.

98Ibid.

Two studies which purported to assess the effects of real-life stress on laboratory lie-detection were conducted by Kugelmass and Lieblich,99 and Kugelmass, et al.100 In the first of these studies, card-tests given to police trainees, who apparently considered their successful deception important to their future, were evaluated. Both GSR and heart rate were considered, but GSR was clearly more indicative of deception than heart rate. In the second study, card-tests given to actual criminal suspects as part of their examination were evaluated.
Again GSR-responses were clearly superior indicators of deception; heart rate responses were not significantly different from chance as deception indices. The results of these studies seem to indicate that GSR is superior to heart rate as an indicator of deception. However, it is questionable whether or not the level of affect during card-tests, even though included as a part of a real-life examination, is the same as the level of affect which accompanies personal questioning concerning possible criminal involvement. In fact, because many field examiners report that GSR is highly effective during card-tests in actual

99S. Kugelmass and I. Lieblich, "Effects of Realistic Stress and Procedural Interference in Experimental Lie Detection," J. App. Psych., 50, 3 (1966), 211-216.

100S. Kugelmass, I. Lieblich, A. Ben Ishai, A. Opatowski, and M. Kaplan, "Experimental Evaluation of Galvanic Skin Response and Blood Pressure Change Indices During Criminal Interrogation," J. Crim. Law, Crim., and Pol. Sci., 59 (1968), 632-635.

examinations101 and yet relatively ineffective in tests preceding and following the card-test, it seems indicated that either a subject's level of affect varies with the type of questions asked, or that "arousal" or "attention" is more important to the success of GSR than is affect per se.

Lie-Detection Equipment

Another reason for the disparity between laboratory and field lie-detection studies concerns the type of testing-apparatus employed. Laboratory apparatus, particularly electrodermal measuring devices, are usually highly sophisticated, while field equipment is relatively simple. However, in spite of the differences in equipment used, there is increasing evidence that this does not account for any substantial difference in results. Orne has found no significant difference between the two types of equipment with respect to results obtained, and laboratory studies have employed field apparatus without noticeable differences in results; electrodermal activity, regardless of the type of equipment employed, maintained superiority in lie-detection.102

101Reid and Inbau, Truth and Deception: The Polygraph ("Lie Detector") Technique, op. cit., 33.

102M. Orne, untitled manuscript (paper presented to American Polygraph Association, Silver Springs, Maryland, 1969); see also: Barland, "An Experimental Study of Field Techniques in Lie Detection," op. cit.; and Orne, "Implications of Laboratory Research for the Detection of Deception," op. cit., wherein he expresses the belief that field GSR electrodes can be improved to increase the effectiveness of this measure (in the field), 196.

Use of Control Questions

Leading field examiners invariably employ some variation of the control-question technique in conducting lie-detection tests. Simply stated, control questions are designed to channel the psychological set of truthful subjects away from relevant questions and towards the control questions. Lying subjects, on the other hand, are presumed to be psychologically set to the relevant questions. Hence, consistently greater physiological responses to control questions are considered indicative of truthfulness regarding relevant questions, while consistently greater responses to relevant questions are suggestive of lying.
The use of control-questions reportedly has significantly increased the ability of field examiners to discriminate between truthful and lying persons and at the same time has lowered the number of inconclusive tests.103

The fact that control-questions are generally not used in laboratory studies may be one reason that laboratory studies find cardiovascular and respiratory activity less effective in detecting lies than is electrodermal activity. For example, control questions as used in field settings are generally "worked up" with the subject to insure that the question involves personally relevant material, and that the subject will either lie or have doubts about the accuracy of

103Reid, "A Revised Questioning Technique in Lie Detection Tests," op. cit., 547.

his answer to the question.104 In laboratory studies then, control questions could conceivably heighten a person's interest or concern for the test and possibly would result in greater differential response. The fact that personally relevant material does increase response in laboratory studies has been consistently reported,105 and at least one laboratory study using control-questions has found that both respiration and cardiovascular activity did significantly discriminate the "liars" from the "truth-tellers".106

Summers107 and Kubis108 both claimed accuracy-rates of over 95 percent. Significantly, both of them employed "emotional standard" questions, "highly charged emotional issues selected from a study of the life history of the suspect."109 While these "emotional standard" questions only remotely resemble control questions used today, it is clear

104G. Harman and J. Reid, "The Selection and Phrasing of Lie-Detector Test Control-Questions," J. Crim. Law, Crim. and Pol. Sci., 46 (1955), 578-582.

105J. Berkhout, D. Walter and W. Abey, "Autonomic Responses During a Replicable Interrogation," J. App. Psych., 54 (1970), 316-325; Thackray and Orne, "Effects of the Type of Stimulus Employed and the Level of Subject Awareness on the Detection of Deception," op. cit., 234-239.

106Barland, "An Experimental Study of Field Techniques in Lie Detection," op. cit.

107Summers, "Science can get the Confession," op. cit.

108J. Kubis, "Electronic Detection of Deception," Electronics, 18 (April, 1945), 192-212.

109J. Kubis, "Experimental and Statistical Factors in the Diagnosis of Consciously Suppressed Affective Experiences," J. Clin. Psych., 6 (1950), 13.

from Summers' description that their function in the test was the same: to evoke reactions from a suspect which could be compared to reactions on relevant (crime-related) questions.

In other words, the use of control-type questions provides a means of using each person as his own control. This is in contrast to some laboratory studies wherein there may be no real individual "control"; reactions to questions are evaluated across individuals according to some arbitrarily-assigned value; hence, all are judged truthful or lying according to the same criterion. Moreover, even in those laboratory studies which use individual "controls", the "control response" is that which occurs to irrelevant or non-critical items. That is, many laboratory researchers simply do not understand the "controls" used by field examiners;110 they term as control-questions those kinds of questions which field-practitioners label as irrelevant.
In the most recent study which attempted to approximate the use of control-questions as used by field practitioners, it is questionable if the controls were entirely adequate, primarily because they were not individually tailored to subjects.111

110See: Lykken, Psychology and the Lie Detector Industry, op. cit., 24-26.

111Barland, "An Experimental Study of Field Techniques in Lie Detection," op. cit., 40.

The Role of Lying

Lykken has proposed that field examiners are not really in the business of lie-detection but rather guilt-detection.112 If this is so, then it seems that the act of lying per se would have little effect on field procedures. Recent evidence tends to support this hypothesis, at least for some persons.113 The use of a "silent-answer test" wherein the person is instructed not to vocalize answers to questions and thus not really lie, has been shown to produce deception-criteria equal to and at times superior to tests which require vocal answers. Unfortunately, the maintenance of deception-responses in such a silent-answer procedure does not hold true for all persons; nor is there at present any complete understanding of the psychological mechanisms involved in such a silent-answer test.

Contradictory evidence concerning the role of lying can be found in laboratory studies. Kugelmass, Lieblich and Bergman reported that there were no significant differences in detection rates whether subjects answered "yes" or "no" to cards chosen from a deck.114 On the other hand, Gustafson

112Lykken, "The GSR in the Detection of Guilt," op. cit., 385; Lykken, "The Validity of The Guilty Knowledge Technique: The Effects of Faking," op. cit., 258.

113Horvath and Reid, "The Polygraph Silent Answer Test," op. cit.

114S. Kugelmass, I. Lieblich, Z. Bergman, "The Role of Lying in Psychophysiological Detection," Psychophysiology, 3 (1967), 312-315.
117Kubis, Studies pp Lie Detection: Computer Feasi- bility Considerations, pp. cit. C011 pol 65 consistent with the results of studies using field-obtained polygraphic records.118 The Reliability of Lie-Detection ~The reliability of polygraphic procedures has re- ceived considerably less attention than its validity. And, of course, this is quite natural since reliability refers only to the degree of consistency of judgments between poly- graphic examiners or examinations irrespective of the "correctness" of the judgments. For example, Dearman and Smith reported an instance of an individual being given independent polygraphic examinations by several different examiners, all of whom concluded that the individual had not told the truth in answering the question, "Did you steal any money from the bank or its customers?" In other words, in this instance the reliability of the examiners' judgments was perfect. However, Dearman and Smith pointed out that in their judgment, based on psychiatric evaluations, the indivi- dual in question had told the truth to the test question; in other words, while the reliability between the examiners was high, validity, according to Dearman and Smith's interpretation, was low.119 This example, of course, concerns the reliability 118See: Holmes, "The Degree of Objectivity in Chart Interpre- tation, " pp_. cit.; Horvath and Reid, "The Reliability of Polygraph Examiners Dia—girosis of Truth and Deception," pp. cit. ; Hunter and Ash, "The Accuracy and Consistency of Polygraph Examiner's D1agnos.1s, " pp. cit.; S. Hathaway and C. Hanscom, "The StatisticalEvaluation of PB—l—ygraph Records, " Academy Lectures on Lie Detection, II , V. Leonard (Ed.) (Springfield, Ill.: C.C Thcimas, 1958), 118-136. 119Dearman and Smith, "Unconcious Motivation and the Polygraph Test," pp. cit. beex IEC! 66 of the complete polygraphic procedure; as such it has not been adequately reported in the literature. The reported field reliability studies deal rather with the degree of agreement between evaluators when judging the same polygraphic recordings, or with the consistency of one evaluator's judg- ment of the same recording two or more times. It is these latter studies which will be discussed here shortly; it should be noted that many of them deal indirectly with the issue of validity, although such a consideration is not essential for reliability-studies. Laboratory Studies The earliest of the reliability-studies was reported by Rouke in 1941. Two groups of subjects, 80 delinquent and 90 non-delinquent boys, were tested in an "experimental situation designed to simulate closely the elements in the "120 The tests given actual investigation of criminal cases. used only a psychogalvanic (GSR) measure. There was, how- ever, a very close correspondence (C, contingency coeffi- cient, = .72) between the ratings (evaluationS) of the same records (tests) by the same evaluator at different times, and two judges independently reviewing the records of the delin— quent and non-delinquent boys agreed in their judgments 88 percent and 91 percent of the time, respectively. 120F. Rouke, "Evaluation of the Indices of Deception in the Psychogalvanic Technique" (unpublished Ph.D. disser- tation, Fordham University, 1941), 80. IEDC H 67 The most thorough study of reliability to date was reported by Kubis who conducted an elaborate series of experiments on lie-detection. While it is not necessary to detail them here, there are several points of interest. 
First, recordings were obtained by means of a polygraph; that is, reSpiration, electrodermal activity (GSR), and cardiovascular activity were recorded. Second, the examiner- evaluators used by Kubis were trained psycholoqists, all of whom were given a Special "three-month training course in the theory and practice of 'lie detection'".121 Third, Kubis was able to assess the reliability with which each of the physiological measurements was interpreted and was able to compare the reliability of examiners who interacted with subjects to that of evaluators who had not engaged in such interaction. In Kubis' study each of the polygraphic recordings was evaluated by the examiner who had done the testing, and by two independent evaluators. While all evaluations were quite accurate the reliability of the judgments is of major interest here. Kubis found in one section of his experiment that there was an average 78 percent agreement between the judgments made by examiners and independent evaluators; judg- ments made by only independent evaluators agreed, on the 121Kubis, Studies ip Lie Detection: Computer Feasi- bilipy_Considerations, pp. cit., 28. .. vii t 68 122 average, 81 percent of the time. Similar results, ranging from 72 percent to 87 percent were reported in another section of Kubis' experiments.123 It should be noted that the reliability reported by Kubis varied with the particular physioloqical parameter evaluated, GSR being judged more reliably than either respir- ation or cardiovascular recordings. Similar results have been reported by Barland who submitted experimentally—derived polygraphic recordings to a group of independent evaluators, all trained polygraph examiners.124 Kubis also reported that independent evaluators had "greater confidence in those decisions which were ultimately verified as correct than they did in those which were in- correct."125 Moroney, using an experimental lie-detection situation but recording only GSR, substantiated Kubis' results: the more confident evaluators were in their decisions,. the more likely they were to be correct; that is, the more ambiguous the recordings, the greater the likelihood of 126 error. lzzIbid., 44. 123Ibid., 48. 124 'Barland, "The Reliability of Polygraph Chart Evaluations," pp. cit. 125Kubis, Studies pp Lie Detection: Computer Feasi- bility Considerations, pp. cit., 68. 126W. Moroney, "The Detection of Deception as a Func- . tion of PGR Methodology" (unpublished Ph.D. dissertation, St. Johns University, 1968, Ann Arbor, Mich.: University Microfilms, 1969, No. 69-7125). 69 In a recent study Barland submitted the polygraphic recordings of 72 subjects involved in a hypothetical crime to a group of five independent evaluators, all experienced polygraphic examiners. Rather than having the evaluators make dichotomous or trichotomous (i.e., "guilty", "innocent", or "inconclusive") judgments, he asked them to evaluate the recordings in accordance with the numerical scoring-system developed by Backster.127 Hence, a total numerical score was obtained for each subject's records (tests) from each of the evaluators. By considering evaluators in pairs, and including his own evaluations, correlations (Pearson product- moment) between all possible pairs of evaluators were com- puted; such correlations ranged from .78 to .95 with a mean of .86, indicating a very high reliability among the evalua- tors. 
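Barland's pairwise-correlation approach can be illustrated with a brief sketch. The Python fragment below is hypothetical: the evaluator names and numerical scores are invented, and it assumes Python 3.10 or later for statistics.correlation. It simply computes the Pearson product-moment correlation for every pair of evaluators' total scores, the quantity Barland reported as ranging from .78 to .95 with a mean of .86.

```python
# Hypothetical data; requires Python 3.10+ for statistics.correlation.
from itertools import combinations
from statistics import correlation

total_scores = {
    "evaluator_A": [7, -5, 12, -9, 3, -2],
    "evaluator_B": [6, -4, 10, -11, 5, -1],
    "evaluator_C": [9, -6, 14, -8, 2, -3],
}

# Pearson r for every pair of evaluators' total numerical scores.
pairwise = {
    (a, b): correlation(total_scores[a], total_scores[b])
    for a, b in combinations(total_scores, 2)
}

for pair, r in sorted(pairwise.items()):
    print(pair, round(r, 3))
print("mean r:", round(sum(pairwise.values()) / len(pairwise), 3))
```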
Said another way, Barland found that out of 559 instances of two examiners arriving at a definite judgment of truth or deception, agreement occurred 534 times, or 95.5 percent of the time.128 Other investigators have also reported high reliabil- ity in the evaluations of physiological data gathered in experimental lie-detection settings. Van Buskirk and 127See pages 37-39. 128 Barland, "The Reliability of Polygraph Chart Evaluations," pp. cit., 5. 70 Marcuse, for example, using standard field polygraphic equip- ment and the card-test, had two evaluators judge the same 50 records at two different times one month apart. "The results indicated 84 percent agreement on cards and 94 percent agree- "129 Bitterman ment on records between these two judgments. and Marcuse reported that their judgments concerning the classification of response-data in cardiovascular tracings were highly reliable (C = .96 and .92); a third classifica- tion by an independent evaluator of the recordings demon- strated that the authors' classification was substantially 130 And, in a study reported by Heckel, pp pp., reproducible. a hypothetical crime was set up in such a way that three groups of five subjects each were led to believe that they were suspected of stealing money from the experimenter's wallet. One group consisted of "normal" males recruited from a local educational institution; the other two groups consisted of males under phychiatric care and diagnosed as either "non-delusional" (psychoneurotics) or "delusional" (psychotics). Although none of the subjects were, in fact, guilty of the theft, they were all given polygraphic tests by a skilled examiner; the purpose of giving such tests was to determine if physiological reactions to the testing differed between the groups, affecting the interpretation of recordings. 129Van Buskirk and Marcuse, "The Nature of Errors in Experimental Lie Detection," pp. cit., 188. 130M. Bitterman and F. Marcuse, "Cardiovascular Responses of Innocent Persons to Criminal Interrogation," Amer. J. Psych., 60 (1947), 407-412. 71 Following the administration of all polygraphic tests, the recordings were submitted to a group of four trained examiners asked to judge if the recordings indicated decep- tion or no deception, or were inconclusive. Complete agree- ment on the control-subjects prevailed between the four evaluators, and, in general, reliability decreased for the "psychiatric" subjects although "overall reliability of "131 This suggests that polygraphic ratings was quite high. recordings of persons indicating psychiatric maladjustment may be subject to erroneous judgments, i.e., less valid, and that examiners' agreement on recordings obtained from such persons may be less than the recordings from "normal" persons. It is important to note that all of the above studies dealt with data derived from experimental lie-detection situations. It is generally agreed that such data are not necessarily related to those obtained in field situations. Therefore, we must turn to an analysis of field studies which have looked at the issue of reliability. Field Studies In a recent study, Horvath and Reid submitted the polygraphic recordings of forty subjects, 20 verified truth- tellers and 20 verified liars, along with brief factual information of the investigations in which the subjects were 131R. Heckel, J. Brokaw, H. Salzberg and S. Wiggins, "Polygraphic Variations in Reactivity Between Delusional, Non-Delusional and Control Groups in a 'Crime' Situation," J. Crim. Law, Crim. and Pol. 
Sci., 53 (1962), 382. 72 involved, to a group of ten examiner-evaluators. The evalua- tors were asked to identify the truth-tellers and liars. In spite of their minimal information about the investigations, they were able to achieve an average rate of agreement of 87.8 percent, the more experienced 91 percent, the less 79 percent. It is noteworthy that the evaluators in this study were deliberately given polygraphic recordings felt by the authors to be difficult to interpret, that is records not dramatically indicative of truth-telling or lying.132 Hunter and Ash have reported the results of a study which essentially dealt with test-retest reliability. The polygraphic records of ten verified truth-tellers and ten verified liars were given to a group of seven examiner- evaluators at two different times; a minimum of three months elapsed between the two evaluations, no evaluator being told that he would be dealing with the same polygraphic records on both occassions. The results of the Hunter and Ash study were quite similar to those reported by Horvath and Reid, even though the evaluators and the polygraphic records were different. The evaluators achieved an average accuracy of 86 percent in correctly identifying the truthful and deceptive subjects, the range was between 82.5 and 90 percent. Moreover, the reliability between initial and subsequent evaluations was 132Horvath and Reid, "The Reliability of Polygraph Examiner Diagnosis of Truth and Deception," pp. cit., 278. 73 quite high, 85 percent ranging from 75 to 90 percent. How- ever, unlike Horvath and Reid, who found that errors seemed to favor the lying subject, Hunter and Ash reported that errors were almost identically balanced, that is, "false positives" (reporting a truth-teller as a liar) were made as often as "false negatives".133 The Horvath/Reid and Hunter/Ash studies appear to deal with the issue of validity. In a sense they do; how- ever, they should not be viewed as providing direct evidence of the validity of field lie-detection procedures. This is primarily because in these studies the polygraphic records evaluated were selected from cases where the testing examiner correctly identified the guilty person. It can be argued that in such cases the non-polygraphic sources of information available to the examiner aided considerably in conducting the examination; better factual information might have allowed him to formulate more appropriate test questions (affecting the response data on the recordings); or to vary his pre-test interview in a way that made it possible to obtain more suitable recordings than would otherwise be obtained. In other words, while these studies suggest that blind analysis of physiological data can lead to considerable accuracy, the chief value of the studies is as reliability assessments; that is, independent evaluators trained in the same tradition, can 133Hunter and Ash, "The Accuracy and Consistency of Polygraph Examiner's Diagnoses," pp. cit., 372. 74 consistently identify those physiological changes believed to be associated with deception. Discussion Most research dealing with "lie-detection" has been done in the laboratory. Unfortunately, such research, while important for understanding the mechanisms which underlie detection of deception, is not necessarily applicable to real life. 
For example, laboratory researchers almost without exception report that electrodermal activity is the most valid and reliable indicator of deception; field prac- titioners, on the other hand, claim that for their purposes other physiological measures are more useful. Nor are the types of testing most often used in the laboratory — rele- vant-irrelevant tests - believed to be adequate in the field where control-question tests predominate. While it is unlikely that the reasons for these and other differences between labora- tory and field lie-detection will be easily and quickly resolved, there are some approaches to these issues which provide sugges- tive evidence. ‘ Studies such as Bersh's using completely independent criteria to validate field polygraph examiners' judgments, seem to hold promise. Also, studies which require indepen- dent evaluators (trained polygraphic examiners) to make judgments of truth and deception solely on the basis of physiological data obtained in the field appear to be useful. Unfortunately, reported studies in the latter category raise as many questions as they answer. 75 The major deficiencies in those studies requiring independent judgments to be made on polygraphic recordings obtained in field settings are these: 1) Except for the study of Holmes whose data were inadequate for making reliability-assessments, none of these studies have dealt with judgments made by polygraph examin- ers employed by law-enforcement agencies. Horvath and Reid and Hunter and Ash, for example, evaluated the judgments of examiners employed by a private agency; these examiners were more highly educated and had received more training in the polygraphic technique than most police polygraphic 134 It is not unreasonable to suspect that such examiners. education and training influenced the results. Thus, it is important that such studies be replicated with examiners who are representative of those employed by police agencies and as such more likely to deal with persons whose liberty may depend on the outcome of the polygraphic examination. Moreover, in judicial proceedings the police examiner is more likely to be called upon to testify as to the results of polygraphic examinations, assuming that such evidence becomes generally admissable for such purposes in the future. 134Horvath and Reid and Hunter and Ash evaluated judgments of examiners who by state law were required to pos- sess at least a Baccalaureate degree and to undergo a training program of six months duration. At the present time such minimal qualifications are not required of polygraphic exam- iners employed in other jurisdictions. For a discussion of this topic, see: C. Romig, "The Status of Polygraph Legisla- tion of the Fifty States," Police, 16, No. l, 2, 3 (1971), 35-41, 54-61, 55-61, reSpectively. 76 2) All of the field studies, as well as many of the laboratory studies, have used polygraphic records obtained from persons tested by only one examiner. For instance, Horvath and Reid and Hunter and Ash employed recordings in each instance originally obtained by one of the authors. Obviously, the use of such records at least partially con- trols for the nature of the interaction between the subject and the examiner, interaction believed to have an affect on the nature of the recordings obtained. Hence, these studies show only that when this interaction is controlled in such a manner, other examiners trained within the same tradition can make independent judgments which are "accurate" and re- liable. 
Whether or not similar results would obtain if the interaction were not accounted for in the manner described above is not known. 3) The recordings used in the reported field studies do not necessarily constitute a representative sample of any pre-defined population. Horvath and Reid reported results obtained when evaluators judged recordings selected because they were believed to require skill to interpret. Selection of recordings in such a manner makes it difficult to draw any valid general conclusions from results. Moreover, all of the studies of recordings obtained from field situations used those ultimately verified as being of either a "truth- teller" or "liar", according to corroborated confessions. Such criteria, of course are necessary for estimating 77 accuracy, but are not required for assessments of reliability. And by using only recordings "verified" in such a way, gener- alizations are seriously restricted since the majority of all persons tested by polygraphic examiners are not verified as truth-tellers or liars by confessions or any other infor- mation.135 It has been suggested that persons presumed prior to testing to be liars undergo an examination somewhat different from those presumed to be truth-tellers. By sim- ilar reasoning persons involved in investigations which are eventually verified by someone's confession, would undergo examinations differing from those not so verified; factual information, behavioral characteristics, etc. would provide more, or "better", clues to the examiner in the verified investigations, or perhaps, the resulting polygraphic records, for some reason, would be of a better quality. In other words, it is important to assess accuracy and reliability in terms of records obtained from both verified and unverified investigations, using in the latter instance the testing examiner's judgment for comparison with inde- pendently made judgments. As Holmes,136 and others137 have 135See: Inbau and Reid, Lie Detection and Crim- inal Interrogation, pp. cit., 110-113. 136Holmes, "The Degree of Objectivity in Chart Inter- pretation," op. cit. 137Horvath and Reid, "The Reliability 0f Polygraph Examiner Diagnosis of Truth and Deception," op. cit.; Hunter and Ash, "The Accuracy and Consistency of Polygraph Examiner's Diagnosesf'pp. cit. ,____v__ 6): IE IE it la d3 ‘ 1‘ Pu... 78 pointed out, there is some reason to believe that the judg- ment of the testing examiner, because it includes an eval- uation of non-polygraphic sources of information, is more apt to be correct than an evaluation based on polygraphic records alone. 4) Some of the previous research suggests that experience in giving polygraphic examinations affects the reliability of judgments of truth and deception. Horvath and Reid, for instance, compared judgments of examiners with less than six months' experience and still undergoing train- ing, to those of examiners with more than six months' experience; the former group were less reliable than the latter. Hunter and Ash reported similar results using a different group of examiners and, moreover, found that the examiner with the most experience was more consistent in his judgments than all other examiners. In further recog- nition of experience as an important determinant of the ability of an examiner to interpret polygraphic records is the proposal by Reid and Inbau that examiners selected for giving testimony in a courtroom should have more than five 138 While there are many years experience in field testing. 
ramifications to such a proposal, the implication that exper- ience is an important determinant of success is clear. 138Reid and Inbau, Truth and Deception; The Polygraph ("Lie Detector") Technique, pp. cit., 257. 79 One aspect of the difference in results between experience levels as discussed above is easily accounted for: the inexperienced examiners had not yet completed their training. It is still not known if fully trained and exper- ienced examiners will be more "accurate" and consistent in their judgments than those with less experience, although the Hunter and Ash results suggest that this is so. 5) Another shortcoming of the previous studies is that the nature of the investigation from which polygraphic records were obtained was not controlled. Do differences in results depend upon the nature of the crime? For instance, if recordings were drawn from investigations classified according to crimes against a person or property crimes,139 it seems reasonable to suspect that in the former category a testing examiner would have more factual information at his diSposal than in the latter. 'Offenses such as rape or armed robbery involve a victim who is usually capable of identifying a suSpect or, at least, of relating precise details regarding how and where the offense occurred. Even in homicide cases, where naturally a victim is incapable of providing details, the details possible seem to be gen- erally quite adequate, such offenses usually giving the 139Such classification based on the presumed nature of involvement of the victim; direct involvement, such as in rape, murder, armed robbery, assault, and indecent (sexual) liber‘ ties, leading to classification as "crimes against a person"; less apparent involvement of the victim, such as breaking and entering, arson, and larceny, leading to classification as "property crimes". high as b not rela tion p. / L, ‘ I :17 r1 ) *t H 80 highest police-clearance rate.140 In prOperty crimes such as burglary, larceny, and arson, on the other hand, usually not directly involving either a victim or witness capable of relating precise details about the offense, factual informa- tion is less apparent. The advantages which an examiner may have in testing persons suspected of committing a crime against a person, then, could conceivably influence his judgment of truth and deception. That is, the examiner might be inclined to give more credence to factual information when testing persons involved in crimes against a person, and to give less weight to the physiological responses observed in polygraphic records; or, assuming that crimes against a person involve more detailed information for an examiner's use, it seems probable that such information profoundly affects the outcome of the examination; detailed information provides a firmer basis for formulating test questions, which, as field exam- 141 is an important determinant of iners are well aware, physiological responsitivity. Also, as Orne has suggested, persons who examiners prior to testing presume to be liars may undergo an examination somewhat different from those 140Federal Bureau of Investigation, Uniform Crime Reports.for the United States: 1972 (Washington: Government Printing Office, 1973), 115. 141Reid and Inbau, Truth and Deception, The Polygrpph ("Lie Detector") Technique, pp. cit., 16-21; R. Arther, "Crime Question Wording," J. Polygraph Studies, 4 (Sept.-Oct., 1969), 1-4. 
81 presumed to be telling the truth,142 the examiner's bias influencing the nature of the questions and therefore the responses as polygraphically recorded. Rosenthal's work too makes it difficult to believe that such bias does not 143 It seems reasonable then exist in polygraphic testing. to conclude that bias is more likely when an examiner has access to detailed factual information which may strongly implicate or exculpate a person in involvement in a criminal offense. And, as has already been suggested, such detailed information seems more available in investigations involving crimes against a person than in those involving prOperty crimes. Summary In this chapter the literature pertaining to the pro- cedures, validity, and reliability of lie-detection in both field and laboratory settings was discussed. The procedures used in the field-setting make lie-detection there akin to a diagnostic technique whose efficacy is determined by the interaction of examiner and subject as well as by polygraphic recordings. In contrast, laboratory procedures are rarely affected by such interaction. Rather, polygraphic recordings 142Orne, "Implications of Laboratory Research for the Detection of Deception," pp. cit., l75-l77. 143R. Rosenthal, Experimenter Effects pp Behavioral Research (New York: Appleton-Century—Crofts, 1966). 82 alone, i.e., physiological measurements made during a series of tests, which also differ in nature from those used in the field, constitute laboratory lie-detection. Because of the numerous and significant differences between laboratory and field procedures and goals, it is, in general, misleading to apply the results of laboratory research to the typical field situation. In spite of this difficulty, however, there is substantial agreement that lie-detection is a relatively valid and reliable method of determining truthfulness and deception; that is, judgments based upon lie-detection tests are correct too often to be considered coincidental, and the physiological reSponses thus measured and recorded provide a basis for substantial'repli- cation of judgments made on them. Many field-practitioners of lie-detection claim that relatively recent developments in administering such tests, e.g., the control-question procedure as well-as the standard- ization of procedures between examiners, provide an adequate basis for conducting meaningful research on field-gathered data. And some research recently reported suggests that this is indeed true, deSpite the many important questions still unanswered. There remains a need to replicate this research and to introduce innovations to clarify and supplement the findings so far reported in the literature. (D C) mic Chapter III METHOD In this chapter the characteristics of the sampling procedure, the polygraphic records used, the evaluators, the Operational measures, hypotheses, and statistical analysis and design will be presented. First, however, a discussion of certain general characteristics of the study is in order; the source of the data as well as the nature of the testing procedure and apparatus employed will be considered. General Considerations Source of Polygraphic Data A large state police department (SPD) located in the mid-western states provided the researcher access to its files containing data pretaining to polygraphic examinations conducted by employees of this agency. 
While the SPD had such files at ten locations or posts throughout the state, it was decided that only those files at a post located in a large metrOpolitan area would be used for this study. There were two major reasons for this choice. First, it was be- lieved that the method of filing data at this post would facilitate sampling procedures. And, second, the examinations conducted there generally involve a wider variety of serious 83 84 criminal offenses than at other posts, which, for purposes of the study, was desirable. At the site selected, data are compiled and filed in the following manner. When a complaint of criminal con- duct comes to the attention of the law-enforcement agency1 the person against whom the complaint is made is asked to undergo polygraphic examination. If he agrees, the data pertaining to this examination is placed in a case folder, on the outside of which are written the person's name, the nature of the investigation (homicide, rape, etc.), other identifying data (e.g., complaint number) and the outcome of the examination. If another person is given an examina- tion with reSpect to the same complaint, data pertaining to that examination are added to the same folder as are appropriate notations on the outside. Hence, a common folder contains all data pertaining to examinations relevant to the same complaint, the outside of such folders indicating the nature of the contents. Examination Procedure All polygraphic records used in this study were obtained from examinations conducted by employees of the SPD. These examiners had all received their initial training lThe SPD conducts polygraphic examinations not only for its own investigations but also for other law enforcement agencies making appropriate requests. at 1'1( 3...\rj L. . i n.» O C D. ‘5 LL 1.3. .l 85 at a nationally recognized training school2 certified by the American Polygraph Association.3 All examinations were conducted in accordance with the Control Question Technique noted in Chapter II. The following discussion further describes certain aspects of this technique in greater detail, highlighting differences between the procedures employed by the SPD examiners and those pre- sented elsewhere. Pre-test interview.--The SPD examinations consist of a pre—test interview and polygraphic testing. The interview, however, is essentially an eclectic one, combining certain aSpects of the interview procedures advocated by prOponents of the various approaches to CQ-testing. For instance, dur- ing the interview the examiner discusses in depth with the subject, the subject's likes and dislikes, hobbies, education, etc. Such a discussion is similar to that used by military examiners as reported by Barland and Raskin.4 Also included in the interview are questions specifically designed to elicit 2National Training Center of Lie Detection, New York City, New York. 3N. Ansley (Ed.), "A.P.A. Accepted Polygraph Schools," American Polygraph Association Newsletter (December/January, 1974), 14. ' 4G. Barland and D. Raskin, "The Use of Electrodermal Activity in the Detection of Deception," pre-publication copy to appear in: W. Prokasy and D. Raskin (Eds.), Electrodermal Activity in Psychological Research (New York: Academic Press, in press), 5-8. 
86 behavioral cues from the subject, questions "borrowed" from the interview procedure used by proponents of the Reid technique.5 The interview ends with a procedure advocated by Arther, an extended explanation of the polygraph instru- ment and the nature of "lie detection", etc.6 In spite of the eclectic nature of this SPD inter- viewing, the procedure is consistent with CQ testing: there is no intensive or accusatory questioning prior to (or during) polygraphic testing, and all test questions are re- viewed exactly as they will be asked during actual testing; the questions are worded or phrased in such a way that the subject is certain that he understands them and that he can answer them with either a "yes" or a "no . Polygraphic testing.--The interview, which usually lasts between 45 and 90 minutes, is followed by the poly- graphic testing. The polygraphic attachments are placed on the subject, the pneumograph (respiration recording) and the GSR units are activated and recordings made for about a minute in order to assure that they are suitable. The subject is told that the testing is about to begin and is reminded to 5See: J. Reid and F. Inbau, Truth and Deception, The Polygrapp ("Lie Detector") Technique (Baltimore: Williams and Wilkins, 1966), lO-l6; F. Horvath, "Verbal and Nonverbal Clues to Truth and Deception During Polygraph Examinations," J. App. Sci. and Adm., l (1973), 138-152. 6R. Arther, "The Heart and You” (unpublished, undated manuscript, National Training Center of Lie Detection, New York). 87 answer all questions either "yes" or "no", as he did during the run—through. The cardio-cuff is then inflated and the questioning begins. The test questions are asked at about 15-20 second intervals in a pre-determined order. During the questioning the examiner marks the following on the chart paper: the sensitivity setting of the GSR amplifier, the pressure in the cardio-cuff, the points at which each question starts and ends, the number of each question, and the subject's answers. Any adjustments to the tracings made by the examin- er or "artifacts" caused by the subject's movements, etc. are also apprOpriately noted. When each test question has been asked once, and the test concluded the pressure in the cardio- cuff is again noted and the cuff, deflated. The pneumograph and GSR units remain in operation for a short period follow- ing deflation of the cuff. A test usually lasts for about three minutes, after which the examiner notes the subject's name, the date, his own initials, and the number of the test in the sequence. This is done on the chart paper at'a point prior to where the cardio-cuff was inflated at the beginning of the test. As explained in Chapter II, however, a battery of such tests is conducted with each subject before the examination is completed. The basic battery of tests used by the SPD examiners consists of at least CQ Test #1, a "card" or "number" test am the var aln 88 and then a third test, a repetition of Test #1. 
In some instances, only these three tests are conducted; in others, the examiner may conduct additional tests, making use of various stimulation strategies.7 Such additional tests almost always include a "mixed question" test as the fourth in the series, although in rare instances the fourth may also be a "yes" test, or a "yes-no" test.8 Regardless of which test follows the basic battery, however, additional tests are always consecutively numbered and the nature of the stimulation strategy used by the examiner is indicated on the chart paper according to standardized notation. In other words, it is possible on review of any subject's records (tests) to determine where in the sequence a test was conducted as well as the nature of the test itself.

Sequencing of test questions.--As explained above, the sequencing of the questions in a given test is pre-determined, and consistent with the variation of CQ testing used by the SPD examiners.9 Perhaps the sequence can be best explained by discussion of an example. Assume a burglary has taken place, a polygraphic instrument stolen. Questions considered pertinent in such a case would be as follows:

7See Chapter II, pages 29-32.

8R. Golden, "The Yes-No Technique" (paper presented at the American Polygraph Association Seminar, August 1969, Houston, Texas).

9The SPD examiners sequence questions in a manner advocated by Arther. See: R. Arther, "Irrelevant Questions," J. Polygraph Studies, 3 (May-June, 1969), 3-4.

Position in Sequence -- Numerical Designation on Charts -- Type of Question -- Example

1 -- 1 -- Irrelevant -- "Are you in Michigan now?"
2 -- 3T -- "Known Truth" -- "Did you sell that polygraph to (a fictitious person)?"
3 -- 3K -- Relevant-Guilty Knowledge -- "Do you know for sure who stole that polygraph?"
4 -- 5 -- Relevant-Crime Related -- "Did you steal that polygraph?"
5 -- 6 -- Control -- "Did you ever steal anything?"
6 -- 8 -- Relevant-Crime Related -- "Do you know where that missing polygraph is now?"
7 -- 8GC -- "Guilt Complex"-Fictitious Crime -- "Did you steal (fictitious item) from (fictitious person or place)?"
8 -- 9 -- Relevant-Crime Related -- "Did you break into (building or location from which the polygraph was stolen)?"
9 -- 10 -- Control -- "Did you ever lie about anything important?"
10 -- 11 -- Relevant-Crime Related -- "Did you tell the complete truth about that missing polygraph?"

While other publications discuss in detail the rationale and purpose of such a sequence of questions,10 of significance here are several points in particular. (1) The sequence, although pre-determined, is not inflexible; that is, the specific nature of the investigation determines the precise wording of the questions and the elimination of certain question types. A guilt-complex question, for instance, if not useful in certain types of investigations,11 would be replaced by a different question in the seventh position in the sequence. (2) Two control questions, each individually prepared with each subject, are always imbedded in the series. Actually, the known-truth and guilt-complex questions, when asked, serve as quasi-control questions; responses to them permit estimation of a subject's response to relevant test questions when he is telling the truth (to the relevant questions). (3) The sequencing of the questions in CQ test #2, following the card test, is identical with that in CQ test #1. (4) Additional irrelevant questions, pre-reviewed with the subject, can be inserted in the sequence at the examiner's discretion; such questions would be designated on the charts as #2, #3, or #7.
Finally, the designation of 10R. Arther, "Crime Question Wording," J. Polygraph Studies, 4 (September-October, 1969), 1-4; R. Afther, "Cover- ing Two Crimes in One Examination," J. Polygraph Studies, 4 (May-June, 1970), 3-4. _ llR. Arther, "Irrelevant Questions," J. Polygraph Studies, 3 (May-June, 1969), 3-4. 91 questions on the charts or records is standardized; the number 3T, for instance, always indicates that a known- truth question was asked; the numbers 6 and 10 always refer to control questions, etc. Such standardized notation facilitates one examiner's review of another's polygraphic records and allows a determination of the nature or type of questions asked. Polygraphic apparatus.-—The recording instruments used by the SPD examiners in conducting polygraphic examin- ations were standard field equipment made by the two major manufacturers,12 recording respiration, cardiovascular activity, and GSR. Between the years considered in this study, 1969-1972, however, a change in instrumentation used by the SPD examiners was made; dual pneumograph units, by which both abdominal and thoracic breathing patterns could be recorded simultaneously, were added. Sampling Considerations POpulation The case folders pertaining to all polygraphic examinations conducted at the aforementioned SPD post during the years 1969-1972, inclusive, were reviewed; eliminated were those investigations involving violations of narcotic 12During the years from which the sample was drawn the polygraphic instruments used by the SPD were manufactured by either the Stoelting Company, 424 N. Homan Ave., Chicago; or Associated Research, Inc., 3758 W. Belmont, Chicago. 92 laws, traffic laws, and certain other violations not readily classifiable as crimes against a person or property crimes (e.g., drunkenness). The remaining 1446 folders were used as the population from which a stratified random sample of folders was drawn.13 Procedure Sampling was carried out in essentially two stages. The first stage consisted of assigning case folders to, and randomly drawing sub-samples from, eight categories accord- ing to a pre-determined stratification scheme. The stratifi- cation matrix shown in Figure 3.1 exemplifies this scheme, folders categorized as data pertaining to either verified or unverified investigations, truthful or deceptive subjects, crimes against a person or property. A verified investigation was defined as one in which a subject made a complete con- fession, e.g., if 10 persons were given polygraphic examin- ations as part of a homicide investigation, and subsequent to all examinations the tenth person made a complete confession, the investigation (folder) was considered verified.14 13The number of folders assigned to each stratifica- tion level in the population is given in Appendix A. 14In some instances notations made on the folders categorized as "verified" also indicated that the deceptive subject had either plead or been found guilty by judicial proceedings. Such notations were possible because of an informal "follow-up" procedure practiced by the SPD examiners. 93 An unverified investigation was defined as one in whichrua confession was made but in which the examiner issued a written report stating that the subject either was or was not truthfully answering questions concerning the issue at hand. "Truthful" and "deceptive" were defined by the outcome of polygraphic examinations. 
Crimes against a person were those with direct victim involvement, e.g., in homicide, assault, armed robbery, rape, and certain other sexual offen- ses; property crimes were arson, burglary, larceny, forgery, embezzlement, and malicious destruction of property. Categpries of Folder Assignment Verified Truthful Deceptive. Crimes Against PrOperty Crimes Against Property a Person Crimes a Person Crimes Unverified Truthful Deceptive Crimes Against Property Crimes Against Property a Person Crimes a Person Crimes Figure 3.1.--Stratification Matrix As is apparent from the previous discussion regard- ing the nature of case-folders, some folders contained data pertaining to more than one subject. This presented a problem of assignment to categories. For instance, consider the investigation of a rape case where both the victim and a 94 suspect are given polygraphic examinations. The victim is found to be truthful; the suspect is deceptive and subse- quent to his examination confesses his guilt. It is obvious that such a folder could have been assigned to the category "verified—truthful-crime against a person" (the victim) or "verified-deceptive-crime against a person." When such a problem was encountered the folder was assigned to the "verified-truthful" category irrespective of other data included in the folder. On the other hand, if a folder contained data pertaining to a "verified-deceptive" subject and did not include "verified truth-teller" data it was, of course, assigned to the former category. In instances of unverified examinations, folders were assigned to categories according to predominating data. If, for example, a folder contained the data of three subjects, two of whom were reported truthful and one deceptive, it was assigned to the "unverified truth-teller" category, and depending on the nature of the investigation, to either the "crime against a person" or "prOperty crime" classification. If outcomes were balanced, a "coin toss" resolved the assign- ment problem. Following the assignment procedure discussed above, all folders of each category were consecutively numbered. Then, according to a table of random numbers15 a sample of 15L. Chao, Statistics: Methods and Analyses (New York: McGraw-Hill, 1969), 471-476. 95 112 folders, 14 from each category, was drawn. The sample, however, in Spite of the assignment procedures mentioned above, still included some folders which contained data per- taining to subjects who fell into the same category. In the instance of the homicide previously mentioned, the case folder would have been assigned as "verified-truthful-crime-against- a-person" and would have contained the polygraphic records of the nine subjects so classified. Hence, a second stage in sampling was required. The second stage in sampling consisted of a coin-toss decision of which subject's records should be drawn from a folder when two or more subjects fell into the same category. The purpose of the procedure was to prevent possible inclusion of records of more than one subject from each investigation. By such a restriction the records themselves were insured as independent of each other as possible. For example, if more than one subject's records were drawn from the same folder, the examiner could be reasonably assumed influenced by his knowledge of the outcome of the examination of the first sub- ject when testing the second subject; insights gained while testing the first subject would affect the testing of the second. 
Sample

The sample, then, consisted of the complete battery of polygraphic tests (record sets) of 112 persons (subjects) involved in separate criminal investigations. The record sets of fifty-six of the subjects were verified; that is, the truthfulness of these subjects' responses (answers) to the relevant test questions was "known". An additional 56 record sets were drawn from subjects whose truthfulness was not "known" but whose responses had been designated in examiners' written reports as truthful or not. Within each of the two major categories (verified-unverified) one-half (28) of the record sets were those of persons considered NDI (no deception indicated to relevant questions); and one-half (28) DI (deception indicated). Further, one-half (14) of the record sets within each of the NDI-DI groupings pertained to property crimes, and one-half (14) to crimes against a person.

Criteria for Record Sets

All record sets drawn in the initial sampling were reviewed by the researcher16 and the Chief Polygraph Examiner17 of the SPD before final selection. The purpose of the review was to insure that all record sets (or tests for each subject) met the following criteria:

1) Physiological data recorded for each subject during each test in respiration, GSR, and cardiovascular activity.

2) At least two separate control-question tests in which the relevant, irrelevant, and control questions were asked at least once per test. In addition, a standard stimulation test, commonly called a "number" or "card" test, administered between the aforementioned control-question tests.

3) Records substantially free of "artifacts" such as those resulting from the subject's effort to "beat" the polygraph;18 exception to this criterion only when such "artifacts" were apparent during a subject's "yes" test but not other tests.

4) All relevant test questions pertinent to the same specific criminal offense, i.e., burglary, rape, etc.

16The researcher had over six years of experience as a practicing polygraph examiner.

17The Chief Examiner did not serve as an evaluator in this study.

18See: Reid and Inbau, Truth and Deception, The Polygraph ("Lie Detector") Technique, op. cit., 163-165.

Mutual agreement of the Chief Examiner of the SPD and the researcher was required for retention of each record set in the sample. In the few instances when such agreement was not possible the records of another subject were substituted in accordance with the sampling procedure discussed earlier, until the sample quota was met.

Characteristics of Subjects

A summary of the background characteristics of the subjects from whom the final sample of records was obtained is displayed in Table 3.1. Further summarization of these data indicates that 92 of the subjects were Caucasian, 20 Negroid; 98 male and 14 female. The age of the subjects ranged from 13-67 with a mean of 24.8; the years of formal schooling completed ranged from 2-12 with a mean of 10.4.

Table 3.1.--Background Characteristics of Subjects

Characteristics of Record Sets

The criteria cited previously were minimal, some of the record sets, due to the procedure used by the SPD examiners, containing more tests than others. Moreover, due to the instrumentation change during the years from which the sample was drawn, some record sets reflected the use of a dual respiration-tracing, while others contained only one tracing. A breakdown of these differences in the record sets is given in Table 3.2, which shows that 64 of the record sets contained only the basic battery of tests (CQ Test #1, the "card" Test, and CQ Test #2) while 48 contained additional CQ tests such as the "mixed question test". A dual respiration-tracing was evident in 26 of the record sets. And, although the "yes" tests were eliminated from all record sets for reasons which will be explained shortly, it is clear from the data displayed in Table 3.2 that such tests were administered predominately to "deceptive" subjects.

Table 3.2.--Characteristics of Record Sets

Procedure

The Polygraphic Record Sets

Preparation.--Following the selection of the sample, all record sets were prepared for use in the study. Such preparation consisted of obscuring from the records all writing which identified either the subject or examiner, as well as any other notations not pertinent to the numbering of questions asked, the subject's answers, or adjustments to the recordings. Each subject's tests were then arranged in the sequence in which they were given by the examiner, the sequence for all subjects consisting of at least CQ test #1, the "card" test, and CQ test #2. In cases where additional tests had been conducted they were properly placed in the sequence, the only exception being "yes" tests. These tests were eliminated from the study for two reasons. First, the interpretation of the "yes" test is not consistent with the interpretation of other tests, such "response" data not being evaluated in the same manner as that of other tests. Second, the majority of all subjects in the sample who were given "yes" tests were indicated as "deceptive". Hence, it was believed that by excluding the "yes" tests, the evaluators would not, by noting the mere presence of such a test, be able to infer that a given subject was "deceptive" without having to consider response data.

After the masking of extraneous data in the records, all tests for each subject were stacked one on the other, with CQ Test #1 on top of the "card" test, the "card" test on

Table 3.3.--Background Characteristics of Evaluators

polygraphic examinations.
its $05 Etc 84 «H93 Hoocom mCOHumcHmem mocwHmexm mocmHuwmxm mocwflummxm mqflndha “Hemmd%30m .Qbumma mSfifixfiumgfinn “Rbmqigm .oc .xonmm4 oecmmum>Hom moHHom 38 €8,568 .muoumon>m mo moHumHHmuomnmnO pcdoumxommll.m.m mHnme 108 polygraphic examinations. On the other hand, low experience evaluators had a mean age of 37.6, an average of 1.1 years of experience in polygraphic testing, and had conducted an average of 204 examinations. Operational Measures Evaluators were requested to make several judgments concerning each record set. Such judgments were indicated on two separate answer sheets: an Evaluator Answer Sheet and a Numerical Evaluation Score Sheet. Specimen copies of each of these are diSplayed in Appendix C. An Evaluator Answer Sheet was completed by each evaluator for each of the 112 record sets. On this sheet each evaluator indicated his judgment of "truthfulness- deception" indications, "confidence" and "ease of interpre- tability" of the three basic physiological measures. Accuracy scores.--The truthfulness-deception judg- ment was a tripartite one; that is, each evaluator reviewed each record set blind, i.e., without any knowledge of the characteristics of the subject from whom the records were obtained or the nature of the investigation, and decided if it indicated truthfulness (NDI: no deception indicated to relevant questions), deception (DI: deception indicated to relevant questions), or was inconclusive, (INC: reSponse data did not allow for a determination). Since all evaluators were familiar with the standard notational system used for 109 indicating the various question-types it was unnecessary to identify these in the record sets. Moreover, evaluators were told that their truthfulness-deception judgments were to be based on the complete record set for each subject and that any system of evaluation (visual inspection or numerical evaluation) could be used in forming such a judgment. The accuracy of truthfulness-deception judgments was of particular interest in the study. Thus, for such judg- ments made on verified record sets accuracy (correct judg- ments) was defined as agreement with the known truthfulness or deception of the subject from whom the records had been obtained, using a confession as the criterion measure. It is obvious that such a criterion was unavailable when con- sidering unverified record sets. Hence, judgments made on these sets were defined as correct if the evaluator's judg- ment agreed with that of the testing examiner. By definition all inconclusive judgments were incorrect. Since there were eight categories from which record sets were drawn it was possible for each evaluator to make 14 correct judgments within each category. These raw number scores were, however, transformed to percentages; hence, accuracy-scores refer to the percentage of correct judgments made. Confidence scores.--Each evaluator indicated the degree of confidence in his truthfulness-deception judgment 110 on a six-point scale ranging from no-confidence (l) to almost- certain (6). The scale was similar to that used by Kubis25 and Moroney26 in studies of experimental lie detection. Confidence scores for each evaluator were defined as the sum of the values, or ratings, indicated on the scale for all record sets within each of the eight categories from which the sets were drawn. Hence, for each evaluator such scores had a theoretical range of 70 points (varying from l4—84) in each category, higher scores indicating greater confidence in the judgments made. 
Ease of interpretability scores.--For each record set evaluators rated the "ease of interpretability" of each of three physiological measures: respiration (abdominal respiration only where a dual recording was apparent), GSR, and cardiovascular activity. Such ratings were indicated on a five-point scale for each measure ranging from "very difficult" (l) to "very easy" (5). Again, the scale used was similar to that of Kubis.27 25J. Kubis, Studies in Lie Detection: Computer Feasi- bility Considerations, Tech.—Report 62-2-5 (Arlington, Va.: Armed Services Technical Information Agency, June, 1962), prepared for Air Force Systems Command, Contract No. AF 30 (602)-22700, Project No. 8834, Fordham University, 1962, 146. 26W. Moroney, "The Detection of Deception as a Function of PGR Methodology" (unpublished Ph.D. dissertation, St. John's University, 1968, Ann Arbor, Michigan: University Microfilms, 1969, No. 69-7125). 27Kubis, Studies pp Lie Detection: Computer Feasi- bility Considerations, pp. cit., 146. 111 There were, in effect, four ease-of—interpretability scores: one for each individual physiological measure with a theoretical range of 56 points (14—70) per category and one for a total ease-of—interpretability considering the three individual scores collectively. The latter score had a theoretical range of 168 points (42-210) in each category. In all cases higher scores indicated greater ease-of—inter- pretability. Numerical evaluation.--The numerical evaluation score sheet was completed by evaluators for the forty record sets which had been identified by a QC on the cardboard retain- ers.28 On this sheet evaluators were required to assign a number on a 7-point scale ranging from -3 to +3 for each of three physiological measures to indicate the perceived difference between each of four relevant-control question pairings in each of two control question tests. A score of -3 to one of the control-relevant question pairs for each measure indicated a dramatically greater response to the relevant question in that pair; a score of +3 indicated a dramatically greater response to the control question. To assure that evaluators consistently paired (and scored) the same relevant-control questions, all such pairs were pre-determined and indicated on the numerical evaluation 28The letters QC refer to "quality control" which is sometimes used synonymously with numerical evaluation although, they are not, in fact, identical concepts. See: R. Brisentine, "Quality Control," Polygraph, 2 (1973), 278-286. 112 sheet. Moreover, as discussed earlier, evaluators were re- quired to score only abdominal respiration in those instances where a dual respiratory recording was evident and only CQ test #1 and CQ test #2 in those instances where additional tests were included in a record set. There were eight basic scores generated for each evaluator for each record set numerically evaluated: a score for each of three measures and a total score (the algebraic sum of the individual scores for the three measures), for each of two tests. However, for purposes of the study such scores were combined in the following manner: a score for each of the three measures was obtained by algebraically summing the scores for each measure for the two tests; a combined score for all measures was derived by algebraically summing the total scores for both tests. 
Hence, there were four scores obtained for each record set numerically evaluated by each evaluator: a score for each of three measures (physiological components) each with a theoretical range from +24 to -24 and a combined score for the record set with a theoretical range from +72 to -72. Evaluator experience.-—Evaluators were categorized as high-experience, more than 3 years of experience in conducting polygraphic examinations, and low experience, less than 3 years. Although such categorization was arbitrary, it will be noted on inSpection of Table 3.3, page 107, that the criterion naturally sorted the evaluators into two equal groups. 113 Hypotheses Hypothesis-testing procedures were carried out for a series of research hypotheses developed with respect to accuracy scores, confidence scores and ratings, and total ease-of—interpretability scores and ratings. These hypo- theses are presented below along with a summary of their rationale. Accuracy Scores Hypothesis I: High-experience evaluators will attain higher accuracy scores than low-experience evaluators. Rationale.-—Horvath and Reid and Hunter and Ash have reported that experienced evaluators are more accurate (and consistent) in their judgments of polygraphic records than less experienced evaluators; Hypothesis 1 is consistent with these investigators' findings. Hypothesis II: Accuracy-scores on record sets drawn from verified investigations will be higher than those on sets drawn from unverified investigations. Rationale.--Verified investigations are those where the testing examiner correctly identified the guilty person. It is argued that such identification depended upon an appro- priate pre-test interview, stimulation strategies, etc., which in turn led to clearly recognizable physiological responses. Thus, Hypothesis II is based on the assumption that record sets drawn from verified investigations are more dependable than those drawn from unverified investigations. 114 Hypothesis III: Accuracy-scores on record sets of truthful subjects will be higher than those on sets of deceptive subjects. Rationale.--Hypothesis III is consistent with the findings of Horvath and Reid, and the claims of many field- examiners, that errors are made more often on deceptive than truthful subjects; that is, "false negatives" occur more often than "false positives". Hypothesis IV: Accuracy—scores on record sets drawn from investigations concerning crimes against a person will be higher than those on sets concerning property crimes. Rationale.--Hypothesis IV is based upon the assump- tion that when testing subjects involved in crimes against a person, an examiner has access to more detailed information concerning the offense than is typically available in property- crime investigations. Such detailed information leads to more apprOpriate question-formulation and thus more clearly recog- nized physiological responses. Confidence Scores Hypothesis V: High-experience evaluators will attain higher confidence scores than low- experience‘evaluators. Rationale.--It is not known if experienced evaluators have greater confidence in their judgments than do ineXper- ienced. It is reasonable to suSpect that they do, particue larly in view of Horvath and Reid's suggestion that exper- ience enables an evaluator to apply consistently the "fine points" of the theory of control-question testing when making judgments. 
115 Hypothesis VI: Confidence—scores will be higher for judgments made on record sets drawn from verified investigations than for those made on sets from unverified investigations. Rationale.--Hypothesis VI is based on the assumption that physiological data in record sets drawn from verified investigations are more dependable than in those from unver- ified investigations. In other words, confidence will in- crease when more clearly recognizable physiological responses are apparent. Hypothesis VII: Confidence-scores will be higher for judgments made on record sets of truthful subjects than those of deceptive subjects. Rationale.--Fie1d examiners maintain that truthful subjects are easier to detect than deceptive subjects, pre- sumably because of clearer response patterns. Hence, confi- dence scores will be greater in such judgments. I Hypothesis VIII: Confidence-scores will be higher for judgments made on record sets drawn from investigations concerning crimes against a person than those concerning property crimes. Rationale.--Assuming that response-patterns are more clearly recognizable when considering record sets of subjects involved in crimes against a person, confidence scores will be greater for judgments of such records. Hypothesis IX: Confidence-ratings will be' higher for correct than for incorrect judgments. Rationale.--Kubis reported that evaluators of experi— mentally derived polygraphic records had greater confidence in correct than in incorrect judgments. Hypothesis IX is consistent with Kubis's findings. Ease of 116 Interpretability Scores Hypothesis X: High-experience evaluators will have higher total ease-of-interpretability scores than will low experience evaluators. Rationale.--If, as Horvath and Reid suggest, exper- ience enables an evaluator to apply consistently the fine points of the theory of control question testing, it is reasonable to suspect that experienced evaluators will re- port polygraphic records easier to interpret than less experienced evaluators. of more will be Hypothesis XI: Total ease-of—interpretability scores will be higher in judgments of record sets drawn from verified investigations than those made on sets drawn from unverified investigations. Rationale.--Assuming that verified records consist clearly recognizable physiological reSponses, they judged easier to interpret than unverified records. Hypothesis XII: Total ease-of-interpretability scores will be higher in judgments of record sets of truthful subjects than those of de- ceptive subjects. Rationale.--Field examiners maintain that the poly- graphic records of truthful subjects are easier to interpret than those of deceptive subjects. Hypothesis XII is consis- tent with this claim. Hypothesis XIII: Total ease-of—interpretability scores will be higher in judgments of record sets drawn from crimes against a person than those of sets drawn from property crimes. fi we re. Pl: A E om res th] dEI £01 117 Rationale.--Hypothesis XIII is consistent with the assumption that clearer response-patterns are evident in those records drawn from subjects involved in crimes against a person than in prOperty crimes. Hypothesis XIV: Total ease-of—interpretability ratings will be higher for correct than for incorrect judgments. Rationale.--Hypothesis XIV is consistent with Kubis's findings that records on which correct judgments were made were easier to interpret than those judged incorrectly. Design and Analysis The design used for hypotheses testing, except with respect to hypotheses #IX and #XIV, was a 2 . 
2 x 2 x 2 Split-plot (repeated measures) design, described by Kirk as type SPF-p.qru.29 A dummy data matrix defined in terms of the study is shown in Figure 3.2.

Using the design indicated in Figure 3.2, a four-way Analysis of Variance (ANOVA), repeated measures, was carried out to simultaneously test appropriate (null) hypotheses with respect to the (research) hypotheses developed for each of three dependent measures generated: accuracy scores, confidence scores, and total ease-of-interpretability scores. The four factors were: Verification (verified and unverified); Truthfulness (truthful and deceptive); Crime-type (crimes against a person and property crimes); and Experience of evaluators (high and low), the first three treated as repeated measures.

29R. Kirk, Experimental Design: Procedures for the Behavioral Sciences (Belmont, Calif.: Brooks/Cole, 1968), 308.

A = Experience (a1=low, a2=high)
B = Verification (b1=verified, b2=unverified)
C = Truthfulness (c1=truthful, c2=deceptive)
D = Crime type (d1=person, d2=property)
e = evaluators
Figure 3.2.--Dummy Data Matrix: 2 x 2 x 2 x 2 Split-plot.

The testing of appropriate null hypotheses for hypotheses #IX and #XIV was carried out with two-way ANOVA, repeated measures, in a 2 x 2 Split-plot design as shown in Figure 3.3. The two factors were: Evaluator-experience (high and low) and Judgments (correct and incorrect), treated as repeated measures. Dependent variables treated separately using this design were mean confidence ratings and mean total ease-of-interpretability ratings for correct and incorrect judgments.

In all instances the .05 level of significance was established as the decision rule regarding the testing of hypotheses. That is, null hypotheses were rejected when the probability of a Type I error was equal to or less than .05.

Figure 3.3.--Dummy Data Matrix: 2 x 2 Split-plot (Evaluator Experience Level: low, high; Judgments: correct, incorrect).

To determine the reliability of the numerical scoring system, the Pearson product-moment correlation coefficient (r) was used. Such correlations were calculated for the set of scores between all possible pairs of evaluators for each of the four numerical scores generated (respiration, GSR, cardio, and combined scores) for the record sets. Analysis of data other than that explained above is more appropriately described in the next chapter.

Chapter IV

RESULTS

Accuracy of Judgments

Overall, the ten evaluators made 1120 truth/deception judgments; of these, 707, or 63.1 percent, were correct (p< .001).1 Discarding the fifteen "inconclusive" judgments made by the evaluators was not sufficient to substantially alter the grouped results.
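The overall significance figure can be verified with the normal approximation to the binomial described in footnote 1 below; the following is a quick illustrative calculation, not part of the original analysis.

    from math import sqrt, erf

    n, correct = 1120, 707        # all truth/deception judgments, and hits
    p0 = 0.5                      # chance level with two legitimate outcomes

    # Normal approximation to the binomial: z = (x - n*p0) / sqrt(n*p0*(1-p0))
    z = (correct - n * p0) / sqrt(n * p0 * (1 - p0))   # roughly 8.8

    # One-tailed probability of doing this well by chance
    p = 0.5 * (1 - erf(z / sqrt(2)))                   # far below .001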
Accuracy-scores for individual evaluators in each of the eight categories of record sets are displayed in Table 4.1; as indicated, the total accuracy-scores for the low-experience evaluators ranged from 61.6 to 64.3 percent, for the high-experience evaluators from 53.6 to 69.6 percent.2 The evaluator with the lowest total accuracy score also made the greatest number of "inconclusive" judgments, which, as pointed out in Chapter III, were scored as errors; were these "inconclusives" eliminated, this evaluator's score would be consistent with other evaluators' scores.3

1Using the binomial approximation to the normal distribution and treating the data as though there were two legitimate outcomes, correct and incorrect.

2The raw numbers on which these percentages are based are displayed in Appendix D, Table D.1.

3This evaluator reported eight "inconclusive" judgments, one more than all such judgments made by all other evaluators.

TABLE 4.1.--Percent Correct Judgments of Evaluators on Record Sets in Each of the Eight Categories.

Hypotheses

Main effects.--A four-way analysis of variance (ANOVA), repeated measures, was conducted on the individual accuracy-scores shown in Table 4.1. The four factors were: Verification (verified and unverified); Truthfulness (truthful and deceptive); Crime Type (crime against a person and property crime), all treated as repeated measures; and Experience of evaluators (high and low).

The hypotheses formulated with respect to accuracy-scores are presented below, along with the results of the ANOVA.4

Hypothesis I: High-experience evaluators will attain higher accuracy-scores than low-experience evaluators.

Although overall the high-experience evaluators did attain higher accuracy-scores than did the low-experience group (63.6 and 62.7 percent correct, respectively), Hypothesis I is not supported by the results of the ANOVA. The main effect pertaining to differences between groups (experience levels) of evaluators with respect to accuracy-scores was not significant [F (1,8)=.11, p> .10]. There were no significant interaction-effects involving the experience-groupings of evaluators.

4All ANOVA tables not in the text are displayed in Appendix E; Table E.1 details the ANOVA results for accuracy scores.

Hypothesis II: Accuracy-scores on record sets drawn from verified investigations will be higher than those on sets drawn from unverified investigations.

The accuracy-scores of both groups of evaluators combined indicate that 64.1 percent of the judgments made on verified record sets were correct, as opposed to 62.1 percent on the unverified sets.
These data are shown in Table 4.2. The difference in accuracy between verified and unverified record sets was in the predicted direction but was not significant [F (1,8)=1.42, p> .10]. Apparent, however, was a significant interaction effect involving the verification categories, an effect to be discussed later in this paper.

TABLE 4.2.--Accuracy on Record Sets in Verified and Unverified Categories.
Evaluator Experience Level    Verified    Unverified
Low                           65.0%       60.4%
High                          63.2%       63.9%
Combined                      64.1%       62.1%

Hypothesis III: Accuracy-scores on record sets of truthful subjects will be higher than those on sets of deceptive subjects.

Hypothesis III is not supported. As shown in Table 4.3, the evaluators were correct in 51.6 percent of their judgments on "truthful" record sets, and 74.6 percent on "deceptive". This difference was significant [F (1,8)=10.70, p< .01], and contrary to the predicted direction. Two significant interaction effects complicating the meaning of this result will be subsequently discussed.

TABLE 4.3.--Accuracy on Record Sets in Truthful and Deceptive Categories.
Evaluator Experience Level    Truthful    Deceptive
Low                           47.1%       78.2%
High                          56.1%       71.1%
Combined                      51.6%       74.6%

Hypothesis IV: Accuracy-scores on record sets drawn from investigations concerning crimes against a person will be higher than those on sets concerning property crimes.

Classification of record sets by type of crime (Table 4.4) indicates that the evaluators were correct in 61.6 percent of their judgments on record sets concerning "crimes against a person" and 64.6 percent on those concerning "property crimes". This result contradicted the predicted direction but not to a statistically significant extent [F (1,8)=1.54, p> .10]. Two interaction effects involving the classification of record sets by type of crime are discussed below.

TABLE 4.4.--Accuracy on Record Sets Classified by Type of Crime.
Evaluator Experience Level    Crime Against a Person    Property Crime
Low                           61.4%                     63.9%
High                          61.8%                     65.4%
Combined                      61.6%                     64.6%

Interaction effects.--The ANOVA conducted on accuracy scores revealed two significant interaction effects, a Truthfulness x Crime type [F (1,8)=55.83, p< .001] and a Verification x Truthfulness x Crime type interaction [F (1,8)=20.87, p< .002]. A discussion of these effects, considering first the two-way interaction, follows.

Figure 4.1 displays the means for the Truthfulness x Crime type interaction. As shown, these means plot ordinally, the record sets of deceptive subjects being judged correctly more often than those of truthful subjects, irrespective of crime.

Figure 4.1.--Mean percent correct judgments on record sets in the truthful and deceptive categories in the two crime classifications.

The higher-order interaction, however, complicates the meaning of the data regarding the two-way interaction. Figure 4.2 displays the mean accuracy-scores for the Verification x Truthfulness x Crime type interaction. Inspection of this figure shows that record sets in the crime-against-a-person classification were correctly judged more often than all others if they were also in the deceptive category, less often if they were in the truthful category, regardless of the verification. It can also be seen that there was an ordinal effect considering only the crime-against-a-person classification, record sets in the deceptive category being correctly judged more often than those in the truthful category whether verified or unverified. A disordinal relationship obtained considering only the property-crime classification; record sets in the deceptive category of this classification were correctly judged more often than those in the truthful category only in the verified condition.

Figure 4.2.--Mean percent correct judgments on deceptive and truthful crime against a person and property crime record sets for the verified and unverified conditions.

Collective Accuracy

While no predictions were made with respect to the accuracy of collective judgments of evaluators, such accuracy will be briefly discussed here. There were 104 record sets on which six or more evaluators made definitive judgments of truthfulness or deception. When six evaluators agreed, collective judgments were correct in three of thirteen (23.1 percent) such occurrences; when all ten evaluators agreed, eighteen of twenty-one (85.7 percent). When agreement between six or more evaluators obtained, sixty-seven of the 104 such agreements were correct (64.4 percent). These data, along with the accuracy of the intermediate levels of evaluator-agreement, are shown in Table 4.5, which also shows that there was a positive relationship between collective accuracy and the number of evaluators in agreement in their truth/deception judgments.

TABLE 4.5.--Accuracy of Collective Judgments of Evaluators.
Number of Evaluators Agreeing    Correct No. (%)    Incorrect No. (%)
6                                 3 (23.1)          10 (76.9)
7                                10 (55.6)           8 (44.4)
8                                17 (68.0)           8 (32.0)
9                                19 (70.4)           8 (29.6)
10                               18 (85.7)           3 (14.3)
TOTAL                            67 (64.4)          37 (35.6)

Effect of Additional Physiological Data

As explained in Chapter III, the record sets used in this study were not uniform in nature, some containing Control-Question tests beyond the basic battery (CQ test #1, "card" test, CQ test #2) and some recorded by a polygraphic instrument with dual respiration-components. Although no predictions were made concerning the effect which these variables would have on the accuracy of judgments, it is of some interest to examine this effect. The percentage of each evaluator's correct judgments for two conditions for each variable was calculated. Using these percentages as a dependent variable, t-tests for correlated means (tdep.) were conducted to determine if there was a significant difference in accuracy between the two conditions for each variable. It should be noted, however, that the variables themselves were not necessarily independent.

Table 4.6 compares the mean percentage of correct judgments on record sets containing a dual respiration tracing to those sets with only a single such tracing. While the table shows percentages for both groups of evaluators, the groups were not treated as a factor in the analysis. As indicated in Table 4.6, correct judgments were made an average of 67.7 percent of the time when a dual tracing was apparent, 61.7 percent when a single tracing was used.
Although the accuracy was higher for the first condition, the difference was not significant (t 1.84, p< .10); a two-tailed test dep.= was used since no predictions were made concerning this variable. Table 4.7 displays the mean accuracy of judgments when record sets are dichotomized, with respect to the num- ber of CQ tests, containing only the basic battery of tests 131 and the basic battery plus additional tests. The mean accuracy on record sets in the former category was 71.1 percent, in the latter, 52.5 percent. This difference was significant (t =8.21, p< .001), when a two-tailed test dep was used. TABLE 4.6.--Accuracy of Judgments Based on Number of Respiration Components Recorded. Dual Respiration Recorded Evaluator Experience Level Yes No Low 64.6% 62.1% High 70.8% 61.4% Combined 67.7% 61.7% t dep. = 1.84, df= 9, p.<.10 TABLE 4.7.--Accuracy of Judgments Based on Number of Control Question Tests in Record Sets. Number Control Question Tests Evaluator Experience Level Basic Battery Only Basic Battery + Low 68.4% 55.0% High 73.8% 50.0% Combined 71.1% 52.5% ttdep. = 8.21, df = 9, p.<.001. 132 Reliability of Judgments The extent of agreement of all evaluators on all record sets irrespective of the correctness of judgments, is apparent from inspection of the data presented in Table 4.5, page 129. It is of interest to examine these data and evaluator-reliability in greater detail. Of the 112 record sets, 104 were agreed upon by six or more evaluators, as indicative of either truthfulness or deception. In the eight instances where such agreement was not apparent, five were even splits (five evaluators making judgments of truthfulness and five of deception). In the remaining three instances, inconclusive judgments were rendered by one or more evaluators, precluding majority agreement, because of the distribution of definitive judg- ments. To determine the extent of inter—evaluator agreement, irrespective of accuracy, the percentage of agreements in judgments between all possible pairs of evaluators were calculated; since there were ten evaluators, forty-five pairings were pdssible. These percentages, displayed in Table 4.8, ranged from 53 to 90 percent, with a mean of 69 percent. In other words, two evaluators agreed on an average of 69 percent of the time that any particular record set indicated truthfulness or deception, or was inconclusive. Further analysis of the reliability of evaluator- judgments was made by calculating Hoyt's intra-class 133 ow I m on ow I m be we vm I h Hh Hh mm me I 0 mm «b mm mm Hm I m no mm mm v0 ms he I 4 v5 Nb mo Nh vb he Hm I m we H5 mm vb on Nm mm mm I N on mm me He mm mm H5 on ow H OH a m h o m v m N muoumsam>m mocmHummxm anm mocwHHmmxm 30H .mnoumsHm>m mo mucmampoh confine c«.nucmfimwum4 mo mmmucmoummII.m.¢ mqm4a 134 (reliability) correlation-coefficient for ratings, as described by Ebel.5 Such correlations were calculated separately for judgments made on verified and unverified record sets by converting all evaluators' judgments to numerical values (1=truthfu1, 2=inconc1usive, 3=deceptive) and conducting a two-way analysis of variance on these values; the two factors were Records (N=56), and Evaluators (N=lO). The resulting mean squares were then used to deter- mine reliability-coefficients.6 The reliability coefficients for both verified and unverified record sets were quite similar, .89 and .85, respectively, indicating that there was substantial relia- bility for the ratings (judgments) of all evaluators on record sets in both categories. 
Said in another way, the variability between the ten evaluators with respect to their judgments of truthfulness/deception indications in the record sets was relatively low. Confidence in Judgments Confidence scores were the sum of the values, or ratings, indicated by evaluators on a six-point scale for each record set in each of the eight categories from which 5R. Ebel, "Estimation of the Reliability of Ratings," in W. Mehrens and R. Ebel (Eds.), Principles pf Educational and Ppychological Measurement (Chicago: Rand McNally, 1967), 116-131. 6In terms of analysis of variance, for this situa- tion, the coefficient was the ratio of the mean square for records minus that for error to the mean square for records. 135 the sets were drawn. Such scores had a theoretical range of 70 points (14-84) per category, higher scores indicating greater confidence.7 Hypotheses Main effects.--A four-way ANOVA, repeated measures, was conducted to test the main-effect hypotheses formulated with respect to evaluators' confidence-scores. These hypo- theses, along with the results of the ANOVA, are presented below.8 Hypothesis V: High-experience evaluators will attain highér confidence scores than low- experience evaluators. While the high-experience evaluators did report greater confidence in their judgments than the low-experience group, with mean confidence scores of 56.5 and 51.7, respec- tively, this difference was not significant [F (l,8)=l.77, p> .10]. Thus, Hypothesis V is not supported. There were no significant interaction-effects associated with experience- 1evels of evaluators pertaining to confidence scores. Hypothesis VI: Confidence-scores will be higher for judgments made on record sets drawn from verified investigations than for those made on sets from unverified investigations. As indicated in Table 4.9 the mean confidence-scores for all evaluators on record sets in the verified category 7Confidence scores for individual evaluators are dis- played in Appendix D, Table 0.2. 8The ANOVA table for confidence scores is displayed in Appendix B, Table E.2. 136 was 54.5, in the unverified category, 53.7. This differ- ence, although in the predicted direction, was not signifi- cant [F (1,8)=.53, p> .10]. However, a significant inter- action effect involving the Verification factor did emerge from the analysis; this effect will be discussed shortly. TABLE 4.9.--Mean Confidence Scores on Verified and Unverified Record Sets. Evaluator Category Experience Level Verified Unverified Low 52.9 50.5 High 56.2 56.9 Combined 54.5 53.7 Hypothesis VII: Confidence-scores will be higher for judgments made on record sets of truthful subjects than those of deceptive subjects. Table 4.10 displays the mean confidence-scores for both groups of evaluators on record sets in the truthful TABLE 4.10.--Mean Confidence Scores on Record Sets Classi- fied as Truthful and Deceptive. Category Evaluator .. Experience Level Truthful Deceptive Low 49.3 54.1 High 55.0 58.1 ‘Combined 52.1 56.1 137 and deceptive categories; as shown, for all evaluators the mean in the truthful category was 52.1, in the deceptive category, 56.1. This difference was significant [F (1,8)= 64.17, p< .001] but Opposite the predicted direction, and its meaning is complicated by an interaction effect. Hypothesis VIII: Confidence-scores will be higher for judgments made on record sets drawn from in- vestigations concerning crimes against a person than those concerning prOperty crimes. 
As predicted, confidence-scores were higher on judg- ments of record sets in the crime-against—a-person category than in the prOperty-crime categoryy the mean scores being 54.2 and 54.0, respectively. These data are shown in Table 4.11. However, Hypothesis VIII is not supported by the results of the ANOVA since the difference between the con- fidence-scores pertaining to crime classification was not significant [F (1,8)=.08, p> .10]. There were no significant interaction-effects with respect to confidence-scores in- volving crime classification. TABLE 4.11.—-Mean Confidence Scores on Record Sets Classi- fied by Type of Crime. Crime Classification Evaluator . Experience Level Crime Against A Person Property Crime Low 52.3 51.1 High 56.2 56.9 Combined 54.2 54.0 138 Interaction effects.--A significant Verification x Truthfulness interaction-effect was apparent in the results of the ANOVA conducted on confidence-scores [F (l,8)=6.23, p< .03]. The nature of this interaction can be discerned from inSpection of Figure 4.3 which displays the mean con- fidence-scores for record sets in the truthful and deceptive categories for the verified and unverified conditions. s4} 4[ ------- Deceptive 60. Truthful " 59. V °.° v 58< H 57.2. I 57 . 0) “.‘§ U" ~~‘\‘ g 56 ‘ “~\~‘~‘. m 55 ‘ ”NM... 54. 9 a) O u 8 54‘ m g 53‘ m 4_fi 52.3 :3 52‘ 51. 9 ,1 “a o 51 1 U 5 )’ °’ { 2 I I Verified Unverified Figure 4.3.--Mean confidence scores on record sets in the deceptive and truthful categories for the verified and unverified conditions. 139 As shown in Figure 4.3, the interaction mentioned above was ordinal in nature; that is, mean confidence- scores were greater for record sets in the deceptive than the truthful category across the two levels of verification. Moreover, it is apparent that mean confidence-scores de- creased from verified to unverified for sets in the deceptive category while they increased slightly for sets in the truth- ful category. Confidence Ratings and Accuracy of Judgments Table 4.12 diSplays the mean confidence-ratings for both groups of evaluators for correct and incorrect judg- ments. As shown, the high-experience evaluators had a mean confidence-rating of 4.2 on correct judgments, 3.8 on in- correct; the low—experience evaluators mean ratings of 3.8 and 3.5 for correct and incorrect judgments, respectively. TABLE 4.12.--Mean Confidence Ratings of Evaluators' Judgments. Judgments Evaluator Experience Level Correct Incorrect Low 3.8 3.5 High 4.2 3.8 Combined 4.0 3.6 140 Using each evaluator's mean confidence-rating on correct and incorrect judgments as the dependent variable, a two-way ANOVA, repeated measures, was carried out. The two factors were Judgments (correct and incorrect), treated as repeated measures, and Evaluator-experience (high and low). No prediction was made concerning the main effect for experience levels; Hypothesis IX, however, concerning the main effect for judgments, is presented below, along with the results of the ANOVA. Hypothesis IX: Confidence-ratings will be higher for correct than for incorrect judgments. As indicated in Table 4.12, the mean confidence- rating for all evaluators on correct judgments was 4.0; on incorrect, 3.6. As can be seen from inSpection of Table 4.13, which details the results of the ANOVA, this differ- ence was significant [F (l,8)=21.55, p< .002]; Hypothesis IX is supported. The main effect for experience, as also shown in Table 4.13 was not significant, nor was there a significant Experience x Judgment interaction. 
TABLE 4.13.-—Ana1ysis of Variance Table for Mean Confidence Ratings on Correct and Incorrect Judgments. Source SS df MS F P< A (A=Experience) .58 1 .58 1.82 .25 E (E=Eva1uators):A 2.54 8 .32 J (J=Judgments) .72 1 .72 21.55 .002 A x J .00 1 .00 0.00 - J X E:A .27 8 .03 TOTAL 4.11 19 141 Ease—Of—Interpretability Of Record Sets For all record sets in each of the eight categories, evaluators rated the ease-of-interpretability of respira- tion, GSR, and cardiovascular activity. Ease-of—interpre- tability scores for each component had a theoretical range of 56 points (14-70) per category. A total ease-of—inter- pretability score for each record set was derived by summing the ratings for individual components; this score had a theoretical range of 168 points (42-210) in each of the eight categories. In all cases, higher scores indicated greater ease-of—interpretability.9 Hypotheses Main effects.--A four-way ANOVA, repeated measures, was conducted on evaluators' total ease-of-interpretability scores.10 The hypotheses formulated with respect to these scores and the results of the ANOVA are discussed below. Hypothesis X: High-eXperience evaluators will have higher total ease-of-interpretability scores than will low-experience evaluators. The high-experience evaluators had a mean total ease- of-interpretability score of 116.2, the low-experience group, 106.6. Although this result was in the predicted direction, it was not significant [F (l,8)=l.03, p> .10]; Hypothesis X 9Total ease-of—interpretability scores for indivi- dual evaluators are displayed in Appendix D, Table D.3. 10The ANOVA Table for total ease-of-interpretability scores is displayed in Appendix E, Table E.3. 142 is not supported.‘ There were no significant interaction- effects associated with the Experience factor regarding total ease-of—interpretability scores. Hypothesis XI: Total ease-of—interpretability scores will be higher in judgments of record sets drawn from verified investigations than those made on sets drawn from unverified investigations. The mean total "ease—of—interpretability" score for all evaluators on record sets in the verified category was 112.9, in the unverified category, 109.9. This difference was significant [F (l,8)=7.65, p< .02]; therefore, Hypo- thesis XI is supported. However, because of a significant interaction-effect involving the Verification factor, the meaning of this main effect will be discussed later. Hypothesis XII: Total ease-of-interpretability scores will be higher in judgments of record sets of truthful subjects than those of de- ceptive subjects. Total "ease-of—interpretability" scores for record sets in the deceptive category had a mean of 115.9; in the truthful, 106.9. This result was significant [F (1,8)= 37.99, p< .001], but opposite the predicted direction; Hypothesis XII is not supported. A significant interaction- effect involving the Truthfulness factor will be discussed shortly. Hypothesis XIII: Total ease-of-interpretability scores will be higher in judgments of record sets drawn from crimes against a person than those of sets drawn from property crimes. 143 Classification of record sets by type of crime shows that the mean total "ease" score for sets in the crime- against-a-person category was 113.7, in the prOperty-crime category, 109.0. This difference was significant [F (1,8)= 8.22, p< .02], and therefore Hypothesis XIII is supported by the results of the ANOVA. There were no significant inter- action-effects associated with classification of record sets by type of crime. 
Interaction effects.--A significant Verification x Truthfulness interaction-effect was apparent from the results of the ANOVA conducted on total "ease" scores [F (l,8)=9.l3, p< .02]. The ordinal nature of this interaction can be seen in Figure 4.4; mean total ease-of—interpretability scores were higher for record sets in the deceptive category than in the truthful category across both levels of the Verifica- tion factor. It is also apparent that such scores increased for record sets in the truthful category from the verified to the unverified condition, while they decreased for sets in the deceptive category. Total Ease-of—Interpretability Ratings and Accuracy Table 4.14 diSplays the mean total ease-of-interpre- tability ratings for both groups of evaluators on correct and incorrect judgments. On correct judgments the mean rat- ing for the low-experience group was 7.8, for the high 144 5‘ g 210 :§ ------- Deceptive 4.) a / —-—-- Truthful e 125. 3 A g g 120 . 120...“ *r ‘r ~~~~~~~~ ‘2; 2 115. ........... m I ‘‘‘‘‘‘‘ U) Q) 110 .. " 112 :3 m HH. 108 '3 a 105 .4 106? p v o 00 m 8 f 224 _ J l Verified Unverified Figure 4.4.--Mean total ease-of-interpretability scores for record sets in the truthful and deceptive categories for the verified and unverified conditions. experience group 8.5; for each group the mean ratings were higher on correct than incorrect judgments. TABLE 4.14.--Mean Total Ease-of—Interpretability Ratings of Evaluators' Judgments. Judgments Evaluator Experience Level Correct Incorrect Low- 7.8 . High 8.5 7.9 Combined 8.1 7.6 145 Using each evaluator's mean total ease-of—interpre- tability rating on correct and incorrect judgments as the dependent variable, a two-way ANOVA, repeated measures, was carried out. The two factors were Judgments (correct and incorrect) treated as repeated measures, and Evaluator- experience (high and low). No prediction was made concern- ing the main effect for the Experience factor; the hypothe- sis pertaining to the main effect for the Judgment factor is discussed below. Hypothesis XIV: Total ease-of—interpretability ratings will be higher for correct than for incorrect judgments. As shown in Table 4.14 the mean total ease-of- interpretability rating for all evaluators on correct judgments was 8.1; on incorrect, 7.6. As can be seen from inSpection of Table 4.15, which details the results of the ANOVA, this difference was significant [F (l,8)=41.32, p< .001]; Hypothesis XIV is supported. The main effect for experience was not significant nor was there a significant Experience x Judgment interaction-effect. Ease-of—Interpretability of IndividuaIiPhysiolqgical Components Ease-of-intpppretability ratings.--For both groups of evaluators for all record sets, mean ease-of-interpre- tability ratings were highest for respiration, followed by cardiovascular activity and GSR, reSpectively. These data are shown in Table 4.16. This result is not consistent with 146 the results reported by Kubis in a study of experimental lie-detection where such ratings were highest for GSR, cardio- . . . . . ll vascular act1v1ty, and respiration, in order. TABLE 4.15.—-Analysis of Variance Table for Mean Total Ease-Of-Interpretability Ratings on Correct and Incorrect Judgments. Source 55 df MS F P< A (A=Experience-Low, high) . 
2.31 1 2.31 1.06 .25 E (E=Eva1uators):A 17.45 8 2.18 J (Judgments-correct, incorrect) 1.25 l 1.25 41.32 .0003 A X J .02 l .02 .59 - J X E:A .24 8 .03 TOTAL 21.27 19 TABLE 4.16.--Mean Ease-Of—Interpretability Ratings of The Three Physiological Components on All Record Sets. Physiological Component Evaluator Experience Level ReSpiration GSR Cardio Low 2.87 2.30 2.40 High 3.00 2.55 2.75 Combined 2.93 2.43 2.57 11J. Kubis, Studies pp Lie Detection: Computer Feasi- bility Considerations, Tech. Report 62-205 (Arlington, VA.: Armed Services Technical Information Agency, June, 1962), pre- pared for Air Force Systems Command, Contract No. AF 30 (602)- 22700, Project No. SS34, Fordham University, 1962, 70. 147 Treating each evaluator's mean-rating for correct and incorrect judgments for each component as a dependent variable, t-tests for correlated means (t ) were con- dep. ducted. Since no predictions were made concerning differ- ences between the two judgments for individual components, two-tailed tests were used and although mean ratings for both experience-levels of evaluators are presented in Table 4.17, the levels were not treated as a factor in the analy- sis. As shown in Table 4.17, the mean ease-of—interpre- tability ratings were higher in correct than incorrect judgments for both respiration, 3.11 and 2.74, and cardio- vascular activity, 2.64 and 2.47; these differences were significant (p< .001; t =6.4 and 7.3, respectively). dep For GSR the mean rating of correct was very slightly lower than of incorrect judgments and not significant (tdep =-.25, p> .10). TABLE 4.17.--Mean Ease-Of—Interpretability Ratings of The ‘ Three Physiological Components on Correct and Incorrect Judgments. Physiological Component Evaluator ReSpiration GSR Cardio Experience Level Cor; Incor.* Cor. Incor.: Cor. Incor. Low 3.07 2.73 2.25 2.32 2.46 2.29 High 3.15 2.74 2.57 2.52 2.81 2.64 Combined 3.11 2.74 2.41 2.42 2.64 2.47 148 Ease-of—interpretability scores.--While no pre- dictions were made concerning the ease-of—interpretability scores of individual physiological components, a limited discussion of these results will be undertaken here. Separate four-way analysis of variance, repeated measures, was conducted, treating as dependent variables the ease-of- interpretability scores for the three components. The four factors were identical to those discussed in previous sec- tions of this chapter dealing with such analysis: Experi- ence, Verification, Truthfulness, and Crime-type, the latter three treated as repeated measures. The results of these three analyses are discussed below.12 (1) ReSpiration.--The ANOVA conducted on the respir- ation ease—of—interpretability scores revealed no signifi- cant main effects for the experience or crime-type factors. However, reSpiration was judged easier to interpret on record sets in the verified than the unverified category [F (l,8)=40.87, p< .001], and easier in the deceptive than the truthful category [F (l,8)=102.65, p< .001]. The inter- pretation of these main effects, however, is complicated by interaction-effects which emerged from the analysis. The mean respiration "ease" scores pertaining to two of these 12ANOVA tables for ease-of—interpretability scores for respiration, GSR, and cardiovascular activity are displayed in Appendix E, Tables E.4, E.5, and E.6, respectively. 
interactions are shown in Figures 4.5 and 4.6, which, generally, indicate that respiration was judged easier to interpret on record sets in the crime-against-a-person category than in the property-crime category, and easier on record sets in the deceptive category than in the truthful, across the levels of the Verification factor, respectively.

Figure 4.5.--Mean respiration ease-of-interpretability scores for record sets in both crime classifications for the verified and unverified conditions.

Figure 4.6.--Mean respiration ease-of-interpretability scores on record sets in the truthful and deceptive categories for the verified and unverified conditions.

Two other significant interaction effects which emerged from the analysis are shown in Figures 4.7 and 4.8. As can be seen from inspection of these figures, the interactions are disordinal in nature; in spite of this it can be stated that respiration was judged essentially easier to interpret for both crime types in the deceptive category than in the truthful category (Figure 4.7), and for both groups of evaluators in the verified condition than in the unverified (Figure 4.8).

Figure 4.7.--Mean respiration ease-of-interpretability scores for record sets in the two crime classifications in the truthful and deceptive categories.

Figure 4.8.--Mean respiration ease-of-interpretability scores for high and low experience evaluators on record sets in the verified and unverified conditions.

(2) GSR.--No significant main effects were apparent from the ANOVA conducted on the ease-of-interpretability scores for GSR. Two significant interaction effects, however, were apparent, a Verification x Truthfulness effect [F (1,8)=12.13, p< .008], and a Verification x Truthfulness x Crime type effect [F (1,8)=6.29, p< .04]. The mean scores pertaining to the first of these interactions are shown in Figure 4.9, for the second in Figure 4.10. However, the meaning of these interactions is too obscure to be discussed here.

Figure 4.9.--Mean GSR ease-of-interpretability scores on record sets in the truthful and deceptive categories for the verified and unverified conditions.
‘ ‘0 31.3 H m V 8 3 30+ : u 1’ 8 8 2:0 «I _ l l Verified Unverified Figure 4.10.--Mean GSR ease—of—interpretability scores on deceptive and truthful crime against a person and property crime record sets for the verified and unverified conditions. (3) Cardio.--The only significant effects which were apparent from the ANOVA conducted on the cardio "ease" scores concerned the main effects for the Truthfulness and Crime- type factors. Examination of the mean scores for these effects Shows that cardiovascular activity was judged easier to interpret On record sets in the deceptive than in the truthful category, with means of 37.7 and 34.3, respectively 154 [F (l,8)=59.27, p< .001]; and easier for record sets in the crime-against-a-person than the prOperty-crime category, with means of 37.2 and 34.8, respectively [P (l,8)=9.87, p< .01]. Numerical Evaluation Forty of the 112 record sets, five from each of the eight categories from which sets were drawn, were numerically scored by evaluators; scores for each of the three physio- logical components in each set had a theoretical range from plus 24 to minus 24; a combined score (the sum of the scores for the three components) had a theoretical range from plus 72 to minus 72. In all cases positive scores indicated greater responsiveness to control-questions in a record set, i.e., truthfulness; negative scores, greater responsiveness to relevant questions, i.e., deception. Accuracy The accuracy-scores discussed previously in this chapter were not independent of evaluators' numerical scores; comparisons are thus inappropriate. However, because numer- ical evaluation provides a mean of assessing the relative accuracy of the individual physioloqical components, a brief description of the accuracy of such evaluation follows; Only seven of the ten evaluators, four in the low- experience group, three in the high-experience group, scored the record sets assigned to numerical evaluation. To deter— mine the accuracy of these evaluators' judgments as based 155 solely on numerical scores, a procedure reported by Barland 13 . . . . For combined scores a dec151on-ru1e which was used. categorized as "inconclusive" all scores from plus to minus four was applied; that is, for scores on record sets in the truthful category (of which there were twenty assigned to numerical evaluation) any combined score greater than plus four was correct, less than minus four, incorrect, between plus and minus four, inconclusive. For record sets in the deceptive category, the reverse of this procedure deter- mined correct and incorrect judgments. For the scores of individual components the decision rule used determined as inconclusive all scores from plus to minus one, inclusive. Table 4.18 displays the average accuracy obtained when the decision rules discussed above were applied to the scores of all evaluators. For combined scores, 42 percent were correct, 32 percent incorrect, and 26 percent incon- clusive. For individual components cardio scores were the most accurate and GSR the least accurate, at 44 and 37 percent, respectively. It should also be noted that the scores for GSR were inconclusive almost twice as often as those for the other two components. l3G. Barland, "The Reliability of Polygraph Chart Evaluations" (paper presented at American Polygraph Associa- tion Seminar, August 15, 1972, Chicago, Illinois). 156 TABLE 4.18.--Average Percent Accuracy of Evaluators' Judgments Based on Numerical Scores. 
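The decision rule just described can be restated compactly as follows; the sketch is illustrative only, and its function and label names are not taken from the study's scoring sheets.

    def call_from_score(score, inconclusive_band):
        """Convert a numerical score into a truth/deception call.

        Positive scores (greater response to control questions) indicate
        truthfulness, negative scores deception; anything inside the
        inconclusive band (inclusive) is left undecided.
        """
        if -inconclusive_band <= score <= inconclusive_band:
            return "inconclusive"
        return "truthful" if score > 0 else "deceptive"

    # Combined scores use a +/-4 band; individual components use +/-1.
    call_from_score(+6, 4)   # -> 'truthful'
    call_from_score(-3, 4)   # -> 'inconclusive'
    call_from_score(-2, 1)   # -> 'deceptive'

Accuracy under this rule then follows from comparing each call with the category (truthful or deceptive) from which the record set was drawn.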
Judgments Component Correct Incorrect Inconclusive* Respiration 43% 37% 20% GSR 37% 24% 39% Cardio 44% 37% 19% Combined 42% 32% 26% *The boundaries of the inconclusive region were i l, inclu- sive, for each individual component and :,4, inclusive, for the score for all components combined. When inconclusive judgments are eliminated, the average accuracy of all seven evaluators was 57 percent for combined scores, 53, 54, and 60 percent for respiration, cardio, and GSR scores, respectively. These data are dis- played in Table 4.19, which also shows the accuracy of individual evaluators, excluding inconclusive scores. TABLE 4.19.--Percent Accuracy of Individual Evaluators Based on Numerical Scores (Excluding Inconclusives*); Evaluator l 2 3 4 5 6 7 Mean Component ReSpiration 57 47 53 52 62 53 51 53 GSR 63 61 59 65 61 63 47 60 Cardio 48 62 51 57 53 58 50 54 Combined 59 55 56 58 62 55 56 57 *The boundaries of the inconclusive region were :14 inclus- sive, for each individual component and i 4, inclusive, for the score for all components combined. 157 A further analysis of the accuracy of judgments based on numerical scores is shown in Table 4.20, which compares the average accuracy for record sets in the veri- fied to that in the unverified category, excluding incon- clusives. In the former category, GSR scores were correct an average of 63 percent; this was higher than the accuracy of the other two individual components and of the combined scores. For record sets in the unverified category combined scores were more accurate at 61 percent, than those of the individual components; respiration scores were slightly more accurate, at 59 percent, than either GSR or cardio scores. TABLE 4.20.——Average Percent Accuracy on Verified and Unverified Record Sets Based on Numerical Scores (Excluding Inconclusives*). Category Component Verified Unverified Respiration 47% 59% GSR 63% 58% Cardio 50% 58% Combined 53% 61% *The boundaries of the inconclusive region were 1L1: inclus- sive, for each individual component and i 4, inclusive, for the score for all components combined. 158 Reliability To determine the reliability, i.e., the extent of inter-evaluator agreement, of the scores derived from numer- ical evaluation, Pearson product-moment correlation coe- fficients (r) were computed for the set of scores for each of the possible pairs of evaluators. Since there were seven evaluators, twenty-one pairings were possible; corre- lations were calculated for each of these pairs for respir- ation, GSR, cardiovascular, and combined scores. Table 4.21 displays the correlation matrix for the pairs of evaluators with respect to combined scores. As indicated, these correlations ranged from .45 to .82; the 14 Tables 4.22, 4.23, and 4.24 display the mean was .65. correlations obtained for respiration, GSR, and cardio scores, respectively. The range for respiration scores was from .35 to .82, with a mean of .60; for GSR, from .61 to .86, with a mean of .74; and for cardio, from .33 to .78, with a mean of .60. Thus, there was greater agreement between evaluators on GSR scores than on either of the other two components or on combined scores. To clarify the reliability of numerical scoring, correlations were calculated using the scores for the pair- ings of evaluators on the record sets in the verified and unverified categories separately. The complete correlation 14All mean correlations were calculated using the r-Z transformation on raw'correlation coefficients. 159 TABLE 4.21.--Correlations of Combined Scores. 
Evaluator l 2 3 4 5 6 7 1 .70 .51 .72 .62 .62 .73 2 .52 .76 .59 .71 .64 3 .45 .58 .46 .68 4 .74 .69 .77 5 .60 .82 6 .60 TABLE 4.22.--Correlations of Respiration Scores. Evaluator 1 2 3 4 5 6 7 l .65 .40 .71 .46 .61 .65 2 .35 .70 .45 .65 .61 3 .45 .56 .35 .66 4 .62 .66 .71 5 .51 .82 6 .65 TABLE 4.23.--Correlations of GSR Scores. Evaluator l 2 3 4 5 6 7 l .81 .66 .82 .83 .86 .79 2 .67 .73 .68 .83 .66 3 .71 .65 .67 .61 4 .78 .82 .71 5 .79 .67 6 .67 160 TABLE 4.24.--Correlations of Cardio Scores. Evaluator 1 2 3 4 5 6 7 l .62 .49 .64 .63 .40 .74 2 .53 .73 .62 .63 .61 3 .33 .44 .41 .57 4 .77 .58 .74 5 .59 .78 6 .55 matrices for these data are displayed in Appendix F, Tables F.1 through F.8. The mean correlations for these data, how- ever, are displayed in Table 4.25; as shown, in all cases the mean correlations were higher for record sets in the unverified than in the verified category, although none of the differences were significant when tested by a t-test )15 for correlated means (t dep. TABLE 4.25.--Comparison of Mean Correlations of Numerical Scores of Verified to Unverified Record Sets. Record Sets Score Verified E Unverified E t dep. Cardio .60 .65 -l.85* Respiration .60 .65 - .92** TOTAL .64 .70 -l.65** * p< .10 ** p> .10 15 The t-tests were calculated in all cases by trans- forming the correlations to Z-variables; these variables were then used as the dependent measure. Since no predictions were made, two-tailed tests were used. Chapter V DISCUSSION The results of this study essentially indicate the following: (1) That depending solely on polygraphic record- ings obtained from field examinations conducted by control- question technique, the judgments of trained evaluators are accurate well beyond chance levels. (2) That there is substantial agreement (reliability) among evaluators con- cerning truth/deception judgments made on polygraphic recordings. (3) That the nature of polygraphic recordings -- the categories from which they are drawn -- is a more impor- tant variable in blind analysis than is the experience of evaluators. Accuracy of Judgments Thatrmnxaexperienced evaluators in this study were not significantly more accurate in their judgments than those less experienced is generally contrary to the findings of previous researchers. A plausible explanation for this difference lies in the definition of "experience". Horvath and Reid, for instance, found that incompletely trained evaluators with less than six months' experience were less accurate than those fully trained and with varying degrees 161 162 of active experience. In the present study, however, all evaluators had completed a formal training course, and although some were still interns, all had a minimum of eight months' active experience. It is reasonable to suspect, therefore, that given evaluators of a minimum level of exper- ience, the nature of recordings is more critical in blind analysis than is experience per se. The effect of the specific sources of the polygraphic recordings on accuracy is apparent in analysis of the inter- action-effect pertaining to accuracy scores, as shown in Figure 4.2, page 128. In all but one of the eight categories of record sets, accuracy was higher for those of deceptive than of truthful subjects. This finding contrasts with prior research reported by field examiners but is consistent with results reported by Barland in his experimental study of lie-detection.1 However, it is also apparent that this finding is complex and intricate. 
Inspection of Figure 4.2 shows that record sets in the "crime against a person" classification were judged deceptive more often than all others; hence the likelihood of false positives was greatest, and of false negatives least, in this classification, regardless of verification. The most likely explanation of this result is that relevant 1G. Barland, "An Experimental Study of Field Techniques in Lie Detection," (unpublished Master's Thesis, University of Utah, 1972), 38. 163 questions pertaining to investigations of crimes against a person elicit stronger physiological responses from both truthful and deceptive subjects than do such questions per- taining to prOperty-crime investigations. In other words, crimes against a person are, by nature, more emotionally weighted, a condition heightening the possibility of false positives in blind analysis of physiological responses. There is no completely satisfactory explanation for other aspects of the interaction pertaining to accuracy- scores. For instance, it is not clear why accuracy in- creased from the verified to unverified condition in judg- ments made on record sets in the "deceptive/crime against a person" and "truthful/property crime" categories when it decreased for other categories of record sets. Nor is it obvious why in the unverified condition record sets of truthful subjects in the property crime classification were more accurately judged than were those of deceptive subjects in the same classification. The latter finding, however, probably reflects the lack of uniform numbers of control question tests in record sets. Evaluators were more accurate on record sets limited to the basic battery of control—question tests than those including additional tests. Inspection of the distribution of record sets containing only the basic battery indicates six such sets in the "truthful/property crime" and only four in the "deceptive/prOperty crime" category, both in the 164 unverified condition.2 Thus, it is possible that higher accuracy in the former category was due to the predominance of more accurately judged record sets. In spite of this possibility, however, it is notable that other results pertaining to accuracy-scores are not explained by differ- ences in the number of control-question tests in record sets. That evaluators were more accurate on record sets with less rather than more physiological data (CQ tests) conflicts with Rouke's results. Rouke reported greater accuracy and reliability when evaluators of experimentally- derived lie-detector (GSR) recordings were given additional data.3 It seems likely that in the present study record sets containing only the basic battery of C0 tests were clearer in their indications of truthfulness and deception than those including additional tests. This explanation suggests that the examiners who actually conducted the testing supplemented it with additional tests when the basic battery was ambiguous in its indications, that additional tests and "stimulation" strategies may not clarify response- data to the extent which field examiners contend. Barland, in experimental lie-detection, reported that when he combined the numerical scores of a group of evaluators, the combined "average scores" were more accurate than the 2See Table 3.2, page 100. 3F. Rouke, "Evaluation of The Indices of Deception in the Psychogalvanic Technique" (unpublished Ph.D. dissertation, Fordham University, 1941), 46-47. 
average accuracy of individual evaluators; that is, pooling individual decisions increased accuracy.4 His results are generally supported by the present study's findings pertaining to collective accuracy: the greater the number of evaluators in agreement, the greater the accuracy.

4G. Barland, "The Reliability of Polygraph Chart Evaluations" (paper presented at American Polygraph Association Seminar, August 15, 1972, Chicago, Illinois), 7.

The nature of the criteria for assessing accuracy in this study was such that accuracy scores were clearly unrelated to the validity of lie-detection in the field. The requisite criteria of judgments made on verified record sets were confessions; thus, within reasonable limits, "ground truth" against which evaluators' judgments could be compared was known. As Orne has argued, however, such judgments reflect only the extent to which evaluators can reliably identify those aspects of physiological data which they view as indicative of truthfulness and deception.5 In other words, the examiners' actual judgments in all verified situations were correct (valid); it is not known whether the examiners relied on physiological or other information to make such judgments.

5M. Orne, "Implications of Laboratory Research for the Detection of Deception," Polygraph, 2 (1973), 179.

On the other hand, the criteria of accuracy on unverified record sets were the judgments of the testing examiners. Of course, under such conditions neither "ground truth" nor the nature of the information which the testing examiners used to make such judgments was known. Accuracy scores, then, whether on verified or unverified record sets, are essentially measures of reliability: agreement between examiners' judgments based on many sources of information (e.g., physiological data, behavioral characteristics of subjects, investigators' reports) and evaluators' judgments based solely on physiological data.

In view of the above argument it is noteworthy that accuracy scores, while generally "correct" well beyond chance levels overall, were not substantially higher on verified than on unverified record sets. This result suggests that polygraphic recordings themselves are relatively stable from the first situation to the second. It also suggests that while physiological data are a substantial contribution to (police) examiners' judgments in actual field testing, other sources of information probably have a considerable influence. In other words, as many field examiners contend, lie-detection in the field is a diagnostic technique the validity of which is neither completely determined by, nor independent of, physiological information.

Reliability of Judgments

The consistency of evaluators' judgments in this study substantiates prior research, whether experimental or field-based, that there is considerable agreement among independent evaluators as to the criteria believed associated with deception; that is, that blind analysis of polygraphic recordings by trained evaluators is an objective, reliable procedure. Pairs of evaluators in this study agreed on an average of 69 percent of their judgments. Barland reported an average agreement of 95.5 percent for (pairs of) six field-trained evaluators of experimentally derived polygraphic recordings.6 The difference in these results may be due to the nature of the polygraphic recordings, i.e., experimental as opposed to field.

6Barland, "The Reliability of Polygraph Chart Evaluations," op. cit., 5.
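The agreement figure just cited is simply the proportion of record sets on which two evaluators reached the same call. The sketch below is a hypothetical illustration in Python (the calls and evaluator labels are invented, not the study's data) of how pairwise agreement can be computed either over all judgments or only over record sets on which both evaluators reached a definite decision.

```python
# Illustrative computation of pairwise agreement between evaluators'
# calls ("NDI", "DI", or "INC"). All data and names are invented examples.
from itertools import combinations

def agreement(a, b, exclude_inconclusives=False):
    """Proportion of record sets on which evaluators a and b agree.

    With exclude_inconclusives=True, record sets on which either evaluator
    was inconclusive are dropped before the proportion is computed."""
    pairs = list(zip(a, b))
    if exclude_inconclusives:
        pairs = [(x, y) for x, y in pairs if "INC" not in (x, y)]
    if not pairs:
        return float("nan")
    return sum(x == y for x, y in pairs) / len(pairs)

calls = {   # hypothetical calls by three evaluators on six record sets
    "E1": ["DI", "NDI", "DI", "INC", "NDI", "DI"],
    "E2": ["DI", "DI",  "DI", "DI",  "NDI", "INC"],
    "E3": ["DI", "NDI", "INC", "DI", "NDI", "DI"],
}

for e1, e2 in combinations(calls, 2):
    overall  = agreement(calls[e1], calls[e2])
    definite = agreement(calls[e1], calls[e2], exclude_inconclusives=True)
    print(f"{e1}-{e2}: {overall:.2f} overall, {definite:.2f} excluding inconclusives")
```

As the toy data show, excluding record sets on which either score was inconclusive can raise the apparent agreement appreciably.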
On the other hand, it is also likely that the difference is partially explained by the fact that the evaluators in Barland's study scored the recordings numerically. The percentage of agreement reported represents the percentage of instances in which paired evaluators' scores indicated a definite decision; thus, disagreements caused by one of a pair's scores falling into the inconclusive region were not counted.

It was apparent in this study that evaluator reliability did not substantially vary whether judgments made on verified or unverified record sets were considered; for both categories reliability coefficients were quite high, .89 and .85, respectively. This result supports the earlier suggestion in this chapter that there is a high degree of consistency in polygraphic recordings, whether derived from verified or unverified investigations.

Although the accuracy and reliability of the judgments made by the evaluators in this study were quite substantial, it is apparent that these results were not as convincing as those reported in other somewhat similar studies dealing with field-derived polygraphic recordings. Horvath and Reid, for instance, reported an average accuracy of 87.7 percent for ten evaluators' judgments on the polygraphic recordings of forty subjects. A similar figure, 86 percent, was reported by Hunter and Ash for seven evaluators' judgments on twenty polygraphic recordings. In both of these studies errors were almost identically balanced; that is, false positives occurred nearly as often as false negatives.

Some of the possible explanations for such inconsistencies between prior research and the present study are quickly eliminated as unlikely; others appear more relevant. Of probably minimal influence on differential results are the following:

1) In previous sections of this study it is suggested that verified recordings may be more accurately interpreted than those which are unverified, implying that the Horvath/Reid and Hunter/Ash studies, using only verified recordings, biased results in favor of higher accuracy. However, evaluators in the present study were not substantially more accurate on verified than on unverified recordings, average accuracy on the record sets in the former category being lower than that reported in previous studies.

2) A second possible explanation is that evaluators in prior studies may have had more experience in, or been more adept at, interpreting polygraphic recordings. This explanation is unconvincing since in this study evaluators, actively engaged in lie-detection for a period of years, were, on the average, less accurate than those in prior studies who had not yet completed a six-month training course.

3) Finally, evaluators in prior studies were not only given polygraphic recordings but were also briefed about the investigations from which the recordings were obtained. While Holmes has demonstrated that accuracy increases when evaluators are given information in addition to recordings,7 it is exceedingly doubtful that the slight information given evaluators in prior studies can account for the substantial increment in accuracy over that in the present study.

7W. Holmes, "The Degree of Objectivity in Chart Interpretation," Academy Lectures on Lie Detection, II, V. Leonard (ed.) (Springfield, Illinois: Charles C Thomas, 1958), 62-70.

The most convincing explanations for the findings in the present study, and certainly factors which make it difficult to draw direct comparisons between this and other research, include the following:
Holmes, "The Degree of Objectivity in Chart Interpretation," Academy Lecture on Lie Detection, II, V. Leonard (ed;) (Springfield, Illinois: C.C Thomas, 1958), 62-70:. 170 1) In contrast to prior studies, polygraphic re- cordings in the present study were selected at random from a pre-defined pOpulation. While randomization was, for this study, a desirable characteristic, it eliminates the possi- bility of control for any influence of examiner-subject interaction on polygraphic recordings. In other words, recordings in this study were included without regard for the capabilities of the examiners who had conducted the examinations from which the recordings derived. In fact, it became apparent during the study that some of the recordings were derived from examinations conducted by examiners who were, during the years from which the sample was drawn, interns. On the other hand, recordings evaluated in prior studies were, in each case, obtained from examinations conducted by the same experienced examiner. Obviously, any effect of examiner-subject interaction on physiological recordings was, at the least, minimized. Said in another way, variability due to differences between examiners was eliminated. It should be noted here that the lack of any signifi- cant differences between experience—levels of evaluators in the present study, does not refute the above considerations. That examiners acting as evaluators apparently do not differ in ability to interpret physiological data is not to say that experience is an unimportant variable in conducting 171 polygraphic examinations. In fact, in view of Orne's argu- ment that the primary variables in lie-detection are psycho- logical, not physiological, in nature,8 experience is pro- bably a critical determinant of the outcome of such examina- tions. In other words, it is experience that probably permits an examiner to adjust more effectively to complex situational demands. 2) Two of the prior studies have dealt with poly- graphic recordings of subjects involved in investigations undertaken by private or commercial examiners, whereas in the present study the recordings were of subjects involved in investigations conducted by police agencies. There may be obvious and dramatic differences between the two subject populations in regard to many of the variables known to influence autonomic activity, and, more generally, lie- detection. For instance, variables such as intelligence, ethnicity, age, and generally, personality and psychological make-up, are probably important determinants of reSponse- data obtained during polygraphic examinations.9 Moreover, as Orne has pointed out, examinations conducted by private 8Orne, "Implications of Laboratory Research for the Detection of Deception," op. cit., 188. 9See: G. Barland and D. Raskin, "The Use of Electro— dermal Activity in the Detection of Deception," Pre-publica- tion c0py to appear in W. Prokasy and D. Raskin (Eds.), Electrodermal Activity in Psychological Research (New York: Academic Press, in press), 31-39. 172 examiners may differ from those of police examiners with respect to the motivation of the subject, and the amount and nature of the information available to the examiner 10 All of these variables, singularly prior to the testing. or in combination, might make blind analysis of polygraphic recordings of police examinations more difficult than analy- sis of those obtained from commercial situations. 
3) In examinations conducted by police examiners, the degree of stress on the subject is presumably greater than in those conducted by commercial examiners. Such stress is believed to increase detectability; thus, it could be suggested that evaluators of recordings obtained from police examinations would be more accurate than those who judge recordings obtained under different circumstances. However, neither Holmes's findings11 nor the results of the present study support such a suggestion. It may be that there is a threshold of stress, encountered primarily in police situations, beyond which the detectability of truthfulness and deception in blind analysis of polygraphic recordings decreases; or, said in another way, beyond which the ambiguity of responses increases. Such ambiguity might also increase false positives.

11Holmes, "The Degree of Objectivity in Chart Interpretation," op. cit., 67.

4) Finally, evaluators in the present study were denied the advantage of some physiological data available to the testing examiners. For methodological reasons, "yes" tests were eliminated from all record sets; it is not clear whether such tests were included in prior research. Although it is possible that the elimination of "yes" tests decreased overall accuracy, it probably did not affect the relative results. With but one exception, "yes" tests had to be eliminated from record sets in the deceptive category; hence, if anything, accuracy would have increased only on these record sets. Judgments of record sets in the truthful category would have been unaffected.

Confidence in Judgments

In general, the results pertaining to confidence scores are consistent with those of accuracy scores. More experienced evaluators were not significantly more confident than those less experienced, nor was confidence significantly greater on verified than on unverified record sets. These results lend support to explanations previously advanced in the discussion concerning the accuracy of judgments.

That confidence scores were significantly higher on the record sets in the deceptive than in the truthful category also supports prior discussion. In blind analysis the physiological responses believed to be associated with deception are not only more accurately but also more confidently judged than those indicative of truthfulness. This result is consistent irrespective of the verification involved.

While field research dealing with the relationship between confidence ratings and accuracy has not been reported, there are two experimental studies of lie-detection which have explored this issue. Kubis reported that independent evaluators of polygraphic recordings "had greater confidence in those decisions ultimately verified as correct than they did in those which were incorrect."12 In a later study, Moroney substantiated Kubis's findings.13

12J. Kubis, Studies in Lie Detection: Computer Feasibility Considerations, Tech. Report 62-205 (Arlington, Va.: Armed Services Technical Information Agency, June, 1962), prepared for Air Force Systems Command, Contract No. AF 30 (602)-22700, Project No. 5534, Fordham University, 1962, 68.

13W. Moroney, "The Detection of Deception as a Function of PGR Methodology" (unpublished Ph.D. dissertation, St. John's University, 1968; Ann Arbor, Michigan: University Microfilms, 1969, No. 69-7125).

The results of the present study clearly support those reported by Kubis and Moroney: confidence was significantly greater on correct than on incorrect judgments for both experience groupings of evaluators. While the practical significance of this finding is unclear, it suggests that the more ambiguous the recordings, the greater the possibility for error in blind analysis, regardless of the experience of the evaluator.
When evaluators identified those aspects of physiological data believed to be indicative of truthfulness and deception, confidence increased; when those aspects were less apparent, confidence decreased. Moreover, it is interesting that this finding obtained even though the criteria for assessing accuracy were not the same for the verified and unverified conditions, suggesting again that the nature of the recordings in the two conditions is relatively consistent.

Ease of Interpretability of Record Sets

The results concerning the total ease-of-interpretability scores are both consistent and inconsistent with results pertaining to accuracy and confidence scores. In regard to consistencies, it is apparent that the experience of evaluators did not significantly influence total "ease" scores. Contrary to Horvath and Reid's suggestion, in blind analysis more experienced evaluators apparently do not find it easier than the less experienced to interpret polygraphic data, "to apply consistently the fine points of the [control question] theory."14 However, as will be discussed, "ease" scores may not have been a very effective measure of truthfulness/deception indicated by physiological data.

14F. Horvath and J. Reid, "The Reliability of Polygraph Examiner Diagnosis of Truth and Deception," J. Crim. Law, Crim. and Pol. Sci., 63 (1972), 281.

A second finding regarding the total "ease" scores, and one supporting other findings, was that record sets in the deceptive category were judged significantly easier to interpret than those in the truthful category, whether verified or unverified. This finding, of course, is consistent with the greater confidence and accuracy scores on "deceptive" record sets.

Total "ease" scores decreased considerably from the verified to the unverified condition for record sets in the deceptive category, while for those in the truthful category they increased slightly. (These same effects were also apparent in confidence scores.) Again, an explanation of these results may lie in the lack of uniform numbers of control-question tests in record sets. In the deceptive category it is apparent that there were more record sets containing only the basic battery in the verified than in the unverified condition; for sets in the truthful category the basic battery was apparent more often in the unverified than in the verified condition.15 Thus, the direction of "ease" scores across the levels of verification may merely reflect differences in the number of record sets containing only the basic battery, presumably easier to interpret than other record sets. It is clear, however, that such differences do not account for the relationship between the ease of interpretability of truthful and deceptive record sets; "deceptive" record sets were easier to interpret than "truthful" irrespective of the number of record sets in each of these categories containing only the basic battery.

15See Table 3.2, page 100.
Total ease-of-interpretability scores were significantly higher in the verified than in the unverified condition. This result seems to conflict with other results, since neither accuracy nor confidence scores were significantly different in these two conditions. It is likely, however, that the "ease" scores were not a measure of the degree to which an evaluator could discriminate between control- and relevant-question responses; hence, they were not directly related to accuracy. The "ease" scores were apparently regarded by evaluators as an index of the general level of the responsiveness of the physiological data in record sets, perhaps irrespective of truthfulness/deception indications. This explanation helps clarify why record sets in the crime-against-a-person category were judged significantly easier to interpret than those in the property-crime category; it is also consistent with the explanation previously advanced concerning the accuracy-score results: investigations concerning crimes against a person are, by nature, more emotionally weighted than those pertaining to property crimes; thus, the general level of responsiveness for record sets in the former category is greater than in the latter.

Results of analysis of the ease scores for individual components are essentially similar to those for total "ease" scores. Respiration and cardio were easier to interpret for record sets in the deceptive category than for those in the truthful. This same general result was also found for GSR "ease" scores, but only in the verified condition -- an exception not readily explained.

The mean total ease-of-interpretability ratings were significantly higher on correct than on incorrect judgments. This result approximates Kubis's finding that for independent evaluators correctly judged records are easier to interpret than those incorrectly judged.16 Other results of the present study, however, are strikingly dissimilar to those reported by Kubis. In the present study physiological components were rated for ease of interpretability in the following order: respiration, cardio, and GSR, the first two components judged significantly easier to interpret on correct than on incorrect judgments. These results, corresponding with anecdotal evidence offered by field examiners concerning the relative merits of the individual components,17 do not correspond with those of Kubis's laboratory study. Kubis found GSR, cardiovascular, and respiratory activity, in that order, easier to interpret and found the ratings for all three components higher for correct than for incorrect decisions.18 There are several explanations for these differences.

16Kubis, Studies in Lie Detection: Computer Feasibility Considerations, op. cit., 70-71.

17J. Reid and F. Inbau, Truth and Deception: The Polygraph ("Lie Detector") Technique (Baltimore: Williams and Wilkins, 1966), 40.

18Kubis, Studies in Lie Detection: Computer Feasibility Considerations, op. cit., 70.

Field examiners contend that for their purposes GSR is less useful as an indicator of deception than are respiration or cardiovascular activity. Thus, since the present study involved field polygraphic data evaluated by field-trained evaluators, "ease" ratings may be reflecting the particular orientation of these evaluators. Comparison of the simplicity of GSR responses to the complexity of respiratory and cardiovascular responses, however, detracts from this explanation.
A second explanation of the differences may be that in the field the level of subject affect, being higher than in laboratory situations, distorts GSR responses to the extent that they are, in fact, more difficult to interpret than are respiratory or cardiovascular responses. This explanation is consistent with the claims of field examiners,19 although there is some indication that such claims may not be legitimate.20

19Reid and Inbau, Truth and Deception: The Polygraph ("Lie Detector") Technique, op. cit., 220.

20Barland, "An Experimental Study of Field Techniques in Lie Detection," op. cit., 50.

Finally, differences in instrumentation in Kubis's laboratory situation and the typical field situation may affect GSR responses. Laboratory equipment such as that used by Kubis is usually more sophisticated than field equipment. Moreover, in field situations the apparatus for recording cardiovascular activity usually causes some discomfort to the subject. Kubis, however, recorded cardiovascular activity in a manner which precluded discomfort. Thus GSR responses in Kubis's study were uninfluenced by this additional factor, whereas such responses as evaluated in the present study may have been degraded.21 These assumptions concerning the effect of instrumentation differences on GSR responses, however, are not fully supported by evidence reported by Barland,22 Kugelmass,23 and Orne.24

21Alternate explanations for differences between laboratory and field situations with respect to GSR responses are also possible; see: Barland and Raskin, "The Use of Electrodermal Activity in the Detection of Deception," op. cit., 30-44.

22Barland, "An Experimental Study of Field Techniques in Lie Detection," op. cit., 44.

23S. Kugelmass, I. Lieblich, A. Ben Ishai, A. Opatowski, and M. Kaplan, "Experimental Evaluation of Galvanic Skin Response and Blood Pressure Change Indices During Criminal Interrogation," J. Crim. Law, Crim. and Pol. Sci., 59 (1968), 632-635.

24Orne, "Implications of Laboratory Research for the Detection of Deception," op. cit., 196.

Numerical Evaluation

Of secondary but real interest here is the accuracy of the numerical scores of the evaluators. When the scores for all record sets were considered, the evaluators' GSR scores, not counting inconclusives, were more accurate, 60 percent, than those for the other components; while this same result did not obtain when the accuracy of scores was calculated separately for record sets in the unverified condition, GSR
The ambiguity of GSR responses is also apparent from an inspec- tion of the ease-of—interpretability ratings of individual components; GSR was rated the most difficult of the three components to interpret. It should be noted, however, that the ambiguity of GSR in the field may not be situational in nature, but rather due to the inattentiveness of examiners to instrumentation maintenance or adjustment. 25Barland, "An Experimental Study of Field Techniques in Lie Detection," op. cit., 50. 182 Results pertaining to the reliability of numerical scores indicate greater agreement between evaluators on GSR scores than on either of the other two components or on combined scores. With but one exception these results are consistent with Barland's findings concerning relative reliability of evaluators' scores.26 The exception is that in the present study evaluators did not differ in their consistent scoring of respiratory or cardiovascular responses. In Barland's study, on the other hand, respiratory responses were scored with considerably less consistency than either GSR or cardiovascular responses. It is not clear if this difference in results was due to differences in the nature of the polygraphic recordings used (field as Opposed to experimental) or to evaluator differences. However, the former explanation seems more likely since the evaluators in both studies were field-trained. The consistency of evaluators' numerical scores is surprisingly high, especially since evaluators received only minimal instruction in numerical evaluation, and since such scores reflect primarily relevant/control-question response differences rather than overall judgments of truth- fulness/deception. This result further indicates that analysis of polygraphic data by field-trained evaluators is relatively objective and reliable. 26Barland, "The Reliability of Polygraph Chart Evalua- tions," op. cit., 5. 183 Summary It is clear that in general the results of this study support prior research, that the "blind" judgments of trained evaluators made on field-derived polygraphic recordings are accurate well beyond chance levels and that there is a sub- stantial degree of reliability and objectivity in these judgments. Nevertheless, the results also suggest that it may be inappropriate to talk about the accuracy of blind analysis without first specifying the nature of the investi- gation from which recordings are drawn, whether for law enforcement or commercial purposes. The most consistent finding in this study was that pertaining to differences between polygraphic recordings of truthful and deceptive subjects. Not only were recordings of deceptive subjects judged more accurately and confidently, but they were easier to interpret than those of truthful subjects. While it is tempting to apply this result to the general field-situation it is inapprOpriate to do so. The results of this study pertain only to judgments made by blind analysis, which, as already pointed out, differs substantially from the manner in which judgments are made by examiners in , field-settings. It is clear that extensive research is warranted to determine the influence which differential sources of information have on examiners' judgments generally, and on the nature of errors in field lie-detection specifi- cally. APPENDICES 184 APPENDIX A NUMBER OF FOLDERS ASSIGNED TO STRATIFICATION LEVELS 185 186 TABLE A.1.--Number of Folders Assigned to Stratification Levels. 
                              Verified
        Truthful                            Deceptive
Crimes Against    Property         Crimes Against    Property
   A Person        Crimes             A Person        Crimes
      47             33                  187            213

                             Unverified
        Truthful                            Deceptive
Crimes Against    Property         Crimes Against    Property
   A Person        Crimes             A Person        Crimes
     311            450                  100            105

APPENDIX B

INSTRUCTIONS TO EVALUATORS

General Instructions to Evaluators

Enclosed are the polygraph recordings of 28 subjects in PACKET _____. Would you please analyze each set of recordings and for each subject complete fully the EVALUATOR ANSWER SHEET. PLEASE be sure that you have answered all questions on each sheet for each subject.

Some subjects' recordings are given a number followed by the letters QC. These recordings are to be analyzed according to directions for completing the EVALUATOR ANSWER SHEET and the NUMERICAL EVALUATION SCORE SHEET, as explained on February 8, 1974. In other words, for all subjects complete an EVALUATOR ANSWER SHEET; for subjects whose numbers are followed by a QC complete an EVALUATOR ANSWER SHEET and a NUMERICAL EVALUATION SCORE SHEET.

When you have completed an EVALUATOR ANSWER SHEET (and the NUMERICAL EVALUATION SCORE SHEET, where appropriate), place them in the PACKET envelope along with all of the polygraph recordings. (PLEASE BE CAREFUL NOT TO LOSE OR MISPLACE ANY OF THE RECORDINGS.)

You will have one week to evaluate all recordings in any one PACKET. If you finish before this time limit please notify (The Chief Examiner) or me and tell us which PACKET you have completed. DO NOT give the recordings to any other examiner.

NOTE: Valid results depend upon each examiner making his own analysis. So please do not consult with anyone else when making your decisions or discuss your results with any other examiner. If you have any questions concerning the study or the procedure please call before you start your analysis. THANK YOU.

Instructions For Numerical Evaluation

1. Review each measure (resp., GSR, Cardio) separately in Test I.

2. Compare the response in each measure to each of the four relevant questions (consider only questions 3k, 5, 8, and 9) to the response on the appropriate Control Questions. (See the Numerical Evaluation Score Sheet to decide which Control Question to consider.)

3. Decide if the response to the relevant question is greater or less than the response to the Control Question. If the response to the relevant question is greater, the score for that question in the measure you are analyzing could be -1, -2, or -3, depending upon how much greater you believe the response is. For instance, if you are evaluating the respiration measure and the response at question #5 is very much greater than the response at Control Question #6, then you would indicate on the score sheet a -3; if the response is only somewhat greater to the relevant question, then you would score a -2, etc. If there is no difference between the relevant-question response and the Control Question response, then you would mark a 0 on the score sheet. On the other hand, if the Control Question response is greater than the response to the particular relevant question you are evaluating, then you would mark a +1, +2, or +3, once again depending upon how much greater you believe the Control Question response to be.

4. Carry out step 3 for each of the four relevant questions and for each measure on TEST I.

5. Repeat steps #3 and #4 for TEST III (following the "number" test).
6. If there are two respiration measures recorded, evaluate ONLY the recording of the lower pneumo; that is, the recording which is nearest the bottom of the chart.

7. You do not have to total your scores, unless you want to, since your decision regarding the subject's truthfulness or deception will already be indicated on your EVALUATOR ANSWER SHEET.

APPENDIX C

SPECIMEN COPIES OF EVALUATOR ANSWER SHEETS

EVALUATOR ANSWER SHEET

DATE:                PACKET #
EVALUATOR NAME:      RECORD #

I. BASED UPON YOUR ANALYSIS OF THE SUBJECT'S RECORDS WOULD YOU CONCLUDE THAT HE IS: (Please circle appropriate number.)

   A truth-teller (NDI)    1
   A liar (DI)             2
   Inconclusive (INC)      3

II. WOULD YOU PLEASE RATE THE DEGREE OF CONFIDENCE YOU HAVE IN YOUR ANALYSIS:

   No confidence                   1
   Very doubtful                   2
   More doubtful than confident    3
   More confident than doubtful    4
   Very confident                  5
   Almost certain                  6

III. OVERALL, HOW EASY WAS IT TO INTERPRET THESE RECORDS?

   Easy to interpret?    Resp.   GSR   Cardio
   Very easy               5      5      5
   Easy                    4      4      4
   Average                 3      3      3
   Difficult               2      2      2
   Very difficult          1      1      1

NUMERICAL EVALUATION SCORE SHEET

TEST I
              Q3k-6    Q5-6    Q8-6    Q9-10    Component Total
   PNEUMO
   GALVO
   CARDIO
   TOTAL                                        SUB-TOTAL

TEST III
              Q3k-6    Q5-6    Q8-6    Q9-10    Component Total
   PNEUMO
   GALVO
   CARDIO
   TOTAL                                        SUB-TOTAL

SPOT TOTALS        SUBJECT #        PACKET #        EXAMINER        GRAND TOTAL
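The scoring rule in Appendix B lends itself to a brief illustration. The following is a minimal sketch, in Python, of how a record set's spot scores might be totaled and converted to a call; the +/-5 decision cutoff, the example data, and all names are assumptions introduced only for illustration, not part of the study's instructions (which, as noted above, did not require evaluators to total their scores).

```python
# Minimal sketch of the numerical scoring scheme described in Appendix B.
# Assumptions (not from the study): the +/-5 decision cutoff and all names
# and data below are illustrative only.
from typing import Dict, List

# scores[component][test] holds four spot scores, one per relevant question
# (3k, 5, 8, 9), each from -3 (relevant response much larger than control)
# to +3 (control response much larger than relevant).
Scores = Dict[str, Dict[str, List[int]]]

def grand_total(scores: Scores) -> int:
    """Sum every spot score across components (pneumo, galvo, cardio) and tests."""
    return sum(v for comp in scores.values() for test in comp.values() for v in test)

def decision(total: int, cutoff: int = 5) -> str:
    """Map a grand total to a call; the cutoff of 5 is an assumed value."""
    if total <= -cutoff:
        return "DI (deceptive)"
    if total >= cutoff:
        return "NDI (truthful)"
    return "Inconclusive"

if __name__ == "__main__":
    example: Scores = {
        "pneumo": {"test_i": [-1, -2, 0, -1], "test_iii": [-2, -1, -1, 0]},
        "galvo":  {"test_i": [0, -1, -2, -1], "test_iii": [-1, 0, -2, -1]},
        "cardio": {"test_i": [1, -1, 0, -1],  "test_iii": [0, -1, -1, 0]},
    }
    total = grand_total(example)
    print(total, decision(total))   # prints: -19 DI (deceptive)
```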
@03m3Hm>:D nwwwflum> 38 88mm :0 83858 a.mHODm5Hm>m HMSvH>HdnH HO mmaoom mocmfiHM:OOll.N .Q Hausa .197 . .L .03Nqu u anomoumo um: mmuoom NU333nmumgmHmu:Huw0lmmmm 3muoo “an mm:mu 0333mmom use: o.mo3 aé33 933 5.333 m.h33 v.NN3 0.33 m.mo3 m 35:33.50 . N.m33 m.m33 m.o33 N.m33 w.MN3 v.mN3 o.o33 0.333 M N03 v03 om no oO3 No3 mo3 mm 03m M33 m03 v33 O33 9N3 mN3 N33 mo3 mm 533 nm3 om3 mm3 3v3 3¢3 NN3 MN3 mm oo3 M33 «m mO3 m33 0N3 Nm mm hm «m3 mm3 «N3 ¢N3 om3 3v3 NN3 om3 mm mogmwummxu swam vo3 o.O33 N.mm N.m03 .v.333 v.m33 m.oo3 m.oo3 m mm mm em as mm hm mm mm mm hm ao3 cm 503 mo3 N33 om hm vm m33 m33 N03 ~33 m33 mN3 mm mo3 mm 303 mo3 mm 533 333 mN3 o33 no3 Nm NN3 om3 MN3 3N3 «m3 vv3 MN3 o33 3m moco3HmQxN.au3 mango :Omumm_4 mE3HU :omumm d mEHHU :omuwa.a. MEHuU. :Omumm a muosm=3m>m huuwmoum um:3mm¢ auuwmoum um:30m¢ xuuwgoum um:3mm< avuwmoum umcflmmd mE3HU ms3nu mE3HU QE3HU 93%08 38.52:. 93:88 3958:. 83:382.: 8333/ muww vacuum mo muommumu MmuBSdm/m 3893?: mo 888 3333:3585 :0 8mm glad a APPENDIX E ANALYSIS OF VARIANCE TABLES 198 199 TABLE E.l.--Ana1ysis of Variance Table for Accuracy Scores. Source df MS F p< 1. A (A=Experience-high,low) l 15.75 .11 .75 2. E (E=Eva1uators): A 8 149.47 3. B (B=Verification-verified, unverified) 1 77.42 1.42 .27 4. C (C=Truthfu1ness-truthfu1, deceptive) 1 10628.36 10.70 .01 5. D (D=Crime type-person, prOperty) 1 183.32 1.54 .25 6. A X B 1 143.92 2.64 .14 7. A X C 1 1292.03 1.30 .29 8. A X D l 5.57 .05 .83 9. B X C 1 183.92 1.36 .28 10. B X D 1 231.54 1.18 .31 11. C X D 1 3039.35 55.83 .0001 12. A X B X C 1 .63 .005 .95 13. A X B X D 1 5.78 .03 .87 14. A X C X D 1 15.93 .29 .60 15. B X C X D 1 1925.70 20.87 .002 16. A X B X C x 1 5.57 .06 .81 17. B X E:A 8 54.46 18. C X E:A 8 992.65 19. D X E:A 8 119.05 20. B X C X E:A 8 135.30 21. B X D X E:A 8 195.76 22. C X D X E:A 8 54.44 23. B X C X D X 8 92.29 200 TABLE E.2.--Ana1ysis of Variance Table for Confidence Scores. Source ' df MS F p< 1. A (A=Experience-high,low) 1 470.45 1.77 .22 2. E (E=Eva1uators):A 8 266.50 3. B (B=Verification—verified, unverified) 1 16.20 .53 .49 4. C (C=Truthfu1ness—truthfu1, deceptive) 1 312.05 64.17 .0001 5. D (D=Crime type- person, property) 1 1.25 .08 .78 6. A X B 1 48.05 1.57 .25 7. A X C 1 12.80 2.63 .14 8. A X D 1 20.00 1.30 .29 9. B X C 1 36.45 6.23 .03 10. B X D 1 26.45 2.16 .18 11. C X D 1 16.20 2.72 .13 12. A X B X C 1 5.00 .85 .38 13. A X B X D 1 .20 .02 .90 14. A X C X D 1 .05 .008 .93 15. B X C X D 1 3.2 .42 .54 16. A X B X C X D 1 18.05 2.37 .16 17. B X E:A 8 30.69 18. C X E:A 8 4.86 19. D X E:A 8 15.38 20. B X C X E:A 8 5.85 21. B X D X E:A 8 12.26 22. C X D X E:A 8 5.94 23. B X C X D X E:A 8 7.63 201 TABLE E.3.--Analysis of Variance Table for Total Base of Interpretability Scores. Source df MS F p< 1. A (A=Experience— high,1ow) 1 1852.81 1.03 .34 2. E (E=Eva1uators):A 8 1800.91 3. B (B=Verification-verified, unverified) 1 171.11 7.65 .02 4. C (C=Truthfulness- truthful, deceptive) 1 1593.11 37.99 .0003 5. D (D=Crime type— person property) 1 437.11 8.22 .02 6. A X B l .012 .0006 .98 7. A X C l 2.11 .05 .83 8. A X D 1 37.81 .71 .42 9. B X C 1 556.51 9.31 .02 10. B X D 1 86.11 3.25 .11 11. C X D 1 17.11 1.05 .34 12. A X B X C l .61 .010 .92 13. A X B X D 1 .31 .012 .92 14. A X C X D l .013 .0008 .98 15. B X C X D 1 37.81 1.05 .34 16. A X B X C X 1 49.61 1.38 .27 17. B X E:A 8 22.38 18. C X E:A 8 41.93 19. D X E:A 8 53.15 20. B X C X E:A 8 60.94 21. B X D X E:A 8 26.46 22. C X D X E:A 8 16.31 23. 
B X C X D X 8 36.03 202 TABLE E.4.--Ana1ysis of Variance Table for ReSpiration Ease- of-Interpretability Scores. Source df MS F p< 1. A (A=Experience-high, low) 1 11.25 .06 .81 2. E (E=Eva1uators): A 8 187.94 3. B (B=Verification-verified, unverified) 1 115.20 40.87 .0003 4. C (C=Truthfu1ness-truthfu1, deceptive) 1 510.05 102.65 .0001 5. D (D=Crime type-person, property) 1 31.25 4.17 .08 6. A X B l 20.00 7.10 .03 7. A X C 1 2.45 .49 .50 8. A X D 1 6.05 .81 .40 9. B X C 1 72.20 6.25 .04 10. B X D 1 88.20 20.54 .002 11. C X D 1 36.45 5.33 .05 12. A X B X C 1 .20 .02 .90 13. A X B X D 1 .20 .05 .83 14. A X C X D 1 .05 .01 .93 15. B X C X D 1 .80 .07 .80 16. A X B X C X D 1 51.20 4.51 .07 17. B X E:A 8 2.82 18. C X E:A 8 4.97 19. D X E:A 8 7.49 20. B X C X E:A 8 11.54 21. B X D X E:A 8 4.29 22. C X D X E:A 8 6.84 23. B X C X D X E:A 8 11.34 TABLE E.5.-—Ana1ysis of Variance Table for GSR Interpretability Scores. 203 Ease-of- Source df MS F‘ p< 1. A (A=Experience-high, low) 1 312.05 78 .40 2. E (E=Eva1uators) : A 8 401.00 3. B (B=Verification-verified, unverified) 1 14.45 2.63 .14 4. C (C=Truthfu1ness-truthfu1, deceptive) 1 5.00 .59 .46 5. D (D=Crime type-person, prOperty) 1 22.05 3.82 .09 6. A X B 1 1.80 .33 .58 7. A X C 1 .45 .05 .82 8. A X D 1 5.00 .87 .38 9. B X C 1 140.45 12.13 008 10. B X D l 12.80 3.07 .12 11. C X D 1 11.25 4.79 .06 12. A X B X C l 3.20 .28 .61 13. A X B X D 1 6.05 1.45 .26 14. A X C X D l 3.20 1.36 .28 15. B X C X D 1 45.00 6.29 .04 16. A X B X C X 1 6.05 .85 .38 17. B X E:A 8 5.50 18. C X E:A 8 8.48 19. D X E:A 8 5.78 20. B X C X E:A 8 11.58 21. B X D X E:A 8 4.18 22. C X D X E:A 8 2.35 23. B X C X D X : 8 7.15 TABLE E.6.--Ana1ysis of Variance Table for 204 Interpretability Scores. Cardio Ease-of- Source ' df MS F p< 1. A (A=Experience-high, low) 1 485.11 3.37 .10 2. E (Evaluators) :A 8 143.91 3. B (B=Verification-verified, unverified) 1 2.11 .17 .69 4. C (C=Truthfulness-truthfu1, deceptive) 1 227.81 59.27 .0001 5. D (D=Crime type-person, prOperty) 1 112.81 9.87 .01 6. A X B 1 9.11 .72 .42 7. A X C 1 .31 .08 .78 8. A X D 1 2.11 .18 .68 9. B X C 1 10.51 1.48 .26 10. B X D 1 12.01 2.12 .18 11. C X D 1 2.11 .37 .56 12. A X B X C 1 4.51 .64 .45 13. A X B X D 1 12.01 2.12 .18 14. A X C X D 1 2.81 .49 .50 15. B X C X D 1 .11 .02 .89 16. A X B X C X D l 5.51 .93 .36 17. B X E:A 8 12.64 18. C X E:A 8 3.84 19. D X E:A 8 11.43 20. B X C X E:A 8 7.11 21. B X D X E:A 8 5.67 22. C X D X E:A 8 5.74 23. B X C X D X E:A 8 5.91 APPENDIX F CORRELATIONS OF EVALUATORS' NUMERICAL SCORES ON VERIFIED AND UNVERIFIED RECORD SETS 205 206 TABLE F.1.--Corre1ations of ReSpiration Scores: Verified Record Sets. Evaluator 1 2 3 4 5 6 7 1 .68 .24 .76 .52 .79 .62 2 .15 .65 .43 .76 .61 3 .06 .55 .41 .52 4 .55 .77 .56 5 .59 .89 6 .74 TABLE F.2.--Correlations of ReSpiration Scores: Unverified Record Sets. Evaluator 1 2 3 4 5 6 7 1 .65 .58 .66 .44 .43 .73 2 .49 .75 .49 .62 .66 3 .83 .62 .35 .89 4 .69 .58 .91 5 .41 .77 6 .53 TABLE F.3.--Correlations of GSR Scores: Verified Record Sets. Evaluator 1 2 3 4 5 6 7 1 .87 .63 .75 .86 .88 .73 2 .67 .68 .74 .88 .54 3 .68 .66 .70 .58 4 .71 .76 .64 5 .75 .64 6 .60 207 TABLE F.4.--Corre1ations of GSR Scores: Unverified Record Sets. Evaluator 1 2 3 4 5 6 7 1 .74 .70 .87 .76 .81 .82 2 .66 .76 .59 .77 .76 3 .75 .64 .63 .63 4 .81 .85 .75 5 .77 .64 6 .69 TABLE F.5.--Corre1ations of Cardio Scores: Verified Record Sets. 
Evaluator 1 2 3 4 5 6 7 1 .67 .41 .66 .59 .37 .70 2 .48 .71 .62 .54 .74 3 .32 .30 .41 .62 4 .80 .58 .79 5 .51 .71 6 .57 TABLE F.6.--Corre1ations of Cardio Scores: Unverified Record Sets. Evaluator 1 2 3 4 5 6 7 1 .61 .61 .63 .70 .44 .79 2 .60 .76 .67 .77 .54 3 .35 .63 .43 .52 4 .77 .59 .70 5 .69 .88 6 .53 208 TABLE F.7.--Corre1ations of Combined Scores: Verified Record Sets. Evaluator 1 2 3 4 5 6 7 1 .77 .35 .71 .65 .75 .68 2 .43 .69 .56 .79 .63 3 .26 .54 .48 .57 4 .70 .72 .71 5 .57 .85 6 .67 TABLE F.8.-~Correlations of Combined Scores: Unverified Record Sets. Evaluator 1 2 3 4 5 6 7 1 .66 .72 .73 .63 .48 .79 2 .62 .84 .65 .70 .66 3 .70 .69 .47 .83 4 .78 .65 .83 5 .57 .83 6 .50 BIBLIOGRAPHY 209 BIBLIOGRAPHY Books Arther, R. and Caputo, R., Interrogation for Investigators. New York: William C. Copp and Associates, 1959. Chao, L., Statistics: Methods and Analyses. New York: Mc- Graw-Hill, 1969. Ferguson, R., The Scientific Informer. Springfield, Illinois: Charles C Thomas, 1971. Greenfield, N. and Sternbach, R. (eds.), Handbook of Psycho- Physiology. New York: Holt, Rinehart and Winston, 1972. Harrelson, L., Keeler Polygraph Institute Training Guide. Chicago: Keeler Polygraph Institute, 1964. Inbau, F. and Reid, J., Lie Detection and Criminal Interro- ation. Baltimore, Maryland: Williams and Wilkins, 1953. Kirk, R., Experimental Design: Procedures for the Behavioral Sciences. Belmont, California: Brooks/Cole, 1968. Larson, J., Lying and Its Detection. Chicago: University of Chicago Press, 1932. Reprinted, Montclair, New Jersey: Patterson Smith, 1969. Lee, C., The Instrumental Detection of Deception: The Lie Test. Springfield, Illifiois: Charles C Thomas, 1953. Leonard, V.A. (ed-), Academy Lectures on Lie Detection. Springfield, Illinois: Charles C Thomas, 1957. Leonard, V.A. (ed.), Academy Lectures on Lie Detection. Vol. 2, Springfield, Illinois: Charles C Thomas, 1958. Lykken, D., Psychology and The Lie Detector Industry. Minneapolis: Department of Psychiatry, University of Minnesota Press, Report No. PRr74-l, 1974. 210 211 Marston, W., The Lie Detector Test. New York: Richard K. Smith, 1938. Mehrens, W.A. and Ebel, R. (eds.), Principles of Educational and Psychological Measurement. Chicago: Rand McNally, 1967. Munsterberg, H., On The Witness Stand. New York: Doubleday, 1908. Prokasy, W.F. and Raskin, D. (eds.), Electrodermal Activity in Psychological Research. New York: Academic Press, in press. Reid, J. and Inbau F., Truth and Deception: The Polygraph ("Lie Detector") Technique. Baltimore: Williams and Wilkins, 1966. Rosenthal, R., Experimenter Effects in Behavioral Research. New York: Appleton-Century-Crofts, 1966. Periodicals Abrams, S., "Polygraph Validity and Reliability: A Review,“ Journal of Forensic Sciences, 18 (1973), 313-326. Alpert, M., Kurtzberg, R.L. and Friedhoff, A., "Transient Voice Changes Associated With Emotional Stimuli," Archives of General Psychiatry, 8 (1963), 362-365. Altarescu, H., "Problems Remaining for the 'Generally Accepted' Polygraph," Boston University Law Review, 53 (1973), 375-405. Ansley, N. (ed.), "Actions of the Board of Directors, January 18-20," American Polygraph Association Newsletter, 1 (1974), 10. Ansley, N. (ed.), "A.P.A. Accepted Polygraph Schools," American Polygraph Association Newsletter (December/ January, 1974), 14. Ansley, N. (ed.), "Inquiry Regarding Dektor PSE-l," American Polygraph Association Newsletter, 3 (1972), 18. Arther, R., "Covering Two Crimes in One Examination," Journal of Polygraph Studies, 4 (1970), 3-4. 
Arther, R., "Crime Question Wording," Journal of Polygraph Studies, 4 (1969), 1-4. ' 212 Arther, R., "Irrelevant Questions," Journal of Polygraph Studies, 3 (1969), 3-4. Arther, R., "Peak of Tension: Basic Information," Journal of Polygraph Studies, 1 (1967), 4. Arther, R., "Peak of Tension: Dangers, Studies, 2, 5 (1968), 1-4. Journal of Polygraph Arther, R., "Peak of Tension: Examination Procedures," Journal of Polygraph Studies, 5, 1 (1970), 1-4. Arther, R., "The Guilt Complex Question," Journal of Polygraph Studies, 4 (1969), 1-4. Backster, C., "Methods of Strengthening Our Polygraph Technique," Police, 6, 5 (1962), 61-68. Backster, C., "Lie Detection Comes of Age," Law and Order (undated, unpaginated reprint supplied by author). Ben Shakhar, G., Lieblich, I., and Kugelmass, 8., "Guilty Knowledge Technique: Application of Signal Detection Measures," Journal of Applied Psychology, 54, 5 (1970), 409-413. Benussi, V., "On the Effects of Lying on Changes in Respira- tion," Archives Fur Die Gesamte Psychologie (1914), 244-273. Berkhout, J., Walter, D., and Adey, W., "Autonomic Responses During A Replicable Interrogation," Journal of Applied Psychology, 54, 4 (1970), 316-325. Berrien, P., "Possibilities in the Use of the Ophthalmograph as a Supplement to Existing Indices of Deception," Psychological Bulletin (Abstract), 37 (1940), 507. Berrien, P., "Ocular Stability in Deception," Journal of Applied Psychology, 26 (1942), 55-63. Berrien, P., and Huntington, G., "An Exploratory Study of Pupillary Responses During Deception," Journal of Experimental Psychology, 32 (1943), 443-449. Bersh, P., "A Validation Study of Polygraph Examiner Judgments," Journal of Applied Psychology, 53, 5 (1969), 399-403. Bitterman, M., and Marcuse, F., "Cardiovascular Responses of Innocent Persons to Criminal Interrogation," American Journal of Psychology, 60 (1947), 407-412. 213 Brisentine, R., "Quality Control," Polygraph, 2 (1973, 278-286. Burtt, H., "Further Technique for Inspiration Expiration Ratios," Journal of Experimental Psychology, 4 (1921), 106-110. Burtt, H., "The InSpiration-Expiration Ratio During Truth and Falsehood," Journal of Experimental Psychology, 4, 1 (1921), 1-23. Chappell, N., Matthew, N., "Blood Pressure Changes in Decep- tion," Archives of Psychology, 17, 105 (1929), 1—39. Davidson, P., "Validity of the Guilty-Knowledge Technique: The Effects of Motivation," Journal of Applied Psycholoqy, 52, l (1968), 62-65. Dearman, H., and Smith, 8., "Unconscious Motivation and the Polygraph Test," American Journal of Psychiatry, 119, 11 (1963), 1017-1021. Fay, P., and Middleton, W., "The Ability to Judge Truth- Telling or Lying From the Voice as Transmitted Over a Public Address System," Journal of General Psycho- logy, 24 (1941), 211-215. Geldreich, E., ”Studies of the Galvanic Skin Response As a Deception Indicator," Transactions Kansas Academy of Sciences, 44 (1941), 346-351. Gustafson, L., and Orne, M., "Effects of Heightened Motivation on the Detection of Deception," Journal of Applied Psychology, 47, 6 (1963), 408—411. Gustafson, L., and Orne, M., "The Effects of Task and Method of Stimulus Presentation on the Detection of Decep- tion," Journal of Applied Psychology, 48, 6 (1964), 383-387. Gustafson, L., and Orne, M., "The Effects of Verbal Responses on the Laboratory Detection of Deception," Psycho- physiology, 2, 1 (1965), 10-13. 
Harmon, G., and Reid, J., "The Selection and Phrasing of Lie- Detector Test Control - Questions," Journal of Criminal Law, Criminology and Police Science, 46 (1955), 578-582. 214 Heckel, R., Brokaw, J., Salzburg, H., and Wiggins, S., "Polygraphic Variations in Reactivity Between Delusional, Non-Delusional and Control Groups in a 'Crime' Situation," Journal of Criminal Law, Criminologyyand Police Science, 53, 3 (1962), 380-383. Horvath, E., "Verbal and Nonverbal Clues to Truth and Decep— tion During Polygraph Examinations," Journal of Police Science and Administration, 1, 2 (1973), 138-152. Horvath, E., and Reid, J., "The Reliability of Polygraph Examiner Diagnosis of Truth and Deception," Journal of Criminal Law, Criminology and Police Science, 62, 2 (1971), 276-281. Horvath, P., and Reid, J., "The Polygraph Silent Answer Test," Journal of Criminal Law, Criminology and Police Sci- ence, 63, 2 (1972), 285-293. Hunter, F., and Ash, P., "The Accuracy and Consistency of Polygraph Examiner's Diagnoses," Journal of Police Science and Administration, 1 (1973), 370-375. Keeler, L., "A Method for Detecting Deception," The American Journal of Police Science, 1 (1930), 38-52. Kubis, J., "Electronic Detection of Deception," Electronics, 18 (1945), 192-212. Kubis, J., "Experimental and Statistical Factors in the Diagnosis of Consciously Suppressed Affective Experiences," Journal of Clinical Psychology, 6 (1950), 12-16. Kugelmass, S., and Lieblich, I., "Effects of Realistic Stress and Procedural Interference in Experimental Lie Detection," Journal of Applied Psychology, 50, 3 (1966), 211-216. Kugelmass, S., Lieblich, I., and Bergman, Z., "The Role of Lying in Psychophysiological Detection," Psycho- physiology, 3, 3 (1967), 312-315. Kugelmass, S., Lieblich, I., Ben-Ishai, A., Opatowski, A., and Kaplan, M., "Experimental Evaluation of Galvanic Skin Response and Blood Pressure Change Indices During Criminal Interrogation," Journal of Criminal Law, Criminology and Police Science, 59, 4 (1968), 632-635. 215 Landis, C., "Electrical Phenomenon of the Skin," Psychological Bulletin, 29, 10 (1932), 693-752. Landis, C., and Gullette, R., "Studies of Emotional Reactions," Journal of Comparative Psychology, 5 (1925), 221-253. Landis, C., and DeWick, H., "The Electrical Phenomenon of the Skin (Psychogalvanic Reflex)," Psychological Bulletin, 26, 1 (1929), 64—119. Landis, C., and Wiley, L., "Changes of Blood Pressure and Respiration During Deception," Journal of Comparative Psychology, 6 (1926), 1-19. Larson, J., "Modification of the Marston Deception Test," Journal of the American Institute of Criminal Law and Criminology, 12 (1921), 390-399. Larson, J., "The Cardio Pneumo Psychogram and Its Use in the Study of Emotions, with Practical Applications," Journal of Experimental Psychology. 5 (1922), 323-328. Lykken, D., "The GSR in the Detection of Guilt," Journal of Applied Psychology, 43, 6 (1959), 385-388. Lykken, D., "The Validity of the Guilty Knowledge Technique: The Effects of Faking," Journal of Applied Psychology, 44, 4 (1960), 258-262. Lyon, V., "Deception Tests with Juvenile Delinquents," Journal of General Psychology, 48 (1936), 494-497. MacNitt, R., "In Defense of the Electrodermal Response and Cardiac Amplitude as Measures of Deception," Journal of Criminal Law and Criminology, 33, 3 (1942), 266-275. Marston, W., "Systolic Blood Pressure Symptoms and Deception," Journal of Experimental Psychology, 2 (1917), 117-163. 
Marston, W., "Psychological Possibilities in the Deception Test," Journal of The American Institute of Criminal Law and Criminology, 2, 4 (1921), 551-570. Obermann, C., "The Effect on the Berger Rhythm of Mild Affective States," Journal of Abnormal and Social Psychology, 34 (1939), 84-95. Orne, M., "Implications of Laboratory Research for the Detection of Deception," Polygraph, 2 (1973), 169-199. 216 Paterson, R., "The Future of Polygraph in Industrial Security," American Polygraph Association Newsletter, No. 8 (1972), 1-3. Peterson, P., and Jung, C., "PsychOphysical Investigations with the Galvanometer and Pneumograph in Normal and Insane Individuals," Brain, 30 (1907), 153-218. Reid, J., "Simulated Blood Pressure Responses in Lie-Detector Tests and a Method for Their Detection," Journal of Criminal Law and Criminology, 36, 1 (1945), 201-214. Reid, J., "A Revised Questioning Technique in Lie-Detector Tests," Journal of Criminal Law and Criminology and American Journal of Police Science, 37, 6 (1947), 542-547. Reid, J., and Arther, R., "Behavior Symptoms of Lie Detector Subjects," Journal of Criminal Law, Criminology and Police Science, 44, l (1953), 104-108. Romig, C., "The Status of Polygraph Legislation of the Fifty States," Police, 16, 2 (1971), 54-61. Ruckmick, C., "The Truth About the Lie Detector," Journal of Applied Psychology, 22, 1 (1938), 50-58. Sternbach, R., Gustafson, L., and Colier, R., "Don't Trust the Lie Detector," Harvard Business Review, 40, 6 (1962), 127-134. Summers, W., "Science Can Get The Confession," Fordham Law Review, 8 (1939), 334-354. Suzuki, A., "An Analysis of Relative Effectiness (sic) of the Physical Indices and the Influence of Polygraph Examiner's Experience Upon Judgment of Polygraph Records in Detection of Deception," Japanese Journal, (title unknown), 21, 3 (1968), 51-59. Thackray, R., and Orne, M., "A Comparison of Physiological Indices in Detection of Deception," Psychophysiology, Thackray, R., and Orne, M., "Effects of the Type of Stimulus Employed and the Level of Subject Awareness on the Detection of Deception," Journal of Applied Psychology, 52, 3 (1968), 234-239. Trovillo, P., "Deception Test Criteria," Journal of Criminal Law and Criminology, 33 (1942), 338-358. 217 Trovillo, P., "A History of Lie Detection," Journal of Criminal Law, Criminology and Police Science, 29 (1939), 848-881 and 30, 104-119. Van Buskirk, D., and Marcuse, P., "The Nature of Errors in Experimental Lie Detection," Journal of Experimental Psychology, 47 (1954), 187-190. Unpublished Works Barland, G., An Experimental Study of Field Techniques in Lie Detection (unpublished Master's Thesis, Depart- ment of Psychology, University of Utah, 1972). Moroney, W., The Detection of Deception as a Function of PGR Methodology (unpublished Ph.D. dissertation, St. John's University, 1968. Ann Arbor, Michigan: University Microfilms, 1969, No. 69-7125). Reid, J., Interpretation of Truth and Deception in Polygraph Test Records (Undated, unpublished manuscript supplied by author). Reid, J., Stimulation Technique Outline, undated, unpublished manuscript supplied by J.E. Reid and Associates, Chicago. Rouke, F., Evaluation of the Indices of Deception in the Psychogalvanic Technique (unpublished Ph.D. disserta- tion, Fordham University, 1941). Scheifley, Verda, and Schmidt, W., Jeremy D. Finn's Multi- variance-Univariate and Multivariate Analysis of Variance, Covariance and Regression, occasional paper No. 22, Office of Research Consultation, Michigan State University, 1973. 
Government Documents

Ellson, D., Davis, R., Burke, C., and Saltzman, I., A Report of Research on Detection of Deception, prepared for Office of Naval Research, Contract N60Nr-18011, Department of Psychology: University of Indiana, 1952.

Federal Bureau of Investigation, Uniform Crime Reports for the United States: 1972, Washington: Government Printing Office, 1973.

Kubis, J., Studies in Lie Detection: Computer Feasibility Considerations, Technical Report 62-205, Arlington, Virginia: Armed Services Technical Information Agency, 1962, prepared for Air Force Systems Command, Contract No. AF 30 (602)-22700, Project No. 5534, Fordham University, 1962.

Kugelmass, S., Effects of Three Levels of Realistic Stress on Differential Psychological Reactivities, AFEOAR Grant 63-61, Air Force Office of Scientific Research, European Office, Aerospace Research, U.S. Air Force, Hebrew University of Jerusalem, Israel, 1963.

Orlansky, J., An Assessment of Lie Detection Capability (Declassified Version), Technical Report 62-16, Arlington, Virginia: Institute for Defense Analyses, Research and Engineering Support Division, 1964.

U.S. Congress, House, Subcommittee of the Committee on Government Operations, Use of Polygraphs as "Lie Detectors" by the Federal Government, Hearings, 88th Congress, 2nd Session, and 89th Congress, 1st Session, Parts 1-6, Washington, D.C.: U.S. Government Printing Office, 1964-1966.

Violante, R., and Ross, S., Research on Interrogation Procedures, Interim Report prepared for U.S. Navy, Office of Naval Research, Contract Nonr 4129(00), Stanford Research Institute, Menlo Park, California, 1964.

Other Sources

Arther, R., "The Heart and You" (unpublished, undated manuscript, National Training Center of Lie Detection, New York).

Backster, C., Standardized Polygraph Notepack and Technique Guide, New York: Backster Research Foundation, 1969.

Backster, C., Tri-Zone Polygraph, New York: Backster Research Foundation, 1969.

Barland, G., The Reliability of Polygraph Chart Evaluation, paper presented to American Polygraph Association Seminar in Chicago, Illinois, 1972.

Golden, R., The "Yes"-"No" Technique, paper presented to the American Polygraph Association Annual Convention in Houston, Texas, 1969.

Klump, C., Principles of Controlled Stimulation, paper presented at American Academy of Polygraph Examiners, Eighth Annual Seminar, Washington, D.C., 1961.

Orne, M., Untitled Manuscript, presented to American Polygraph Association, Third Annual Seminar, Silver Springs, Maryland, 1969.