ABSTRACT

THE DEVELOPMENT AND VALIDATION OF AN F SCALE FOR AN OBJECTIVE TEST BATTERY ON MOTIVATION

by Roger Clay Thweatt

This investigation was concerned with the development and validation of an F scale for an objective test battery of motivation. The study sampled the 4200 administered test protocols of Michigan eleventh grade students who participated in Farquhar's motivational project. Instrumentation consisted of the Generalized Situational Choice Inventory, the Preferred Job Characteristics Scale, the Human Trait Inventory and the Word Rating List.

Rarity responses (based upon a ten percent or less criterion for item selection) were determined separately for a validation sample and a cross-validation sample, which together formed a statistically defined total sample of 264 males and 264 females. Items comprising the F scale were based upon rarity responses commonly selected in both samples. Male and female F scales consisted of twenty-five and seventy-three items, respectively. Male and female F scale reliability coefficients of .729 and .746 were obtained by using Hoyt's analysis of variance technique of estimating internal consistency reliability.

The critical F score for both sexes was determined by plotting respective F distribution curves for misclassified and properly classified over- and under-achievers. The point of overlap where misclassified under-achievers scored as properly classified under-achievers on F items was identified as three rarity responses for males and six for females after cross validation.

To obtain evidence of F scale validity, three approaches were examined for the effect of F on: 1) expectancy of response fake; 2) test reliability; and 3) test validity.

Under-achievers selected significantly more F items than over-achievers in both male and female samples. Consequently, the rationale of high fake expectancy was clearly substantiated.

The respective critical F scores were applied to a sample of males and females.
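As an illustration only, the rarity-response selection and critical-score screening described in this abstract can be sketched in Python. This is not the study's own procedure in detail: the data layout (one list of chosen options per protocol), the function names, and the toy samples are all assumptions made for the sketch. Only the ten percent criterion, the requirement that a rarity response be common to both samples, and the "as large as or greater than" cutoff come from the text.

```python
from collections import Counter

def response_rates(protocols):
    """Proportion of a sample choosing each (item index, option) pair.

    `protocols` is a hypothetical encoding: one list of chosen options per subject.
    """
    n = len(protocols)
    counts = Counter(
        (item, option)
        for protocol in protocols
        for item, option in enumerate(protocol)
    )
    return {pair: count / n for pair, count in counts.items()}

def build_f_scale(validation, cross_validation, threshold=0.10):
    """Keep responses chosen by `threshold` or less of BOTH samples."""
    rates_v = response_rates(validation)
    rates_c = response_rates(cross_validation)
    candidates = set(rates_v) | set(rates_c)
    return {
        pair for pair in candidates
        if rates_v.get(pair, 0.0) <= threshold
        and rates_c.get(pair, 0.0) <= threshold
    }

def f_score(protocol, f_scale):
    """Number of rarity responses a single protocol contains."""
    return sum((item, option) in f_scale for item, option in enumerate(protocol))

def exceeds_critical(protocol, f_scale, critical):
    """True when the F score is as large as or greater than the critical score."""
    return f_score(protocol, f_scale) >= critical

# Toy two-item samples (invented for illustration, not study data).
validation = [["a", "a"]] * 18 + [["b", "a"], ["a", "b"]]
cross_validation = [["a", "a"]] * 9 + [["a", "b"]] * 10 + [["b", "a"]]
scale = build_f_scale(validation, cross_validation)
```

In the toy data, option "b" on the second item is rare in the validation sample but common in the cross-validation sample, so it is dropped. This is the purpose of requiring agreement between the two samples.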
Individuals possessing F scores as large as or greater than the critical score were excluded from the sample. Hoyt's analysis of variance technique for estimating internal consistency was used to obtain a reliability statement of the GSCI discriminating items before and after application of the F scale. It was hypothesized that further evidence of the effectiveness of the F scale could be determined by its ability to remove unstable individuals who tend to lower instrument reliability by erratic test performance. Theoretically, reliability should increase with exclusion of unreliable subjects. However, the effect of homogeneity of test performance may operate also to reduce reliability. The question was raised as to which has the greater effect on reliability: erratic test performance or homogeneity of test performance.

To answer this question, a random sample of subjects equal in number to those identified by F as having high fake potential was excluded. The assumption was made that the internal consistency reliability coefficient reduced by random selection should be greater than the reliability coefficient reduced by F selection. However, no significant differences between reliability coefficients were found.

The effects on validity between GSCI scores and standardized grade point averages before and after application of the F scale were determined. Before application of F the male and female validity coefficients were .582 and .243, respectively. After the use of F the male correlation decreased to .501 and the female validity coefficient increased to .394. No significant differences in correlations were obtained. All correlations, however, were significantly different from zero at the 3% level of confidence or better.

Linear regression lines were plotted for each sex using GSCI scores and standardized grade point averages to locate placement of high F score individuals among the sample.
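For readers unfamiliar with Hoyt's analysis of variance estimate of internal consistency, the following Python sketch shows the computation on a subjects-by-items score matrix, together with the before-and-after comparison described above. It is a minimal illustration under stated assumptions, not the study's actual computation: the function names, the list-of-lists layout, and the toy scores are hypothetical. The coefficient is taken as one minus the ratio of the residual mean square to the between-subjects mean square, which is the usual statement of Hoyt's estimate.

```python
def hoyt_reliability(scores):
    """Hoyt's ANOVA estimate of internal consistency.

    `scores` is a list of rows, one per subject, each holding k item scores.
    Returns 1 - MS_residual / MS_subjects. Assumes at least two subjects,
    two items, and non-zero variance among subject totals.
    """
    n, k = len(scores), len(scores[0])
    grand_total = sum(sum(row) for row in scores)
    correction = grand_total ** 2 / (n * k)
    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_subjects = sum(sum(row) ** 2 for row in scores) / k - correction
    ss_items = sum(sum(row[j] for row in scores) ** 2 for j in range(k)) / n - correction
    ss_residual = ss_total - ss_subjects - ss_items
    ms_subjects = ss_subjects / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return 1 - ms_residual / ms_subjects

def reliability_after_exclusion(scores, f_scores, critical):
    """Reliability once subjects at or above the critical F score are removed."""
    kept = [row for row, f in zip(scores, f_scores) if f < critical]
    return hoyt_reliability(kept)

# Toy data (invented): five subjects, four dichotomous items, one high-F subject.
scores = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
f_scores = [0, 0, 0, 5, 0]
before = hoyt_reliability(scores)
after = reliability_after_exclusion(scores, f_scores, critical=3)
```

In this toy example the coefficient falls after exclusion, because removing a subject narrows the spread of total scores. This is the homogeneity effect the abstract weighs against the removal of erratic performers.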
Eighteen percent of the males and thirty-eight percent of the females selecting high numbers of rarity items fell one standard error of estimate below and above the regression line. Eighty-two percent and forty-one percent of the respective males and females fell in the lower left quadrant of the plot. This area represented location of low achieving, low ability students.

Conclusions of the study were:

1. Under-achieving students select significantly more F items than over-achieving students.
2. Further investigation with the F scale should be conducted before employment of the scale in test battery interpretation occurs, particularly for males.
3. The F scale represents a measure of social conformity.
4. The F scale possesses the ability to tap an academic masculinity-femininity continuum.
5. Re-evaluation of F scale concept and utility in clinical instruments should be conducted.

Copyright by
ROGER CLAY THWEATT
1961

THE DEVELOPMENT AND VALIDATION OF AN F SCALE FOR AN OBJECTIVE TEST BATTERY ON MOTIVATION

By
Roger Clay Thweatt

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

College of Education

1961

TABLE OF CONTENTS

CHAPTER

I. THE PROBLEM
   Statement of the Problem
   Delimitations
   Statement of the Hypotheses
   Background of Theory and Research
   Summary Statement of Organization

II. PREVIOUS ATTEMPTS TO DEVELOP VALIDITY KEYS: A REVIEW OF THE LITERATURE
   The L Scale
   The F Scale
   The K Scale
   Subtle-Obvious Keys (S-O)
   Set T Scale
   The B Scale
   Miscellaneous Scales
   Similarity Between Various Scales
   Summary

III. PROCEDURES
   Background of Farquhar's Study
   General Design of the Motivational Study
   Instrumentation
   Procedures for the Present Investigation
   Rationale of High Fake Expectancy
   Summary

IV. RESULTS OF THE INVESTIGATION
   Selection of the F Scale
   Sex Differences in F Item Selection
   Distribution of F Items
   Reliability of the F Scale
   Validation of the F Scale
   Summary

V. SUMMARY AND CONCLUSIONS
   The Problem
   Methodology
   Conclusions
   Implications for Further Research

BIBLIOGRAPHY

APPENDIX

LIST OF TABLES

TABLE

I. Intercorrelations of K with Other MMPI Variables
II. Correlations of K Scale with Other Variables Thought to Be Loaded with the K-Factor
III. Intercorrelations of Five Scales Thought to Be Loaded with the Test-Taking Attitude, No Item Overlap, n = 150 Normal Males
IV. Intercorrelations of Four MMPI Validity Scale Indicators and Hy for Normals
V. Summary of Theory of Need-Achievement and Non-Need-Achievement Motivation Basic to Current Research
VI. Hypothesized Personality Factors Associated with Academic Achievement
VII. Number of Rarity Items in Each Test
VIII. Sex Differences in F Item Selection
IX. Comparisons of F Item Means, Mean Squares and Sample Number Between Male and Female Misclassified and Properly Classified Groups
X. Comparisons of T-Values and Significant Levels of One Tailed Tests Between Male and Female Misclassified and Properly Classified Groups
XI. Effects on GSCI Internal Consistency Reliability Before and After Application of the Male and Female F Scale
XII. Effects on the Validity Coefficient Between GSCI Raw Scores and Standardized Grade Point Averages After Application of the Male and Female F Scale
XIII. Significance of Difference Between Validity Correlation Coefficients Before and After Application of the Male and Female F Scale

LIST OF FIGURES

FIGURE

I. Methodological Selection of Individuals with Stable Measured Aptitude
II. Method of Selecting Under- and Over-Achievers
III. Theorized Model of Selection of Misclassified Under-Achievers Differentiated by the GSCI
IV. Selection Procedure for Determining F Item Overlap Score
V. Cross Validation F Item Frequency Distribution for Males (N = 132)
VI. Cross Validation F Item Frequency Distribution for Females (N = 132)
VII. GSCI Overlap Points for Over- and Under-Achievers for Each Sex
VIII. Male Regression Plot
IX. Female Regression Plot

CHAPTER I

THE PROBLEM

The most important failing of almost all structured objective tests is their susceptibility to faking or lying. In addition, objective tests possess an even greater susceptibility to unconscious self deception and role playing.¹ The possibility of such factors having an invalidating effect upon scores has been noted by many writers.²⁻²⁶ One of the assumed advantages of the projective methods is that they are relatively less influenced by such distorting factors. However, even on projectives malingerers perform in a discriminable fashion. (Here the clues are extreme cautiousness and hesitancy, rejection of cards, and a minimization of response in general.)²⁷

¹ P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:525-564.
² C. R. Adams, "A New Measure of Personality," Journal of Applied Psychology, 1941, 25:141-151.
³ G. W. Allport, "A Test for Ascendance-Submission," Journal of Abnormal Psychology, 1928, 23:118-136.
⁴ G. W. Allport, "The Use of Personal Documents in Psychological Science," Social Science Research Council Bulletin, 1942, Number 42.
⁵ R. G. Bernreuter, "Validity of the Personality Inventory," Personality Journal, 1933, 11:383-386.
⁶ R. G. Bernreuter, "Theory and Construction of the Personality Inventory," Journal of Social Psychology, 1933, 4:387-405.
⁷ R. G. Bernreuter, "The Present Status of Personality Trait Tests," Educational Research Supplement, 1940, 21:160-171.
⁸ Marion Bills, "Selection of Casualty and Life Insurance Agents," Journal of Applied Psychology, 1941, 25:6-10.
⁹ E. S. Bordin, "A Theory of Vocational Interests as Dynamic Phenomena," Educational and Psychological Measurement, 1943, 3:49-65.
¹⁰ P. Eisenberg and A. Wesman, "A Consistency in Responses and Logical Interpretation of Psychoneurotic Inventory Items," Journal of Educational Psychology, 1941, 32:321-338.
¹¹ J. P. Guilford and R. B. Guilford, "Personality Factors S, E, and M and Their Measurement," Journal of Psychology, 1936, 2:109-127.
¹² D. G. Humm and K. A. Humm, "Validity of the Humm-Wadsworth Temperament Scale: With Consideration of the Effects of Subjects' Response-Bias," Journal of Psychology, 1944, 18:55-64.
¹³ D. G. Humm and G. W. Wadsworth, "The Humm-Wadsworth Temperament Scale," American Journal of Psychiatry, 1935, 91:163-200.
¹⁴ E. L. Kelly, C. C. Miles and L. M. Terman, "Ability to Influence One's Score on a Typical Pencil and Paper Test of Personality," Character and Personality, 1936, 4:206-215.
¹⁵ D. A. Laird, "Detecting Abnormal Behavior," Journal of Abnormal Psychology, 1926, 20:128-141.
¹⁶ C. Landis and S. E. Katz, "The Validity of Certain Questions Which Purport to Measure Neurotic Tendencies," Journal of Applied Psychology, 1934, 18:343-356.
¹⁷ J. B. Maller, "The Effect of Signing One's Name," School and Society, 1930, 31:882-884.
¹⁸ W. C. Olson, "The Waiver of Signature in Personal Reports," Journal of Applied Psychology, 1936, 20:442-450.
¹⁹ Saul Rosenzweig, "A Suggestion for Making Verbal Personality Tests More Valid," Psychological Review, 1934, 41:400-401.
²⁰ Saul Rosenzweig, "A Basis for the Improvement of Personality Tests with Special Reference to the M-F Battery," Journal of Abnormal and Social Psychology, 1938, 33:476-488.
²¹ F. L. Ruch, "A Technique for Detecting Attempts to Fake Performance on a Self-Inventory Type Personality Test." In Quinn McNemar and M. A. Merrill, Studies in Personality (New York: McGraw-Hill, 1942), pp. 229-234.
²² E. K. Strong, Vocational Interests of Men and Women (Stanford: Stanford University Press, 1943).
²³ P. M. Symonds, Diagnosing Personality and Conduct (New York: Appleton-Century, 1932).
²⁴ P. E. Vernon, "The Attitude of the Subject in Personality Testing," Journal of Applied Psychology, 1934, 18:165-177.
²⁵ J. N. Washburne, "A Test of Social Adjustment," Journal of Applied Psychology, 1935, 19:125-244.
²⁶ R. R. Willoughby and M. E. Morse, "Spontaneous Reactions to a Personality Inventory," American Journal of Orthopsychiatry, 1936, 6:562-575.
²⁷ H. G. Gough, "Simulated Patterns on the MMPI," Journal of Abnormal and Social Psychology, 1947, 42:215.

The existence of a distorting influence in test-taking attitude is so obvious that it has hardly been thought necessary to establish it experimentally. However, a number of investigations have empirically demonstrated the effect. Frenkel-Brunswik investigated tendencies to self-deception in rating oneself, finding in some cases marked negative relations between self judgments and the evaluation of others.²⁸ Hendrickson reported that a group of teachers earned significantly more stable, dominant, extroverted and self sufficient scores on the Bernreuter scales when instructed to take the test as though they were applying for a position, than when under more neutral instructions.²⁹

²⁸ E. Frenkel-Brunswik, "Mechanisms of Self-Deception," Journal of Social Psychology, 1939, 10:409-420.
²⁹ G. Hendrickson, "Attitudes and Interests of Teachers and Prospective Teachers," (paper read before Section Q, AAAS, Atlantic City, December 27, 1932).

On tests of mental ability, malingerers try more items and make more errors than do intellectually inadequate persons. Malingerers also fail items that handicapped persons pass, and pass items that the defectives fail.³⁰ A comparative study of malingerers and authentic psychiatric cases using the Cornell Selectee Index and a shortened form of the Shipley Personality Inventory found that malingerers scored significantly higher on both tests.³¹ Ruch showed that college students could fake extroversion on the Bernreuter to the extent of achieving a median at the 98th percentile on Bernreuter's norms, as contrasted with a naive median at the 50th percentile.³²

³⁰ W. A. Hunt and H. J. Older, "Detection of Malingering through Psychometric Tests," United States Naval Medical Bulletin, 1943, 41:1318.
³¹ W. A. Hunt, "The Detection of Malingering: A Further Study," United States Naval Medical Bulletin, 1946, 46:249.
³² F. L. Ruch, "A Technique for Detecting Attempts to Fake Performance on a Self-Inventory Type of Personality Test." In Quinn McNemar and M. A. Merrill, Studies in Personality (New York: McGraw-Hill, 1942), pp. 229-234.

Bernreuter found that college students could produce marked shifts in their Bernreuter scores in the socially approved direction. He interpreted this finding, however, as indicating the comparative unimportance of the faking tendency. His reasoning was that had the need for giving socially approved responses operated in the first administration to any extent, the effect of special instructions to take this attitude should not have been great.³³ To Meehl and Hathaway this reasoning seemed rather tenuous, inasmuch as the occurrence of a shift merely shows that conscious and permitted faking can produce greater effects than those which may have been operating in the naive original testing.³⁴ Meehl and Hathaway further state:

The insignificant correlations between naive and faked scores were also used by Bernreuter to support his view, an argument which is not comprehensible in view of the gross skewness of the faked scores. What is clear from his investigation is that people are able to influence their scores to a considerable extent if they choose to, and that the average student's stereotype of what is socially desirable seems to be an individual who is dominant, self sufficient and stable.³⁵

³³ R. G. Bernreuter, "Validity of the Personality Inventory," Personality Journal, 1933, 11:383-386.
³⁴ P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:525-526.
³⁵ Ibid.

Bordin reports that students acquainted with the occupational groupings included in the Strong Vocational Interest Blank were able to simulate certain specified occupational types, even though the students were unfamiliar with the mechanics of scoring.³⁶ He points out that one factor determining the profile on a test of this kind is the degree of acceptance of an occupational stereotype as a self-description. It is as if a person would ask himself the question, "Who am I?" and then answer the test items in a manner consistent with the resulting self-conception.
A second important factor is the degree of knowledge of the true occupational stereotype. This factor will determine the clarity of the obtained interest pattern.³⁷ Similarly, if a subject is attempting to respond as a psychoneurotic on a personality inventory, the success of the trial will be largely influenced by his understanding of the neurotic syndrome in its intimate as well as its obvious aspects.³⁸

³⁶ E. S. Bordin, "A Theory of Vocational Interests as Dynamic Phenomena," Educational and Psychological Measurement, 1943, 3:57.
³⁷ Ibid., p. 54.
³⁸ H. G. Gough, "Simulated Patterns on the MMPI," Journal of Abnormal and Social Psychology, 1947, 42:216.

Clinical observations substantiate the above conception of malingering. Ossipov states that every malingerer is an actor who portrays an illness "as he understands it." The malingerer goes to extremes, apparently believing that the more eccentric his behavior, the more disordered he will be thought to be. In addition to exaggeration of symptoms, the malingerer tends to act as a "state" or an episode, but not a disease. For this reason Ossipov emphasizes that the entire clinical picture must be carefully evaluated, especially the configuration of symptoms, in distinguishing a feigned from a genuine illness.³⁹

³⁹ V. P. Ossipov, "Malingering: The Simulation of Psychosis," Bulletin of the Menninger Clinic, 1944, 8:39-42.

Maller,⁴⁰ Metfessel,⁴¹ Olson,⁴² and Spencer⁴³ have studied the effects of anonymity on responses to self-rating situations and found that the requirement of signing one's name has a definite effect on scores.
Kelly, Miles and Terman demonstrated the great ease with which scores on the Terman-Miles Masculinity-Femininity Test could be faked in either direction once the subjects had been informed concerning the secret of what the test measured.⁴⁴ Strong,⁴⁵ Bills,⁴⁶ Steinmetz,⁴⁷ as well as Bordin,⁴⁸ have presented evidence of the ability of subjects to distort their interest patterns when taking the Strong Vocational Interest Blank.

⁴⁰ J. B. Maller, "The Effect of Signing One's Name," School and Society, 1930, 31:882-884.
⁴¹ M. Metfessel, "Personality Factors in Motion Picture Writing," Journal of Abnormal and Social Psychology, 1935, 30:333-347.
⁴² W. C. Olson, "The Waiver of Signature in Personal Reports," Journal of Applied Psychology, 1936, 20:442-450.
⁴³ D. Spencer, "Frankness of Subjects on Personality Measures," Journal of Educational Psychology, 1938, 29:26-35.
⁴⁴ E. L. Kelly, C. C. Miles and L. M. Terman, "Ability to Influence One's Score on a Typical Pencil and Paper Test of Personality," Character and Personality, 1936, 4:206-215.
⁴⁵ E. K. Strong, Vocational Interests of Men and Women (Stanford: Stanford University Press, 1943).
⁴⁶ Marion Bills, "Selection of Casualty and Life Insurance Agents," Journal of Applied Psychology, 1941, 25:6-10.
⁴⁷ H. C. Steinmetz, "Measuring Ability to Fake Occupation Interest," Journal of Applied Psychology, 1932, 16:123-130.
⁴⁸ E. S. Bordin, "A Theory of Vocational Interests as Dynamic Phenomena," Educational and Psychological Measurement, 1943, 3:49-65.

There are several reports of MMPI simulation in the literature. Benton had nine homosexuals who were positively identified on the Mf scale retake the test and try to conceal their femininity. Six of the nine participants were able to bring their Mf scores within normal limits.⁴⁹ Meehl and Hathaway had 54 psychological trainees take the MMPI as if trying to avoid being drafted for military service, and obtained F scale T-scores of 78 or higher in 96 percent of the cases.
In addition to high F scores, most of the profiles would have been clinically invalidated because of their highly unusual configurations.⁵⁰

⁴⁹ A. L. Benton, "The MMPI in Clinical Practice," Journal of Nervous and Mental Disorders, 1945, 102:416-420.
⁵⁰ P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:550-551.

Meehl and Hathaway suggest that it is quite possible that in developing personality questionnaires constructed in the traditional, a priori fashion and refined by statistical manipulation, the test-maker is merely pooling sets of items to differentiate among people with respect to various test-attitude continua of little or no psychiatric relevance. The underlying disposition which leads a subject to respond in a certain way to such questions may or may not be identical with the dispositions recognized as clinical variables, nor with those that might be suggested by the item content. To Meehl and Hathaway it is quite clear on present evidence that identification cannot be established by an assumed equivalence between non-test behavior and the verbal report. Hence, both a priori selection of items and the psychological naming of a statistically homogeneous scale from its item content are fraught with possibilities of error.⁵¹

⁵¹ Ibid., pp. 553-554.

Guilford has explicitly called attention to the importance of the problem of test-taking attitudes as "factors" when he says:

We must constantly remember that the response of a subject may not represent exactly what the question implies in its most obvious meaning. Subjects respond to a question as at the moment they think they are, with perhaps a lack of insight in many cases as to their real position on the question. They also respond as they would like themselves to be and as they would like others to think them to be and as they wish the examiner to think them to be. They also respond with some regard to self-consistency among their own answers. Whether these determining factors are sufficiently constant to set up individual differences which are uniform in character and so constitute common factors in themselves is difficult to say. Should any one of them be so pervasive it should introduce an additional vector in the factor analysis.⁵²

⁵² J. P. Guilford and R. B. Guilford, "Personality Factors S, E, and M and Their Measurement," Journal of Psychology, 1936, 2:118.

Statement of the Problem

The purpose of this investigation was: 1) to determine whether or not an F scale validity key could be developed for an objective test battery on motivation; and 2) to validate the developed key.⁵³

⁵³ W. W. Farquhar, "A Comprehensive Study of the Motivational Factors Underlying Achievement of Eleventh Grade High School Students," (East Lansing: Approved Research Application of the Commissioner of Education, United States Office of Education, November 1, 1959). 15 pp. (Mimeographed.)

Delimitations

This study used the administered test protocols of eleventh grade Michigan high school students who were participants in Farquhar's motivational research investigation.⁵⁴

⁵⁴ Ibid.

Statement of the Hypotheses

The following hypotheses were examined in this investigation: 1) an F scale can be developed for each sex for four of the six inventories comprising the objective test battery on motivation; and 2) the F scale can differentiate protocols of those test-takers who were uncooperative, who could not comprehend the test items, who made clerical errors, and who intentionally placed themselves in a bad light.

Background of Theory and Research

Among the many authors who recognize the problem of detection of malingering and falsification on objective tests there are but a few who have made specific suggestions for its solution.
The inclusion of special exhortations to frankness and objectivity in the test directions themselves is common, but there is no evidence as to its effectiveness.⁵⁵ Obviously, if a subject is consciously determined to fake, he will do so; whereas if his motivation to distortion is of a more subtle, non-verbalized nature, such exhortations can hardly be expected to be efficacious.

⁵⁵ P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:527.

Another method is to attempt to disguise the content of items, so that the significance of a given response is less obvious. Traditional approaches to the measurement of personality render this technique practically impossible, inasmuch as the items are selected to begin with for their obvious psychological significance. Hence, unless the items are changed so greatly as to no longer elicit the desired information, they will almost inevitably continue to betray their origin. An effective use of a set of subtle items is only possible when the initial item pool is large and the initial selection of items is ruthlessly empirical. Those items whose significance would not have been guessed by the test-maker will then be equally mysterious to the testee. The presence of projective and role playing components of test-taking
behavior should be recognized in objective personality inventories.⁵⁶

A spurious anonymity using secret coding for identifying the testee is a possibility suggested by the studies cited above, but is clinically impractical and the deception involved is not desirable.⁵⁷ Lacking anonymity, it has been suggested by Olson that the name be signed at the conclusion of the test administration instead of at the top of the page.⁵⁸ This suggestion was carried into practice by Maller in his Character Sketches.⁵⁹ In addition, he also stated the questions in the third person (indirect) form, requiring the subject to indicate whether he was the same or different from the person described. Maller presents evidence that this procedure aroused considerably less annoyance in his subjects; however, direct proof that this decrease in annoyance led to increased validity is lacking. Meehl expresses doubt whether or not the removal of personal reference is wholly desirable because there is reason for believing that the same role playing and self-deception which operate to invalidate some of the measurements are an important factor in making other measurements possible.⁶⁰

⁵⁶ P. E. Meehl, "The Dynamics of Structured Personality Tests," Journal of Clinical Psychology, 1945, 1:296-303.
⁵⁷ P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:527.
⁵⁸ W. C. Olson, "The Waiver of Signature in Personal Reports," Journal of Applied Psychology, 1936, 20:442-450.
⁵⁹ J. B. Maller, Character Sketches (New York: Bureau of Publications, Teachers College, Columbia University, 1932).
⁶⁰ P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:527-528.

Another technique for reducing the effect of signing one's name is to have the items printed on cards which are then sorted by the subject. Such a procedure makes all writing unnecessary and it is assumed that
the feeling that one is making a permanent record of his personal failings is lessened. This method has been employed by Maller in a revised test (Personality Sketches) and by Hathaway and McKinley in the MMPI.⁶¹ ⁶² However, evidence supporting the above assumption concerning the increased validity of performance is lacking.

⁶¹ J. B. Maller, "Personality Tests." In J. M. Hunt, Personality and the Behavior Disorders (New York: Ronald Press, 1944).
⁶² S. R. Hathaway and J. C. McKinley, "A Multiphasic Personality Schedule: I. Construction of the Schedule," Journal of Psychology, 1940, 10:249-254.

Although all of these stratagems may have a considerable value, particularly in the aggregate, the fact still remains that they do not by any means remove the possibility of faking. What is much more important, they are mainly directed at the sort of conscious falsehood which most investigators have stressed, while ignoring the more subtle tendencies to self deception which are probably of even greater importance in affecting scores. Also, they neglect to stress the existence of trends in the opposite direction--namely those trends which exaggerate the apparent abnormality or maladjustment of the individual. Meehl and Hathaway state that it is only natural that the tendency of a testee to put himself in a favorable light should have received more attention than the contrary tendency. However, there is considerable evidence that this latter tendency does exist and that it is a much more important factor in determining scores on inventories than has generally been supposed.⁶³

⁶³ P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:528-529.

It is also probable that certain systematic differences in item-interpretation, not necessarily a function of personality dynamics of the defensive or self-critical sort but relatively neutral psychologically (semantic variation), lead to score deviations which are misleading. Such problems have been investigated by Benton,⁶⁴ Eisenberg,⁶⁵ and Eisenberg and Wesman.⁶⁶

⁶⁴ A. L. Benton, "The Interpretation of Questionnaire Items in a Personality Inventory," Archives of Psychology, 1935, Number 190.
⁶⁵ P. Eisenberg, "Individual Interpretation of Psychoneurotic Inventory Items," Journal of Genetic Psychology, 1941, 25:19-40.

A more fruitful attitude was taken by Rosenzweig in which he reiterated the fact of untrustworthiness of self-ratings and indicated that instead of trying to eliminate completely these sources of error, the test-maker should recognize them and attempt to correct for them in interpreting the results. He says:

Astute phraseology in the instructions and questions of the test have sometimes been resorted to, but such expedients are rarely very effective. Might it not be more effective to recognize at the outset that such tests have certain limitations that can never be completely circumvented and then go on to the measurement of these limiting factors themselves, thus obtaining information by which a correction may be applied to the subject's answers?⁶⁷

Rosenzweig's specific proposal for achieving this end was to include among the usual self-rating items a set of items of the form "I should like to be the sort of man who . . . ," on the theory that if the test-maker knew something of the strength of certain "ideal-self" trends in the person, the investigator could make appropriate correction for these trends in interpreting responses to the traditional items. Rosenzweig, however, never carried this idea into practice. On the other hand, Meehl and Hathaway consider that this approach would be relatively ineffective, since they feel what is desired is not a statement of the strength or number of ideals for the self, but a measure of the extent to which they are allowed to distort responses. In other words,
66P. Eisenberg and A. Wesman, "A Consistency in Responses and Logical Interpretation of Psychoneurotic Inventory Items, " Journal of Educational Psychology, 1941, 32:321-338. 6'ISaul Rosenzweig, "A Suggestion for Making Verbal Personality Tests More Valid, " Psychological Review, 1934, 41:400-401. 13 a subject might easily have lofty ideals verbally expressed, but might be too honest, insightful, objective or self critical to distort his responses into agreement with these ideals.68 Maller attempted to solve this problem in another way in his Character Sketches by including a small set of items which were supposed to measure the subject's "readiness to confide. " The occurrence of normal, well-adjusted scores in combinations with a low-measured "readiness to confide" would lead one to be skeptical of the validity of the measurement.69 However, the "readiness to confide" items were them- selves self ratings on readiness. In the later form called Personality Sketches Maller does not make use of the "readiness to confide" concept, so it may be assumed that it was unsuccessful or at least did not materially improve validity. Meehl and Hathaway, carrying Rosenzweig's thinking to its logical conclusion, consider the obvious procedure to follow is to give the subject a good chance to distort his answers in accordance with some self picture or conscious facade, and observe the extent to which he does so.70 Of course, the difficulty here is that such a procedure requires a knowledge of the objective and subjective facts which are usually inaccessible. Here there are apparently three possibilities Open to the test builder. First, he may sidestep the problem of getting directly at the objective truth, and attempt to establish falsehood by obtaining internal contradictions, a technique employed by Maller in his earlier test. Cady, in his application of a modified form, of the Woodworth 68P. E. Meehl and S. R. 
Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:529.
69 J. B. Maller, Character Sketches (New York: Bureau of Publications, Teachers College, Columbia University, 1932).
70 P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:530.

Psychoneurotic Inventory to the measurement of juvenile incorrigibility, had earlier made use of repeated items to increase reliability of the scores (although the aim of detecting inconsistency of the fake sort was not explicit in his rationale).71 Each question appeared twice, once in each section of the test, except that in the second appearance the question was phrased in the negative. Theoretically the subject's response should also be reversed, and the number of failures to reverse is an indication of some inconsistency. Hence, a measure of non-cooperation or dishonesty would be obtained. The inconsistency score obtained in this way was to be subtracted from the adjustment score to get a sort of corrected score as proposed by Rosenzweig. However, Meehl and Hathaway point out that it is by no means obvious that the shift to a negative form of item will leave the projective properties of the stimulus simply reversed in meaning, so that an inconsistency in the strict logical sense would not necessarily imply lack of cooperation or dishonesty. It would seem reasonable, nevertheless, that a very large number of such inconsistent pairs would cast grave suspicion upon the scores, either for dishonesty or some equally serious reason.72 This technique also was abandoned by Maller in his revised instrument.

The second method of using distortion is to present opportunities for answering in an extremely favorable way, but in a way which could almost certainly not be true.
This idea was employed by Hartshorne and May in the Character Education Inquiry.73 If it is assumed that there are very few aspects of behavior for which one could have complete confidence that no subject would be "ideal" in them, it is necessary to present a considerable number of such opportunities and progressively reduce the probability that any individual would be as described. In this sense, everyone would possess at least a few highly desirable traits, and no one would be the possessor of all. Without knowing anything whatsoever about a particular person, the test-maker could write on rational grounds a list of extremely good and rare human qualities which it is statistically absurd to suppose will all or in large part be possessed by the individual. If the testee says, however, that he has all or a great many of them, it can be decided that he is not telling the truth. The answers to these items could yield strong evidence for deception.74

71 V. M. Cady, "The Estimation of Juvenile Incorrigibility," Journal of Delinquency Monographs, 1923, Number 2.
72 P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:530.
73 H. Hartshorne and M. A. May, Studies in Deceit (New York: Macmillan, 1928).

The Humm-Wadsworth Temperament Scale has made use of the socially-desirable response method.75 Humm and Wadsworth deserve credit for having been among the first investigators of structured personality measurement to lay great stress upon the problem of detecting non-cooperation and distortion of response when evaluating a particular profile of scores. They were also among the first to adopt an explicit and uncompromising empiricism in selecting items from a large initial pool. The two scales which serve as "checks" or "correctors" for the remainder of the profile on the Humm-Wadsworth are the "normal" component and the "no-count."

The "normal" component attempts to assess the strength of a general inhibiting, controlling or normalizing factor in personality which Humm and Wadsworth considered always present to act as a "brake" upon strong abnormal tendencies on the other variables. This means that in interpreting a given profile the significance of any deviation on one of the abnormal components must be established with the size of the normal score in mind. However, Meehl and Hathaway question Humm and Wadsworth's claim for the normal component.76

74 P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:531.
75 D. G. Humm and G. W. Wadsworth, "The Humm-Wadsworth Temperament Scale," American Journal of Psychiatry, 1935, 92:163-200.

The "no-count" is based upon the number of items to which the subject responds in the negative. Inasmuch as approximately 76 percent of the scored items of the Humm-Wadsworth are "obviously" suggestive of abnormality when replied to affirmatively, the "no-count" is to some extent a measure of the testee's tendency to avoid, consciously or otherwise, saying "bad" things about himself when taking the test. That this relationship occurs is further supported by the tendency for the no-count to correlate positively (.77) with the normal component and negatively with the various abnormal components.77 If the no-count is excessively great, the inference is that the subject has responded in a very defensive or possibly stereotyped fashion, and therefore the particular testing is of doubtful validity.
Humm and Wadsworth state that as high as 25 or 30 percent of normals seem to invalidate their scores in this way, a proportion which seems to Meehl and Hathaway to be impractically high for clinical purposes.78 Later, Humm, Storment and Iorns attempted to reduce the proportion of useless tests by a "correction" for the no-count based upon multiple regression procedures.79 A study of hospitalized psychiatric cases by Arnold indicated that even the exclusion of cases with invalid no-count did not result in any greater validity clinically than was obtained using all cases.80 Humm stated that improved multiple regression techniques have resulted in a marked reduction in the proportion of test misses and of uninterpretable profiles.81

76 P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:531.
77 D. G. Humm and G. W. Wadsworth, "The Humm-Wadsworth Temperament Scale," American Journal of Psychiatry, 1935, 92:174.
78 P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:531.
79 D. G. Humm, R. C. Storment and M. E. Iorns, "Combination Scores for the Humm-Wadsworth Temperament Scale," Journal of Psychology, 1939, 7:227-253.

Washburne, in revising his Test of Social Adjustment, included a set of 21 items modeled after the "lie" items of Hartshorne and May and referred to the total score on this set as "Objectivity." This score was included to detect both lying and unintentional inaccuracy. An extremely low objectivity score was said to invalidate the test as a whole. A weighted objectivity score was included in the total score on the entire test.82

The third technique available is the empirical derivation of a fake scale by making use of the item shifts obtained when persons take a test under normal naive conditions and then are retested with instructions to fake.
This method has been used by Ruch to construct an "honesty" key for the Bernreuter.83 To Meehl and Hathaway it is interesting that such a procedure "so logical and straightforward, invented to solve a problem so obvious and insistent, should have been employed for the first time over twenty years after the appearance of the first personality inventory."84 Ruch says:

The argument is rather simple. If answers to items on a test like the Bernreuter can be faked at all, the chances are that some are easier to fake than others. Therefore, it should be possible to give each item a weight to represent the extent to which it can be faked by the average college student. This was done by tabulating the frequency of each answer to each question for the standard condition and for the influenced condition. These frequencies were converted into percentages, and an honesty weight was assigned to each reply according to the magnitude of the critical ratio of the difference between the frequency of the reply in the honest and in the influenced condition.85

80 D. A. Arnold, "The Clinical Validity of the Humm-Wadsworth Temperament Scale in Psychiatric Diagnosis" (unpublished Doctor's thesis, University of Minnesota, Minneapolis, 1942).
81 P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:532.
82 J. N. Washburne, "A Test of Social Adjustment," Journal of Applied Psychology, 1935, 19:125-244.
83 F. L. Ruch, "A Technique for Detecting Attempts to Fake Performance on a Self-Inventory Type of Personality Test," in Quinn McNemar and M. A. Merrill, Studies in Personality (New York: McGraw-Hill, 1942), pp. 229-234.
84 P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:532-533.

Ruch seems to have been the first investigator to attempt empirical derivation of a fake key for a question-answer personality inventory.
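The weighting step Ruch describes can be illustrated with a minimal sketch. The source does not give his formula, so the standard error below (the usual unpooled formula for two independent proportions) is an assumption, as are the function name and the example counts. Ruch tested the same students under both conditions, so a correlated-proportions formula may have been closer to his actual computation. The mapping from critical ratio to an honesty weight is not stated in the source and is left out.

```python
import math


def critical_ratio(count_standard, n_standard, count_influenced, n_influenced):
    """Critical ratio of the difference between two answer proportions.

    Assumption: independent-samples standard error; the source does not
    say which formula Ruch used.
    """
    p_standard = count_standard / n_standard
    p_influenced = count_influenced / n_influenced
    se = math.sqrt(
        p_standard * (1 - p_standard) / n_standard
        + p_influenced * (1 - p_influenced) / n_influenced
    )
    if se == 0:
        # Both proportions are exactly 0 or 1: no sampling variance to divide by.
        if p_standard == p_influenced:
            return 0.0
        return math.copysign(math.inf, p_influenced - p_standard)
    return (p_influenced - p_standard) / se


# Hypothetical item: 20 of 100 give the reply under standard instructions,
# 50 of 100 give it under instructions to fake.
cr = critical_ratio(20, 100, 50, 100)
print(round(cr, 2))
```

A reply whose frequency shifts sharply between conditions yields a large critical ratio and would, on Ruch's argument, deserve a heavier weight in the honesty key.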
86

As was stated earlier by Meehl and Hathaway, there is evidence of a tendency on the part of some testees to make themselves appear in a "bad" light in taking personality tests. Such a tendency is difficult to characterize because it may occur on several different bases. A patient in the hospital may engage in a type of malingering for strictly conscious reasons, presenting a profile on a test which shows abnormalities out of all reasonable proportion to what is apparent from other considerations. Again, there may be somewhat general traits of verbal pessimism or self-deprecation which act to distort systematically the results of personality measurement.

85 Ibid., p. 231.
86 P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:533.

Meehl and Hathaway have dichotomized the test-attitude continuum by the two opposed terms defensiveness and plus-getting. However, they make no implication concerning the degree of conscious, deliberate deception involved in either. The corresponding extremes of deliberate deception are referred to as faking good and faking bad, respectively. It was recognized that, like the defensive tendency, the plus-getting tendency might exist in all degrees from a mild self-criticality or merely objectivity to a deliberate, conscious attempt to make oneself look abnormal. Whether this represents simply the extreme of a continuum with faking good at the opposite end, or an entirely new and different factor, was undecided by these authors. Meehl and Hathaway state:

In any case it would be desirable to develop a scale for detecting these test-taking tendencies to put oneself in a bad light when answering a personality inventory, so that allowance might be made in such cases in the light of a deviant score obtained on such a scale.
87

Summary Statement of Organization

The remainder of this investigation is presented in four subdivisions: in Chapter II previous attempts to develop validity keys are epitomized; the motivational research study with its instrumentation and the procedures utilized in the present investigation are discussed in Chapter III; in Chapter IV F scale development, sex differences in F item selection, F scale reliability and validation are discussed; and in Chapter V the summary and conclusion of the investigation are presented.

In Chapter II the literature appertaining to validity scale development is reviewed.

87 Ibid., pp. 533-534.

CHAPTER II

PREVIOUS ATTEMPTS TO DEVELOP VALIDITY KEYS: A REVIEW OF THE LITERATURE

For the present investigation all of the major validity scales were reviewed. The order of discussion of the various keys was based upon the chronology of their development, with the earliest devised scales discussed first. In addition to the major scales, however, several minor scales had to be considered in order to bring about a more lucid understanding of the major keys.

The L Scale

The "lie" scale of the MMPI attempts to identify those individuals who try to falsify their score by choosing responses which they feel are most acceptable socially.1 The original fifteen MMPI items making up the L scale were selected under the inspiration of the work of Hartshorne and May.2 Each of the items presents a situation desirable socially, but which is rarely true of an individual. It was recognized by Hathaway and McKinley that extremely conscientious persons would frequently have more than the average of the L items validly positive, but it was assumed that for a person to have six or eight such items marked was highly improbable.3 It was concluded by these investigators

1 S. R. Hathaway and J. C. McKinley, Manual for the MMPI (New York: The Psychological Corporation, 1945).
2 H. Hartshorne and M. A. May, Studies in Deceit (New York: Macmillan, 1928).
3 P. E.
Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:537-538.

that the fifteen items of this type scattered among the main body of the items constituted a fairly subtle trap for anyone who wanted to give an unusually good impression of himself.

The standardization procedure revealed that among the various normal groups the mean score on the L items lay between three and five items. The frequency curves were all skewed sharply in the positive direction. Few individuals obtained raw scores of seven or more. Only two or three percent exceeded ten items. These values were arbitrarily called the 60 and 70 T-score points, respectively.

As more data were accumulated Hathaway and McKinley concluded that the original tentative assumptions regarding the meaning of L were in the main correct, but other valid interpretations of L in the range from T-score 56-70 also existed. The original arbitrary assignment of T-scores had been too conservative, and more emphasis was placed on the T-score range of 56-60.

To Hathaway and McKinley the positive presence of the rise in the L score seemed quite valid as an indicator that the individual taking the test was being dishonest and might be somewhat unreliable. However, if no rise in L was observed, these investigators offered no positive or clear interpretation.4

To check the assumption that L would not identify the more sophisticated subject, an experiment was performed with 53 male psychology students. The participants, who had completed a considerable portion of their training in psychology, were asked to take the MMPI twice. The first administration was done in the standard manner. In the second administration the group was asked to make certain that they would be acceptable to army induction. Half the group took it with fake good instructions first, half second. Through this procedure a faked good record and a normal record were both obtained.5

4 Ibid.
5 Ibid.

These records showed no appreciable rise in L. It is also true, however, that the majority of the profiles were only slightly better than the corresponding non-faked profiles.

Although one might conclude from the design of the above experiment that the outcome simply tested the participants' willingness and/or ability to fake good, Hathaway and McKinley held that the results demonstrated that the intent to deceive is not often detectable by L when the subjects are relatively normal and sophisticated.6

Cottle, however, found additional support for one of the conclusions of Hathaway and McKinley, namely, that sophisticated, bright individuals tend to score low on L items. In his study with 100 high level college students on the MMPI Cottle found that the mean L score on the card form was 2.54 raw score points and for the booklet form was 2.73 raw score points.7 However, no conclusions can be reached from this study concerning the hypothesis that the L scale is a valid key in differentiating between individuals who wish to fake good and those who do not.

Hovey discusses three cases of individuals who discovered the scoring purpose of cuts on MMPI cards. He says that the L score in these cases was zero.8 On the other hand, Cofer and others, when studying the effect of malingering on the MMPI, found that the instructions to fake a normal profile raised L scores.9

6 Ibid.
7 W. C. Cottle, "Card Versus Booklet Forms of the MMPI," Journal of Applied Psychology, 1950, 34:255-259.
8 H. B. Hovey, "Detection of Circumvention in the MMPI," Journal of Clinical Psychology, 1948, 4:97.
9 C. N. Cofer, J. Chance and A. J. Judson, "A Study of Malingering on the MMPI," Journal of Psychology, 1949, 27:491-499.

From the evidence presented in the literature no definite conclusions can be drawn at this time concerning the efficacy of the L scale. No probability statement regarding its differentiating power is offered.
The literature only rationally suggests that factors of degree of training and psychological sophistication may unduly influence an individual's performance on L items.

The F Scale

In the original publication of the MMPI the F scale was not presented as an empirically validated variable. Its validity was assumed on a priori grounds. The key was composed of 64 items which were selected because they were answered with relatively low frequency (10 percent or less) in either the true or false direction by the main normal group. The scored direction of response was the one which was rarely made by unselected normals. Additionally, the items were chosen to include a variety of content so that it was unlikely that any particular pattern would cause an individual to answer many of the items in the unusual direction.

The relative success of this selection of items, with the deliberate intent of forcing the average number of items answered in an unusual direction downward, was illustrated by the fact that the mean score on the 64 items ran between two and four points for all normal groups. The distribution curve was extremely skewed; the higher F scores approached half the total number of F items. In distributions of normal persons the frequency of scores dropped rapidly at about seven items and fell to about the two or three percent level by a score of twelve. Because of this quick cutting off of the curve, scores of seven and twelve were arbitrarily assigned T-score values of 60 and 70 in the original F table.10

10 P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:535-536.

From the first Hathaway and McKinley recognized that F was open to several interpretations. 1) The subject would need to sort almost all of
the items according to expectation in order for these low scores to result; and any error in recording, such as mistaking true items for false items and the like, would raise the F score appreciably. 2) If a subject could not understand what he was reading adequately enough to comprehend fully the answers to these items, the F score would obviously be higher. 3) Persons who were highly individualistic and independent might honestly make infrequent responses to F score items. For example, such individuals might admit to disliking children and not believing their mother was a good woman. 4) It was early discovered that schizoid subjects and subjects who apparently wished to put themselves in a bad light also obtained high scores. Meehl and Hathaway felt that the schizoid group obtained high scores because they said unusual things due to delusional or other aberrant mental states in responding to the items. This was referred to as distortion since it was considered that an impartial study would not justify the patient's placement.

Among more normal persons some high scores were also observed where the individual had rather unusual ways of responding to conventional stimuli. For example, to the item, "I have had periods in which I carried on activities without knowing later what I had been doing," most persons answered false. Some individuals, however, included periods of sleep in the implication of the item. One might argue that such ways of thinking are often allied to schizoid mentation generally and that the answers in this case indicate a true abnormality. At the very least, however, the person is responding to some items in a way that differs from that of most individuals. Meehl and Hathaway conclude that such persons might, therefore, not be appropriately approached through this method of personality measurement.
To them it seemed reasonable that there are individuals whose habitual ways of reacting to items are so different from other persons that measurement of their personalities through the use of verbal items of this type would reflect the unusualness of their reactions to the items more than any clinical abnormality.11

11 Ibid., p. 536.

Clinical experience suggested to Meehl and Hathaway that the usual critical score of T equals 70 was too low in the case of F. They found that scores ranging up to T-scores of 80 were often more a reflection of validly unusual symptoms and attitudes than an indication of invalidity in the rest of the profile due to misunderstanding, etc. However, scores above this strongly suggested an invalid record.12 As a result, it was decided that scores above 70 would indicate the whole record to be invalid, except in the special cases mentioned above of schizoid tendencies. Scores from 60-70 would be considered open to suspicion; scores from 50-60 would be considered a reliable sign that item comprehension, clerical work, etc., had been satisfactory and that the subject was similar to persons in general.

When the MMPI was administered to incoming servicemen it was possible to consider the F score as evidence of an attempt to malinger and to obtain fallaciously bad scores on the other MMPI scales. To check on this interpretation, a study similar to the investigation conducted on the L scale (see the discussion of the L scale above) was devised. A group of 54 servicemen who had completed a considerable portion of their training in psychology were asked to take the MMPI twice. The group took the MMPI in the standard way and also took it under instructions to assume that they wished to avoid being accepted in the draft; in order to be rejected they were to obtain adverse scores without giving themselves away. The order was reversed for half of the group for the test administration. Through this plan a faked bad record and a normal record were obtained.
The data reveal that 96 percent of the faked bad records had a raw F score of 15 or more (T-score of 78 or greater). The researchers concluded that even these men who were somewhat cognizant of psychological measurements betrayed themselves when they attempted to fake a bad record.13 The F scale was a good device for identifying the intentional faking that could be set up in an experimental situation.

12 Ibid., pp. 536-537.
13 Ibid., p. 537.

Kazan and Sheinberg found, however, that a high F score is rarely an invalidating factor with abnormal subjects. In their investigation of 170 maladjusted male servicemen, they reported that not all of the F items were answered in the infrequent direction less than 10 percent of the time by normals, and that the percentage was but little higher for miscellaneous abnormal subjects.14 Schmidt, likewise, found that the profiles for psychotics rose more sharply at F than for any other clinical group.15 Schneck also found a high F score less valid in a study of character disorders in an army disciplinary barracks.16

Cofer and others show that subjects attempting to fake emotional upset were detected easily by the F score.17 Another study of faking on the MMPI by Gough reports that feigned psychotic curves can be detected by being too low on neurotic scales, too high on psychotic scales, and by a significantly elevated F scale. Gough also found that feigned neurotic curves were identified by high F and low K scores.18 Mechanical sorting using an index of F minus K correctly selected 82 percent of these feigned neurotic profiles. This study used eleven individuals with a background in psychology or psychiatry as contrasted with controls of thirteen hospitalized paranoid schizophrenics and 57 severe psychoneurotics. Gough concludes that

14 A. T. Kazan and I. M. Sheinberg, "Clinical Note on the Significance of the Validity Score F in the MMPI," American Journal of Psychiatry, 1945, 102:181-183.
15 H. O.
Schmidt, "Test Profiles as a Diagnostic Aid: the MMPI," Journal of Applied Psychology, 1945, 29:115-131.
16 J. M. Schneck, "Clinical Evaluation of the F Scale on the MMPI," American Journal of Psychiatry, 1948, 104:440-442.
17 C. N. Cofer, J. Chance and A. J. Judson, "A Study of Malingering on the MMPI," Journal of Psychology, 1949, 27:491-499.
18 H. G. Gough, "Simulated Patterns on the MMPI," Journal of Abnormal and Social Psychology, 1947, 42:215-225.

relatively skilled persons are unable to simulate either a psychoneurotic or psychotic condition on the MMPI in such a way as to avoid detection.

Hunt used a group of 109 psychology students and 74 Navy general court martial prisoners to investigate the effect of deliberate deception.19 He substantiated Gough's discovery that an index of F minus K correctly identifies a substantial proportion of malingered or faked abnormal profiles. However, Hunt concluded that the index was of no use in detection of faked normal profiles.

A later report on the F minus K index by Gough suggests that the sampling distribution of F minus K is reasonably normal and the index is not particularly distorted by psychiatric abnormality.20 He presents data from Sweetland which support the use of F minus K.21

After reviewing the MMPI literature Cottle found that high scores on F are caused by carelessness of the subject, scoring errors, psychotic state, or deliberate faking of an abnormal profile. He, too, concluded that the F scale is of little use in detecting faked normal profiles.22

From the evidence presented in the literature Cottle's conclusions concerning the F scale appear rationally sound. However, as in the case of the L scale, no clear probability statement is offered regarding the differentiating power of F in distinguishing between adequate and inadequate performance (as outlined by Hathaway and McKinley above).

19 H. F.
Hunt, "The Effect of Deliberate Deception on MMPI Performance," Journal of Consulting Psychology, 1948, 12:396-402.
20 H. G. Gough, "The F Minus K Dissimulation Index for the MMPI," Journal of Consulting Psychology, 1950, 14:408-413.
21 A. Sweetland, "Hypnotic Neurosis--Hypochondriasis and Depression," Journal of Genetic Psychology, 1948, 39:19-105.
22 W. C. Cottle, "The MMPI: A Review," Kansas Studies in Education, 1953, 326-9.

The K Scale

Meehl and Hathaway conceptualized two approaches to the problem of identifying the attitude a subject takes toward the items in a personality inventory. First, the investigator may have the subject deliberately assume a generally defined attitude. For example, faking might be directed toward obtaining either adverse or desirable scores. A "normal" set of responses must be obtained relatively simultaneously with the faked responses if a reference point is to be determined. The faked and normal records can then be contrasted to discover the items whose responses most frequently change between the normal and the faked records. Secondly, the investigator may choose records in which there is presumptive likelihood that a special attitude has been assumed. As in the first approach, a "normal" set of responses must be obtained simultaneously for comparison.23

Using the direction to fake approach several scales were derived by Meehl and Hathaway. The good and bad fake scales were found to be composed of different sets of items. Either of the procedures provided a scale that would be about as good for the other type of faking as it was for the one from which it was derived when such scales were applied to test cases not used in the original derivations. Using two such scales separately did not materially increase the accuracy of prediction.

In the second line of experimental approach Meehl and Hathaway also derived several subdivisions.
Among presumably functional and normal records, cases were often identified which were so abnormal as to indicate that the individual should have been hospitalized. The investigators attempted to discover items which would differentiate normal from clinically diagnosed abnormal persons. For the counterpart to this approach they also selected for item analysis hospitalized cases whose records showed a normal profile. Using this approach Meehl and Hathaway experimentally derived four scales.

23 P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:538-539.

Derivation of the L6 Scale. The most important finding of the investigators was that whichever of the methods was used, as was the case with the faking approach above, the resultant scales were about equally effective. These scales were also fairly effective in differentiating the fake group. After two years of this experimentation all of the promising scales were cross-validated on a new sample. A single best scale was derived which was originally called L6.24

L6 was derived by an item analysis of the responses of 25 males and 25 females in a psychopathic hospital. These subjects' MMPI profiles showed an L scale ("lie" key) of T equal to 60 or more. In addition, these individuals were predicted to obtain abnormal profiles because of the clinical diagnoses given to these cases by the psychiatric staff. However, the scores on the MMPI profiles fell within the normal range.

Two restrictions were employed in the selection of the criterion group. All of the individuals were characterized by deviant behavior; however, they obtained relatively normal profiles and were termed "misses" for the MMPI. In addition, all criterion cases were characterized by having a tendency to obtain elevated scores on the L scale.
The item responses of these 50 cases, analyzed separately for males and females, were compared to item frequencies from previous standardization groups. In all, 22 items were chosen as a result of this comparison on the basis of a 30 percent discrepancy between validation and standardization groups. Because the criterion group was assumed to desire good scores, the larger raw scores on these items were in the same direction as the larger raw scores on the L scale. The item content suggested an attitude of denying worries, inferiority feelings, and psychiatrically unhealthy symptoms, together with a disposition to see only good in others as well as oneself.

24 Ibid., pp. 539-540.

Cross-validation of L6. Following the final choice of L6 as the best of the scales available, Meehl and Hathaway subjected the validity key to more careful study. Hospital and normal records were examined to discover whether L6 would be helpful in interpreting individual profiles. Relatively few data were found on normal cases, but on hospital cases a fairly extensive symptomatic summary was available. By examining the profile for normalcy it was determined whether or not the L6 deviated in an upward direction. It was assumed that an upward direction indicated that the patient had attempted to place himself in a good light. As a result of this study L6 was judged effective but left much to be desired.25

The L6 scale as a measure of defensive and plus-getting attitudes. To the investigators L6 appeared as adequate for the detection of plus-getting as was N (see section below) or any of the other experimental scales. Accordingly, the records of a new series of presumably normal persons showing deviant profiles were examined. The L6 scale again appeared to work at the plus-getting end of the test-attitude continuum.
That is to say, a relatively low score on L6 could be used to under-interpret an otherwise deviant profile and thus avoid some of the presumable false positives in the normal population sample. Thus L6 seemed useful at both ends of the test-attitude continuum: defensiveness and plus-getting.

25 Ibid., pp. 540-542.

Refinement of the L6 scale. The most outstanding difficulty in the above procedure was that L6 tended to be low on severe depressive or schizophrenic patient records. This led to an under-interpretation in spite of the fact that the patients were grossly abnormal. To correct for the under-interpretation tendency, items were added that would work in the opposite direction. To choose these items Meehl and Hathaway studied the item tabulations for the group of psychological trainees above who had attempted to fake good and bad scores. In the above study there were many items which showed no tendency to change with an alteration in the test-taking attitude. The percent of true or false remained constant whether the attitude was the normal one or the faked one. From among these items, a sub-group was chosen which showed differences between schizophrenic and depressive criterion groups and general population normals. Meehl and Hathaway admit that the procedure rested upon the insecure assumption that any item that did not appear to be affected by the test-taking attitude (as approached by a normal person attempting to fake good or bad), but that occurred as a frequent item differentiating depressed or schizophrenic patients, would be useful in correcting the tendency of L6 to go too low for schizophrenic and depressed patients. Such an item was scored in a way that would make it work against the tendency of the L6 scale. Eight items were selected by this method.
The effect of adding these eight items to the 22 on L6 was to elevate slightly the mean score of normals and to make it more nearly approach the mean score of abnormal cases on the complex of all 30 items.26

Derivation of the K scale. As a final step in the refinement of the L6 scale the above eight items were combined with the 22 L6 items into a single scale which was designated K. The K scale represents the final outcome of many experiments in the field of measuring test attitude. Meehl and Hathaway state:

    The K scale is far from perfect for its purpose as measured by the various available data. Generally speaking it is about as good as any other single scale yet derived. In individual applications it is inferior now to one scale and now to another but the differences are never great enough to be very significant practically and the small number of items in this scale gives it a distinct advantage over one or two of the longer scales such as N.27

26 Ibid., p. 543.
27 Ibid., pp. 543-544.

Because the K scale was derived as a correction scale for improving the discrimination yielded on the already existent MMPI scales, it was not assumed to be measuring anything which in itself was of psychiatric significance. Meehl and Hathaway considered that it was first necessary to choose criterion cases of the sort on which K could conceivably be of value. It was apparent to these investigators that such cases would be characterized by the presence of what were called borderline profiles, that is, profiles of individuals who possessed T-scores between 65 and 80. In studying hundreds of deviant profiles after the addition of K, almost no individuals were found with T-scores above 80 in the normal sample; and it was not statistically profitable to correct elevations of such magnitude to the point of calling them normal.
On the other hand, when a curve showed no elevations at all above 65, even the presence of a high K score did not enable the examiner to form any adequate notion of what the peak would be had the K factor not been operating to distort the results. There were apparently upper and lower limits beyond which deviations on K could not effectively operate. Profiles showing subtest scores above 80 were interpreted as probably abnormal no matter how low K fell. If a profile showed no subtest scores above 65 it was unknown whether a high K meant the profile should be adjusted toward more severe scores or was merely that of an actually normal person who for some reason took a defensive attitude when being tested.

Validation of the K scale. Meehl and Hathaway judged that the kind of curve which gave interpretative difficulty, and which could be improved by knowledge of the influence of K, would be a curve in the doubtful, borderline region. Accordingly, a group of cases from the normal and hospital groups was chosen on the basis of having achieved such borderline curves. For this study all cases in the files were selected which showed at least one personality component elevated as high as T equal 65, but with no component elevated to T greater than 80. Among the normals, there were 71 males and 103 females having such borderline curves. Corresponding to these cases, 129 abnormal males and 208 abnormal females were located with similar borderline profiles. The data for the two sexes were treated separately. The analysis of these data was in terms of the ability of the K scale, used mechanically, to separate the curves of the actual normals from those of the actual abnormals. The procedure was to arrange the whole set (normals and abnormals combined) for each sex in order of the magnitude of their K scores. The distribution of K was cut on the basis of the proportion of normals and abnormals in the sample, calling all cases above the cut abnormal and all those below normal.
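The mechanical cut just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the toy K scores, and the handling of ties (tied cases keep their original file order) are hypothetical and do not come from Meehl and Hathaway.

```python
def cut_by_proportion(k_scores, is_abnormal):
    """Rank cases by K and call the top cases 'abnormal'.

    The number of cases placed above the cut equals the number of
    actual abnormals in the pooled sample, so the cut reflects the
    normal/abnormal proportion rather than a fixed K value.
    Returns the fourfold table as (a, b, c, d):
      a = actual abnormal, called abnormal
      b = actual abnormal, called normal
      c = actual normal,   called abnormal
      d = actual normal,   called normal
    """
    n_abnormal = sum(is_abnormal)
    # Indices ordered from highest K to lowest (stable for ties: an assumption).
    order = sorted(range(len(k_scores)), key=lambda i: k_scores[i], reverse=True)
    called_abnormal = set(order[:n_abnormal])

    a = b = c = d = 0
    for i, actual in enumerate(is_abnormal):
        called = i in called_abnormal
        if actual and called:
            a += 1
        elif actual:
            b += 1
        elif called:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Hypothetical borderline cases: K T-scores and actual status.
toy_k = [62, 58, 55, 49, 47, 44, 41, 60]
toy_abnormal = [True, True, False, True, False, False, False, True]
print(cut_by_proportion(toy_k, toy_abnormal))
```

With real data the four cell counts would then feed the chi-square test reported in the text.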
Setting up a fourfold table on this basis, a chi-square of 20.436 for the males and 29.540 for the females was obtained. Both of these were highly significant (P less than .001) with one degree of freedom. If instead of locating an optimal cutting score, the K distribution was cut at the mean of the general population K distribution (T equal 50), the cutting point of the males was unchanged. However, the cutting point for the females shifted enough to lower their chi-square to 17.750, which was still highly significant. If one considered miscellaneous profiles which lie in the borderline range between 65 and 80, regardless of the kind of elevation and irrespective of the clinical diagnosis of those who were clinically abnormal, one could separate them into actual normals and abnormals significantly better than chance by using a cutting score on K. In this instance Meehl and Hathaway emphasized that K was operating chiefly as a suppressor of certain test-taking tendencies, for K by itself did not differentiate unselected normal and abnormal cases. In terms of percentages, it was found that for the males, 72 percent of the abnormals and 61 percent of the actual normals were correctly identified. For the females, 66 percent of the abnormals and 59 percent of the normals were correctly classified. These percentages were based upon the separations with K equal 50, taking no account of the actual normal-abnormal proportions among the above cases.28

Refinement of the K scale. Evidence from examination of the test "misses" disclosed by K in the data, combined with knowledge of the correlation between K and other MMPI scales, indicated that the K correction was more important in some scales than in others. Therefore, it was decided to analyze the borderline groups in terms of the peak elevation of their profiles in the attempt to identify those particular curves on which K could be used with profit.
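As a rough arithmetic check on the K equal 50 figures reported above, the fourfold tables can be rebuilt from the stated percentages and the borderline group sizes (129 abnormal and 71 normal males; 208 abnormal and 103 normal females). The sketch below assumes that the cell counts are the percentages rounded to whole cases and that the uncorrected Pearson chi-square was used; neither assumption is stated in the source, and the function names are illustrative.

```python
def fourfold_chi_square(a, b, c, d):
    """Uncorrected chi-square for a 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def table_from_percentages(n_abnormal, n_normal, pct_abnormal_hit, pct_normal_hit):
    """Rebuild cell counts from group sizes and percent correctly classified."""
    abnormal_hits = round(n_abnormal * pct_abnormal_hit / 100)
    normal_hits = round(n_normal * pct_normal_hit / 100)
    # Rows: actual abnormal, actual normal. Columns: called abnormal, called normal.
    return (abnormal_hits, n_abnormal - abnormal_hits,
            n_normal - normal_hits, normal_hits)


males = table_from_percentages(129, 71, 72, 61)
females = table_from_percentages(208, 103, 66, 59)

chi_males = fourfold_chi_square(*males)
chi_females = fourfold_chi_square(*females)
print(f"males:   {chi_males:.3f}")    # 20.436
print(f"females: {chi_females:.3f}")  # 17.750
```

Under these assumptions the reconstructed values agree with the 20.436 and 17.750 given in the text for the cut at T equal 50.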
The entire group of 511 borderline curves (males and females, normals and abnormals pooled) was divided into eight sub-groups, each sub-group composed of cases having the peak score on the same one of the eight MMPI personality components. The normals and abnormals having borderline curves with the same peak score were then separated mechanically by the use of a cutting score on K. The proportion of cases above the cutting score was determined on the basis of the proportion of actual abnormals versus normals in each sub-group. Meehl and Hathaway state that this was unavoidable in the analysis because the relative proportions of actual normals and abnormals varied widely from scale to scale and the use of the mean of K would have been grossly misleading. Of the eight groups studied in this manner, only three showed a significant chi-square (P less than .01). One clinical scale group yielded a chi-square between the 10 percent and 20 percent levels of significance. It seemed, therefore, that the K factor could be used with profit in interpreting some kinds of profiles but not others. The failure to discriminate with K when grouping profiles by peak score did not establish that a K correction might not be profitably added to the single scores themselves.29

28 Ibid., p. 545.
29 Ibid., p. 546.

Cross-validation of the K scale. One other validating study was done of K in the original investigation. A group of 22 normals and 22 abnormals who were employed in a previous study was used.30 The normals consisted of a random selection from a large group of profiles showing any elevation of 70 or over. The abnormals consisted of a heterogeneous group also having at least one subtest score over 70. All cases were chosen randomly from hospital cases. Because these groups had been selected for a different investigation, they had not entered into the derivation of K in any way.
Without regard for any other information concerning the profiles, all cases showing K greater than 50 were arbitrarily guessed as abnormals, whereas those with K less than 50 were called normals. The cutting score was also independent of the statistics of the original group. To Meehl and Hathaway the K scale worked phenomenally well here. Of the entire group of 44 cases, 37 were correctly classified when using K in this way, a total of 85 percent "hits." Here it was the purpose to separate normals and abnormals all of whom possessed deviant profiles. The investigators concluded that this percent was quite impressive considering the task set for K. Of the seven errors in classifying, six were false positives (cases of normals showing elevated profiles and K greater than 50 and termed abnormal). The chi-square for the fourfold table of these data was 21.569, which with one degree of freedom was highly significant (P less than .0001). Meehl and Hathaway conclude:

    Here we have striking evidence of the validity of K when used to differentiate between deviant curves of actual normals and abnormals. We are not prepared to explain the superiority of this result to that originally encountered, except to say that the range of abnormal scores in the present analysis was from 70-90 whereas in the original analysis borderline scores were defined as lying between 65 and 80. In what way this could make K appear to function more effectively in the one case than the other is not clear. Also the present study involved only males where K in general seems to work a little better than on females.31

30 P. E. Meehl, "An Investigation of a General Normality or Control Factor in Personality Testing," Psychological Monographs, 1945, 59: Number 4.

Discrepancy in the efficacy of K. The fact that K was less effective when applied to some scales than others would suggest separate interpretations or cutting scores.
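The cross-validation figures reported above determine the whole fourfold table: 44 cases, 22 of each kind, 37 hits, and six of the seven errors false positives, which leaves one abnormal case called normal. The short Python sketch below derives the cells and computes chi-square from observed and expected frequencies. It assumes no continuity correction was applied, which the source does not state; the variable and function names are illustrative.

```python
def chi_square_from_table(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Figures taken from the text.
normals, abnormals = 22, 22
hits, false_positives = 37, 6

errors = normals + abnormals - hits          # 7 misclassified cases
false_negatives = errors - false_positives   # 1 abnormal called normal

# Rows: actual abnormal, actual normal. Columns: K > 50, K < 50.
table = [
    [abnormals - false_negatives, false_negatives],
    [false_positives, normals - false_positives],
]

chi2 = chi_square_from_table(table)
print(table)           # [[21, 1], [6, 16]]
print(f"{chi2:.3f}")   # 21.569
```

Under this assumption the computed value matches the 21.569 reported for the fourfold table.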
Furthermore, the classification into normal and abnormal on the basis of a single arbitrary cutting score obviously sacrifices some quantitative information about the actual magnitude of the K score. Meehl and Hathaway, however, did not intend to propose such a cutting method as the most efficient manner of application for K. They simply used that form to indicate that K possessed differentiating power for what it was hoped to differentiate.

With the exception of the Hy and D clinical scales the correlations of K with the other MMPI variables were consistently negative. This was to be expected if K represented the defensive, lying, or self-deceptive test-taking attitude it was derived to measure. It might be thought that such low correlations as occur in Table I below would preclude any possibility of the use of K as a suppressor. However, there is a tendency for the scales on which K seems valid by the chi-square test to show higher correlations. But for the use to which K was put, correlations as low as .20 were utilized to yield very significant and useful improvements in discrimination.32

31 P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:546-547.
32 Ibid., pp. 548-549.

Table I. Intercorrelations of K with Other MMPI Variables

                     Hs     D     Hy    Pd    Pa    Pt    Sc    Ma
Normal males        -.30   .15   .48   .17  -.07  -.67  -.59  -.36
Normal females      -.35  -.03   .30  -.06  -.02  -.64  -.58  -.28
Abnormal males      -.42  -.29   .11  -.26   .19   .60  -.60  -.37
Abnormal females    -.17  -.16   .17  -.21   .13   .63  -.58  -.38

Considering the relative unreliability of some of the MMPI variables, the intercorrelations of the K scale with other variables considered loaded with the K factor are rather impressive: the G scale and plus scale were derived wholly by internal item relationships and without regard to criteria of any non-test behavior; the N scale corrected for self-criticality of certain plus-getters who showed deviant profiles; the Ch scale differentiated hypochondriacs from non-hypochondriacal abnormals who had elevated H scores; and a sub-set of items (Hy-O) was chosen because the items differentiated a clinical group, hysteria. There was, however, a considerable item overlap among these scales, which tended to raise the correlations. On the other hand, Meehl and Hathaway point out that the K scale was not actually pure for the hypothetical test-taking attitude because it was a composite of the test-taking scale L6 plus eight psychotic items. This would presumably tend to lower the correlations.

Table II. Correlations of K Scale with Other Variables Thought to Be Loaded with the K-Factor

                     "+"     G      N     Ch    Hy-O
Normal males        -.64   -.76   -.70   -.67    .81
Normal females      -.62   -.73   -.64   -.63    .78
Abnormal males      -.70   -.75   -.69   -.64    .74
Abnormal females    -.70   -.81   -.72   -.71    .74

Accordingly, Meehl and Hathaway substituted L6 for K, removed the item overlap among the scales G, N, Ch, L6 and Hy-O and calculated correlations among these reduced keys. Table III shows the intercorrelations among these five non-overlapping keys, based upon the responses of 150 unselected normal males between the ages of 26 and 45, rejecting records of F greater than 80. All scales were scored so as to render the correlations positive.33

Table III. Intercorrelations of Five Scales Thought to Be Loaded with the Test-Taking Attitude, No Item Overlap, n = 150 Normal Males

           G     Ch    L6    N
Ch        .82
L6        .76   .71
N         .78   .73   .66
Hy-O      .70   .63   .70   .59

This correlation matrix was subjected to a factor analysis, which was repeated three times to approximate the communalities successively because of the small number of tests. It appeared that one common factor was quite sufficient to account for the intercorrelations of these scales. The factor loadings of the scales G, Ch, L6, N and Hy-O were .927, .868, .847, .818 and .770, respectively.34

It is claimed by Hathaway that the correction of the five clinical scales of the MMPI by use of K increases the proportion of clinically diagnosed cases scoring above the 90th percentile of normals.35 Jeffery undertook an investigation of the factors which influence responses to K. She found that the differences characterizing high and low K were consistent for both the test and an interview situation. It was concluded that this observed set was persistent and deep-rooted. She states that the set evidences itself in items containing temporal adverbs. People who answered "true" tended to select categories containing the term "frequent" and the opposite appeared for those answering "false." Jeffery's data support the assumption that K measures two extremes of a deeply-rooted attitude causing distortion of personality items, but offered no explanation of the dynamics involved.36

33 Ibid., pp. 551-552.
34 Ibid., pp. 552-553.
35 S. R. Hathaway, Supplementary Manual for the MMPI (New York: Psychological Corporation, 1946).

McKinley, Hathaway and Meehl describe the statistical derivation of K weights as an optimal value which operates as a differential ratio in distinguishing a criterion group of abnormals diagnosed for each scale from a sample of unselected normals. The investigators emphasize, however, that this differentiation is only for the general population and state "it seems likely that for the best separation of maladjusted normals such as those which abound in a college counseling bureau and would be formally diagnosed in a psychiatric clinic as simple adult maladjustment, other weights might be better."37

A study of the K scale by Schmidt using 98 Army convalescents who were diagnosed normal found that the L scale was as good an indicator of falsifying as K. He reports that the basic shape of the profile remained the same, but its height decreases with falsification.
He concluded that the K factor contributes little, if anything, to differential diagnosis.38 Hunt and others tried to check the discriminatory power of the K scale using psychotic and non-psychotic adult male psychiatric patients. They state that the K scale did not improve diagnosis and failed to reduce "false negatives."39 In another study using 109 psychology students and 74 Navy general court martial prisoners Hunt found that the K correction failed to reduce deception. However, Hunt did find that an index of F minus K correctly identified malingerers, but not faked normal profiles.40 Cofer and others found K and L scores significantly higher for faked normal profiles and suggested the use of an additive combination of L and K to identify these.41

36 M. E. Jeffery, "Some Factors Influencing Answers on the Multiphasic K Scale," (unpublished Doctor's dissertation, The University of Minnesota, 1946).
37 J. C. McKinley, S. R. Hathaway and P. E. Meehl, "The MMPI: VI. The K Scale," Journal of Consulting Psychology, 1948, 12:20-31.
38 H. O. Schmidt, "Notes on the MMPI: The K Factor," Journal of Consulting Psychology, 1948, 12:337-342.

After reviewing MMPI literature Cottle concludes that K does not appear useful by itself to increase the discrimination of the clinical scales; however, the K scale in combination with L or F is useful to detect deliberate faking on the MMPI. In conclusion, Cottle states that there is evidence that the K scale reflects a persistent, deep-rooted attitude of distorting personality items and may be useful as a clinical scale to identify the defensive or the overly self-critical subject.42

Because of the contradictory evidence presented in the literature concerning the K scale and its function, no definite conclusion can be reached regarding its efficacy. Meehl and Hathaway's investigation of the K scale presented striking evidence of the validity of K to differentiate between deviant curves of actual normals and abnormals.
However, in replication by other investigators the usefulness of K is not completely supported. All that can be said presently is that perhaps K or a fraction of K coupled with other validity key components can be useful in identifying certain test-taking attitudes.

39 H. F. Hunt, et al., "A Study of the Differential Diagnostic Efficiency of the MMPI," Journal of Consulting Psychology, 1948, 12:331-336.
40 H. F. Hunt, "The Effect of Deliberate Deception on MMPI Performance," Journal of Consulting Psychology, 1948, 12:396-402.
41 C. N. Cofer, J. Chance and A. J. Judson, "A Study of Malingering on the MMPI," Journal of Psychology, 1949, 27:491-499.
42 W. C. Cottle, "The MMPI: A Review," Kansas Studies in Education, 1953, 3:9.

Subtle-Obvious Keys (S-O)

While the K scale was being considered and developed Wiener was developing the concept of relatively subtle and obvious keys for the MMPI scales on the same grounds as those given by Meehl and Hathaway above.43 It was considered that the development of such keys on individual scales of the MMPI would yield more information and be of more practical usefulness than an overall validity scale, such as L, G and finally K. Wiener considered that extremely deviate individuals could be picked out by a test consisting of obvious items. However, to help the counselor working with a normal population, it was thought that a much more subtle test was required which would both distinguish the extreme deviates and differentiate among the characteristics of a normal population.
To Wiener these two services of a personality test appeared to be best served by developing both S and O scales.44

In the development of the S and O keys the MMPI was used because 1) it was available for a relatively large normal population, 2) it was felt to have the most extensive and useful validation of any personality test, 3) it used generally accepted categories of personality characteristics, and 4) it had the unique feature of validity measures designed to indicate test-taking attitudes. It was felt, however, that when the MMPI was used with a relatively normal population, one that was functioning relatively successfully in society, certain important aspects of personality were masked because of its validation in terms of abnormal groups. The most obvious items distinguished the abnormal groups from the normal, whereas it appeared that the more subtle items should have had the greater validity in distinguishing the personality characteristics of normal groups.45

43 D. N. Wiener, "Subtle and Obvious Keys for the MMPI," Journal of Consulting Psychology, 1948, 12:164-170.
44 Ibid., p. 164.
45 Ibid.

In developing the S and O keys all items of the MMPI were divided into two groups, those to which significant responses were relatively easy to detect as indicating emotional disturbance, and those to which they were relatively difficult to detect. The items for each scale were sorted into these two categories. All F scale items that also appeared in other scales were automatically assigned to the obvious category because by definition they seldom occur in a normal population. In addition, those items for which a blank response (no check on the answer sheet) was scored in a significant direction were assigned to the subtle keys. Pooled judgments of raters were then used to sort the other items into the two categories with no attempt made to equalize the number of items in each group. There were more obvious items than subtle.
The keys thus developed were used to rescore the test sheets of a representative sampling of 100 cases of the original male norm group for the MMPI. T-scores were developed and assigned to subtle and obvious item counts on the same basis as for the total scale T-scores. Subtlety-obviousness was determined rationally and not empirically; hence S-O scales were not formed for all MMPI subtests because Wiener felt that they contained too few S items.46

Tabulation tables for the raw scores on the subtle and obvious keys indicated a positive skew for most of the O item distributions of the norm group; relatively few individuals in the normal population answered the obvious items in a significant direction. It seemed probable to Wiener that significant answers to these O items were most characteristic of an institutionalized population. The S items were distributed in a relatively normal manner.

46 Ibid., pp. 165-166.

An additional check on the validity of the selection of items for the keys was the frequency of their occurrence among the responses of a normal population. For all five scales the S items were answered in a significant direction approximately twice or more as frequently as the O items. The bases used to select items for the S and O keys for five scales of the MMPI yielded O items which were answered relatively infrequently in a significant direction by a normal population compared with S items, and S items whose significant answers were in a reverse direction from the expectation of both the original authors of the MMPI.47

Recent research with the subtle and obvious scales for the MMPI raises some important interpretive problems.48-51 While it was not
puzzling to find that hospitalized psychiatric patients and other maladjusted and unsuccessful groups obtained high T-scores on the obvious scales, it was disconcerting to Fricke to find that groups of recovered psychiatric patients, successful trainees, successful salesmen and college sophomores obtained higher T-scores on the S scales than the normal MMPI population. In addition, these groups obtained higher overall T-scores than groups of unrecovered psychiatric patients, unsuccessful trainees, etc.52 Because each of the items in the subtle scales was originally selected by Hathaway and McKinley due to its discrimination between normal and abnormal groups, it appears that there is something common either to the items or to the groups which influences the size of the T-scores more than does the original discriminating power of the items.

47 Ibid., pp. 166-168.
48 E. Rosen, "Self Appraisal, Personal Desirability, and Perceived Social Desirability of Personality Traits," Journal of Abnormal and Social Psychology, 1956, 52:151-158.
49 E. Rosen, "Self Appraisal and Perceived Desirability of MMPI Personality Traits," Journal of Counseling Psychology, 1956, 3:44-51.
50 D. N. Wiener, "Selecting Salesmen with Subtle-Obvious Keys for the MMPI," American Psychologist, 1948, 3:364.
51 D. N. Wiener, "A Control Factor in Social Adjustment," Journal of Abnormal and Social Psychology, 1951, 46:3-8.
52 B. G. Fricke, "Subtle and Obvious Test Items and Response Set," Journal of Consulting Psychology, 1957, 21:250-252.

Fricke suggests that it is quite possible that groups obtaining high T-scores on the S scales have a response set to answer false. Fricke points out that Wiener's division of the MMPI items for five clinical scales into S and O items tends to make false the scored response for the S items. Four of the five scales have a majority of items scored false.
Only the hypomania scale has more true than false responses and it is of interest to note that it is this scale which does not separate the groups studied by Wiener.53

On the basis of some data and theoretical considerations, Wiener suggests that the S items are best for assessing the personality of normal persons and that the O items are best for abnormal persons.54 Fricke states:

    If it is true that S items are more likely to function in a normal population and O items in an abnormal population, then according to the present contention that K operates as a measure of response set, a difference in correlations should be reflected in normal and abnormal groups. Specifically it would be expected that the correlations of K with the five clinical scales utilized would be more positive or less negative in a normal than in an abnormal population. Correlations reveal that four of the five expectations are fulfilled; only the correlation between K and hypomania are not substantially different. A tentative hypothesis drawn from the data is that the more positive the correlation between scores from a measure of response set to answer false and scores from clinical scales composed of S-O items, the more likely it is that the group is well adjusted or successful.55

53 D. N. Wiener, "Subtle and Obvious Keys for the MMPI," Journal of Consulting Psychology, 1948, 12:168-169.
54 Ibid., p. 170.
55 B. G. Fricke, "Subtle and Obvious Test Items and Response Set," Journal of Consulting Psychology, 1957, 21:250-252.

Set T Scale

According to Cronbach, who has reviewed several response sets that influence a test taker's behavior, the variance generated by a response set is regarded as undesirable because it contributes only error variance and cannot be used to increase the usefulness of a test.56, 57 However, Fricke assumes a different rationale which stresses that response set can be used to improve the validity of many tests.58

In surveying the responses scored in most personality scales Fricke discovered that the significant responses are predominantly in one direction. For example, 96 out of the 100 items in the Cornell Index are keyed yes and 47 of 60 items in the hysteria scale of the MMPI are keyed false. A review of personality tests validated to predict academic achievement disclosed that for the more valid tests the response predictive of high achievement is usually false, no, or disagree. Some other tests in which a majority of scored responses fall into a particular response category are the California Psychological Inventory, the Humm-Wadsworth Temperament Analysis, the Minnesota Teacher Attitude Inventory, and the Strong Vocational Interest Blank.59

56 L. J. Cronbach, "Response Sets and Test Validity," Educational and Psychological Measurement, 1946, 6:475-494.
57 L. J. Cronbach, "Further Evidence on Response Sets and Test Design," Educational and Psychological Measurement, 1950, 10:3-31.
58 B. G. Fricke, "Subtle and Obvious Test Items and Response Set," Journal of Consulting Psychology, 1957, 21:250-252.
59 B. G. Fricke, "The Development of an Empirically Validated Personality Test Employing Configural Analysis for the Prediction of Academic Achievement," (unpublished Doctor's dissertation, University of Minnesota, 1954).

Fricke contends that when the number of scored responses is unequally divided among all possible response categories the effect of a response set may be substantial. If response set per se is not directly related to the criterion, it operates to introduce error in the criterion predictor. Fricke states that this is usually the case, but if response set is related to another variable that is criterion-related,
then response set may be used as a suppressor variable.60 By suppressing or removing the influence of response set from the criterion-predictor an improvement in test validity can be effected.61-64

Fricke originally held the view that the imbalance of true and false items was simply a function of how the statements were written. But this seemed unlikely since on some tests having scales for several personality dimensions, the scored direction is different for different scales. It was concluded that the nature of the criterion groups and not the items was responsible for the imbalance. As a result Fricke set out to determine whether or not a measure of response set could be used to increase test validity.

One index or measure of set for answering true was obtained by counting the times a person marked true to statements in a test. This method was utilized by Humm and Wadsworth to get a measure of suggestibility or cooperativeness. They counted the times a test taker answered no to questions in the Temperament Schedule. Answer sheets having a large or small "no count" were not considered sufficiently valid for further analysis. According to their suggested cut-off points, 30-60 percent of the answer sheets were rejected as invalid.65

60 Ibid.
61 Paul Horst, "The Prediction of Personal Adjustment," Social Science Research Council Bulletin, 1941, Number 48.
62 A. S. Levine, "A Technique for Developing Suppression Tests," Educational and Psychological Measurement, 1952, 12:313-315.
63 Quinn McNemar, "The Mode of Operation of Suppressant Variables," American Journal of Psychology, 1945, 58:554-555.
64 R. J. Wherry, "Test Selection and Suppressor Variables," Psychometrika, 1946, 11:239-247.
65 D. G. Humm and G. W. Wadsworth, "The Humm-Wadsworth Temperament Schedule," American Journal of Psychiatry, 1935, 92:163-200.

Fricke considered that a much more sensitive index of response set would be a count of the true responses for those statements which
49-51 percent of the test-takers marked true. For such statements there was no general agreement on an answer; they held maximum controversiality. A person without a response set would respond true as often as false on these items. A person with a high score could be thought to have a strong set for marking true. A low score would indicate the opposite. Fricke emphasized that items for a set scale would be selected independently of any criterion external to the test. Because it was difficult to obtain enough items to form a scale at the 49-51 percent level of controversiality, it was decided to accept for the true response set scale (Set T) those statements in the Opinion, Attitude and Interest Survey which 40-60 percent of each criterion group marked true.66 Hence, the Set T scale of the OAIS consisted of 69 statements scored in the true direction. Upon completion of his investigation, Fricke concluded that a measure of response set could be used to increase greatly the accuracy of criterion prediction.67

The B Scale

The B scale of the MMPI is structurally and functionally similar to the K scale but differs markedly from K in the method used for its construction. B, a measure of response bias, was modeled after the Set T scale of the OAIS.68 The B scale and the Set T scale made it possible to measure a test-taker's tendency to answer true to statements in a personality inventory.

66 B. G. Fricke, The Opinion, Attitude and Interest Survey (Minneapolis: Investors Diversified Services, 1955).
67 B. G. Fricke, "Response Set as a Suppressor Variable in the OAIS and MMPI," Journal of Consulting Psychology, 1956, 20:161-169.
68 B. G. Fricke, The Opinion, Attitude and Interest Survey (Minneapolis: Investors Diversified Services, 1955).

Individuals with a strong bias to answer true
obtain low hysteria scores on the MMPI due to the fact that 78 percent of the Hy items are scored false.⁶⁹ Fricke felt that if a fraction of a Set T type of scale was added to Hy or if a fraction of K was subtracted from Hy, the validity of Hy would be improved. The assumption was that by partialing out, or suppressing, the influence of response bias, the purity of the clinical scale would be increased.⁷⁰

To select items which would reflect response bias on the MMPI Fricke examined the item responses of normal persons. His objective was to obtain a pool of items of high controversiality, that is, items to which about equal numbers answer true and false. Items drawing true answers from 40-60 percent were considered sufficiently sensitive to be useful. Since some normals did not answer true or false to every item, half the "cannot say" items were added to the true answers to establish whether or not each item met the arbitrary 40-60 percent level of controversiality. Fricke considered that a test-taker with a strong bias to answer true would be expected to achieve a high score when all the controversial items were scored.

Two normal samples were involved. The first sample of 604 cases was used by Hathaway and McKinley in the construction of the clinical scales; it consisted of a sub-group of 339 Minnesota normals and a sub-group of 265 college normals. The percentage of these two sub-groups who answered true to each item was averaged. The second sample of 589 cases was used as the norm group for the more recently constructed non-clinical scales; it consisted of a sub-group of 253 normal males and a sub-group of 336 normal females. The percentages for these two sub-groups also were averaged. A total of 81 items were located which were answered true by 40-60 percent of both normal samples. Of the 81 items of high controversiality 18 were found to be in the 30-item K scale. The response bias scale B consists of the 63 non-K items. Fricke states that high T-scores on B indicate a tendency to answer false.

⁶⁹ B. G. Fricke, "Conversion Hysterics and the MMPI," Journal of Clinical Psychology, 1956, 12:322-326.
⁷⁰ B. G. Fricke, "A Response Bias Scale for the MMPI," Journal of Counseling Psychology, 1957, 4:149-153.

Because of the complete lack of evidence in the literature concerning Fricke's interpretation of the B scale, no definite conclusion can be reached regarding the efficacy of the B scale.

Miscellaneous Scales

The G scale. About three years before research on the test-taking attitude was begun, Hathaway and Estes, using a variant of the method of internal consistency, developed a scale called G. This scale was the only MMPI scale which was derived without the use of a criterion external to the test; the selection and scoring of items was based wholly upon the intercorrelations among the items themselves. Essentially, the procedure consisted in locating among a group of 101 unselected normals those individuals who, when their answer sheets were used as scoring keys, produced the maximum variance of the other 100 scores. The assumption was that these persons were the most extreme deviates on whatever factor or factors contributed most heavily to the variance and covariance of the total pool of MMPI items. From the evidence adduced by Mosier, it is of course clear that the purity of factorial unity of this hypothetical underlying continuum is by no means guaranteed by such a procedure.⁷¹

Meehl and Hathaway state that this maximizes the variance of a set of items by scoring them in such a direction as to maximize their mean covariance (since the item variances are unaffected by the direction of scoring).⁷² Instead of actually calculating the variances for the 2⁵⁵⁰ ways of scoring the test, the investigators selected individuals who approximated the optimal scoring key. It was found that the scoring keys for some ten individuals selected by this method tended to form two distinct clusters, each of which consisted of keys (individuals) showing high correlations with one another and high negative correlations with the members of the other cluster. An item analysis was then carried out on these two small groups, and the items resulting were combined into a scale called G (general factor).

⁷¹ C. I. Mosier, "A Note on Item Analysis and the Criterion of Internal Consistency," Psychometrika, 1936, 1:275-282.
⁷² P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:549-550.

The G scale, although derived without recourse to any clinical group whatever, nevertheless showed a correlation of .91 with clinical scale Pt. The mean MMPI curves for unselected normals with high G (the neurotic end) showed elevations on seven MMPI clinical scales and on F, whereas L (raw score) and Hy tended to fall below the mean. The mean profile for normals with low G was almost an exact mirror image of this curve. However, G was not found to be effective in the detection of any clinical group or to be particularly useful for any purpose; and since at that time no theoretical basis was available for interpreting it, the scale was abandoned. Another scale, called plus, was derived in a similar but not identical manner.⁷³

The N scale. When Meehl and Hathaway first discovered that certain clearly abnormal persons obtained normal MMPI profiles, and on the other hand, that certain normals obtained elevated profiles with no evidence of deliberate falsification, they began an investigation which initially led to the development of the N scale. The key discriminated individuals who were overly critical in reporting themselves (plus-getters) from actually abnormal subjects with similar MMPI profiles. This scale, however, did not prove to be useful in detecting impunitive or defensive sortings.
For this reason, further study was carried out which resulted ultimately in the development of K.⁷⁴

⁷³ Ibid., p. 550.
⁷⁴ S. R. Hathaway and J. C. McKinley, "A Multiphasic Personality Schedule: I. Construction of the Schedule," Journal of Psychology, 1940, 10:249-254.

The Ch scale. In the derivation of the original hypochondriasis key, there was developed a correction scale called Ch, the function of which was to separate actual clinical hypochondriacs from a group of non-hypochondriacal abnormals who attained spuriously elevated scores on H. The item content of this Ch key was quite puzzling to Hathaway and McKinley because, although the correction was successful, the items did not appear to refer to anything either hypochondriacal or anti-hypochondriacal. They possessed no apparent psychological homogeneity. The majority of the items on Ch were scored if answered in the statistically rare and obviously maladjusted direction. They apparently measured some non-somatic component of test responses which resulted in spuriously elevated H scores in persons who were not actually hypochondriacal.⁷⁵

The Hy-O scale. The Hy-O items reflected the N component, although they were scored in the opposite direction from N, as well as from Ch and G. Hy-O consists of a sub-set of items which were chosen because they differentiated empirically the hysteria clinical group from normals. It was concluded that these items reflected the self-deceptive and impunitive attitude of the hysterical temperament.

The Cd and the Hy-S keys. Other minor attempts at validity key construction for the MMPI consist of the Cd and Hy-S scales. The former was developed to differentiate symptomatic depressives from normals; the latter items constituted an attempt to distinguish hysteria, hypomania and psychopathic deviates from normals.⁷⁶, ⁷⁷

⁷⁵ P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:550-551.
⁷⁶ S. R. Hathaway and J. C. McKinley, "A Multiphasic Personality Schedule: III. The Measurement of Symptomatic Depression," Journal of Psychology, 1942, 14:73-84.
⁷⁷ J. C. McKinley and S. R. Hathaway, "A Multiphasic Personality Schedule: V. Hysteria, Hypomania and Psychopathic Deviate," Journal of Applied Psychology, 1944, 28:153-174.

Consistency keys. Consistency or verification-of-response checks are no longer infrequent keys in various objective personality and interest inventories. The instruments devised by Kuder, Strong and Edwards typify contemporary attempts to uncover falsification by using consistency of response to determine whether the subject understood test directions, possessed sufficient reading skills to comprehend the test items, or answered carelessly or insincerely.⁷⁸⁻⁸⁰

Similarity Between Various Scales

B and K scales. Although the methods used in construction of the B and K scales were vastly different, structurally and functionally B and K are quite similar. Both consist of items of high controversiality and both have their items scored in one direction.⁸¹

The functional similarity of B and K is revealed by the correlation coefficients shown in Table IV. The correlation between B and K for 336 normal females is minus .68 and the correlation for 253 normal males is minus .67. The correlation for a sample of 63 conversion hysterics was found to be minus .73.⁸²

Fricke concluded from the data in Table IV that B and K were the two most similar validity indicators. It is of interest to note that the correlations between K and L are higher than the correlations between B and L and that the correlations between B and F are higher than the correlations between K and F. From this Fricke interpreted that the L component was stronger in K than in B, and the F component was stronger in B than in K.

⁷⁸ G. F. Kuder, Kuder Preference Record: Vocational (Chicago: Science Research Associates, 1956), p. 3.
⁷⁹ G. F. Kuder, Kuder Preference Record: Occupational (Chicago: Science Research Associates, 1959), p. 4.
⁸⁰ A. L. Edwards, Edwards Personal Preference Schedule (New York: The Psychological Corporation, 1959), p. 15.
⁸¹ B. G. Fricke, "A Response Bias Scale for the MMPI," Journal of Counseling Psychology, 1957, 4:150.
⁸² Ibid., p. 151.

Table IV. Intercorrelations of Four MMPI Validity Scale Indicators and Hy for Normals
(Females above the diagonal, males below the diagonal)

          L       F       K       B       Hy
L         --     -.28     .38    -.30     .13
F        -.01     --     -.36     .40     .19
K         .36    -.35     --     -.68     .29
B        -.28     .43    -.67     --     -.23
Hy        .19     .04     .39    -.37     --

The correlations between Hy and B, and between Hy and K, suggest that B and K could be used as suppressor variables. Fricke suggests that the Hy scale could be improved by adding to it a certain fraction of B or by subtracting from it a certain fraction of K. However, what values should be used was not discussed.⁸³

B and K as discriminators. A function more influential than the suppressor action is evidently operating in B and K. Fricke states that B and K are both unsuccessful suppressors because they both discriminate conversion hysterics from normals; a low B and a high K are indicative of conversion hysteria. Consequently, this investigator concludes that the subtraction of a fraction of B from Hy and the addition of a fraction of K to Hy would improve the validity of Hy if B and K tap something diagnostic of conversion hysteria that is not tapped by Hy.⁸⁴

⁸³ B. G. Fricke, "Subtle and Obvious Test Items and Response Set," Journal of Consulting Psychology, 1957, 21:250-252.

It is important to note that B and K each appear to function simultaneously as scale suppressor and criterion discriminator. This is unfortunate since the suppressor and discriminator effects are in opposite directions and tend to cancel each other. The influence of B and K as suppressors is weaker than their influence as discriminators and this results finally in the subtraction of B and the addition of K.⁸⁵

Because the level of K scores, and probably B, is affected by the social-educational-economic level of the test-takers, it was Fricke's speculation that for the lower levels the discriminator role of K (and B) would be much more important than the suppressor role, but that for the higher levels (college students) the discriminator role of K and B would be much less important than the suppressor role. A possible explanation for McKinley, Hathaway, and Meehl's finding that the addition of a fraction of K to Hy did not improve Hy is to be found here. If their conversion hysterics and normal control cases had the same mean K scores, then the addition of a fraction of K to Hy would not improve its validity; subtraction of a fraction of K probably would have capitalized on K's suppressor role and improved the validity of Hy.⁸⁶

Discrepancies in B and K scores may be of some importance (that is, it is possible that a test-taker with a T score of 70 on B and a T score of 60 on K is less defensive than a test-taker with a T score of 50 on B and a T score of 60 on K; it might be argued that the first test-taker's K score was obtained largely due to his response set to answer false but that the second test-taker's K score was obtained through the operation of something other than response set). The assumption is that K is more than a measure of response set. If the assumption is not valid, there probably is no need for the two test scores.⁸⁷

⁸⁴ B. G. Fricke, "A Response Bias Scale for the MMPI," Journal of Counseling Psychology, 1957, 4:152.
⁸⁵ Ibid.
⁸⁶ Ibid.

While both scales appear to reflect to a certain extent a test-taker's nontest behavior, their primary role is to reveal something about a test-taker's test-taking behavior so that more accurate inferences can be drawn from the diagnostic clinical scales.

Functional similarity of K and Set T.
The functional similarity of K and Set T scales is seen in the fact that two-thirds of all K items fall in the controversial range. Further evidence on the similarity of Set T and K is found in their correlation with each other: minus .58 to minus .71. To Fricke, it appeared that Set T and K accomplished essentially the same thing. Whether Set T was superior to K was not determinable from the available data. While K may not be as pure a measure of response set, it may do something in addition to what is done by Set T.⁸⁸ Fricke concludes:

    A measure of response set such as K or Set T may function as a suppressor when there is no true-false imbalance, and may not function as a suppressor when there is a moderate true-false imbalance. Too frequently scores from the K scale have been correlated with scores from other tests, and judgments have been made as to what K was "really" measuring. I don't know what K measures, no one else does either. It seems unlikely that response set keys would be able to bring validity to tests which are not validated empirically even though response set may be a major component in a test taker's score.⁸⁹

The marked structural and functional similarity of Set T of the OAIS and K of the MMPI was drawn upon to challenge the traditional interpretation of the K scale. Evidence was assembled which indicates that some of the MMPI scales are not optimally K corrected.

⁸⁷ Ibid.
⁸⁸ Ibid.
⁸⁹ Ibid.

S and O relationships. Relationships exist between the S and O keys and the L scale, intelligence, psychological sophistication and neuropsychiatric diagnosis. The group with high scores on the L scale of the MMPI was higher on the S keys of all five scales than on the O keys, and was also higher on the S keys than was the low L scale group. For the group with low L scores, the O scores were for all scales approximately equal to or higher than the S scores.
Individuals of high ability (intelligence T score above 60 on the Otis) have approximately equal O and S scores, whereas individuals of low ability (T score below 40) have generally higher O scores than S, and higher O scores than the high ability group.⁹⁰

MMPI profiles of a psychologically sophisticated group showed distinction between S and O keys, with S much higher than O whether the group was giving honest results or was attempting to fake good. With this sophisticated group it appeared to make little difference whether the test was taken honestly or faked good; in either case, O items were successfully avoided whereas S items yielded average and above average T scores.⁹¹

In general, O keys are highly correlated with each other and have no correlation with S keys; S keys have a low positive correlation with each other. There is a high negative correlation between O minus S and K, indicating the considerable weighting of K with S items. There is evidence that high L scale scores are associated with higher S than O scores, whereas the converse is true for low L scores; psychologically sophisticated individuals almost completely avoid significant O responses and have much higher S scores. While total scale scores on the MMPI failed to differentiate significantly between successful and unsuccessful students and on-the-job trainees, O keys were significantly higher than the S for the unsuccessful group. The S keys were insignificantly lower, and the total scale scores were between the two.⁹²

⁹⁰ D. N. Wiener, "Subtle and Obvious Keys for the MMPI," Journal of Consulting Psychology, 1948, 12:169.
⁹¹ Ibid., p. 170.

Summary

All major validity scales have been examined in this chapter. The order of discussion of the various keys was based upon the chronology of their development with the earliest devised scales discussed first.
In addition to the major scales, however, several minor scales had to be considered in order to bring about a more lucid understanding of the major keys. All of the scales were developed around the MMPI with the exception of Set T. Set T was constructed with OAIS data.

Essentially, 1) the L scale used items which were socially desirable but rarely true of an individual; 2) the F scale was developed with rarity items (items chosen by ten percent or fewer of the sample); 3) the K scale items were selected because they differentiated normal individuals who scored abnormally from abnormal subjects who scored normally; 4) the S-O keys distinguished abnormal subjects from normal by the use of a subtlety-obviousness continuum; and 5) Set T and B scale items were developed by uncovering highly controversial items and measured response set.

Clear probability statements concerning the validity of the various scales are lacking.

The motivational research study with its instrumentation and the procedures utilized in the present investigation are discussed in Chapter III.

⁹² Ibid.

CHAPTER III

PROCEDURES

The present investigation used the administered test protocols of participants in Farquhar's motivational research project.¹ In this chapter the background, theory, design and instrumentation of Farquhar's project are discussed and the procedures for the present study are outlined.

Background of Farquhar's Study

In an attempt to collate a thorough objectively validated description of the personal characteristics of high and low academically motivated students, Farquhar scrutinized the existing literature on motivation. As a result, a theory of need-achievement and non-need-achievement motivation was formulated. A summary of the basic motivational theory is found in Tables V and VI.

Table V. Summary of Theory of Need-Achievement and Non-Need-Achievement Motivation Basic to Current Research

                         Motivational Situation
Need-Achievement                      Non-Need-Achievement
1. Long term involvement              1. Short term involvement
2. Competition with a maximal         2. Competition with a minimal
   standard of excellence                standard
3. Unique accomplishment              3. Common accomplishment

¹ William W. Farquhar, "A Comprehensive Study of the Motivational Factors Underlying Achievement of Eleventh Grade High School Students," (East Lansing: Awarded Research Application to the Commissioner of Education, United States Office of Education, November 1, 1959), 15 pp. (Mimeographed.)

Table VI. Hypothesized Personality Factors Associated with Academic Achievement

Academic Anxiety
Independence-Dependence Conflict
Self Value
Activity Patterns
Authority Relations
Goal Orientation
Interpersonal Relationships

Three attitudinal areas were ascertained which were considered capable of differentiating between over- and under-achievers. These areas consisted of attitudes toward school and learning, toward self and toward parents. Later, another area was established: attitudes toward occupations.

In order to test the posited motivational theory 725 items were logically developed with the purpose of contributing significantly to the regression of an academic predictor on grade achievement. The essential assumption was that students who exemplified extremes in academic performance should also exhibit extreme response to motivational characteristics.

General Design of the Motivational Study

The above hypothesis was tested with a sample of approximately 4200 eleventh grade Michigan public high school students who were attending nine separate institutions. Under- and over-achieving students were identified by the following procedure:

1) Schools which had 9th grade Differential Aptitude Test scores available on their current 10th graders were contacted and asked to cooperate in the study.

2) A second aptitude measure was obtained so that reliable estimates of academic aptitude could be made. California Tests of Mental Maturity were administered to schools lacking this data.
3) The DAT-Verbal Reasoning subtest and the CTMM Language scores were used in obtaining a stable estimate of academic aptitude after empirically examining possible DAT-CTMM sub-score combinations.

4) Regression lines were calculated for each school and sex assuming a perfect correlation between DAT-VR and CTMM-L sub-tests. Separate equations were calculated because a pilot study indicated that one equation could not be generalized from school to school. Only those individuals who fell within one standard error of estimate above and below the regression line were included in the study. The methodology of selection of individuals with stable measured aptitude is summarized in Figure I. Because it was important that the criterion groups be classified with little chance of making a Type I error (accepting instead of rejecting) it was decided to run the risk of Type II error (rejecting instead of accepting) even if some of the sample were lost in the process.

Figure I. Methodological Selection of Individuals with Stable Measured Aptitude. (Axes: CTMM-L and DAT-VR; marked points are the individuals selected for the study.)

5) Regression equations predicting grade point average from DAT-VR scores were calculated for each sex in each participating school. The DAT-VR was used because it was found to correlate consistently higher with grade-point average than the CTMM-L scores. Under-achievers were defined as those individuals whose grade point average fell at least one standard error of estimate below the regression line prediction of achievement. Similarly, over-achievers were designated as falling one standard error above the regression line. The methodology used in selecting under- and over-achievers is summarized in Figure II.

Figure II. Method of Selecting Under- and Over-Achievers. (Axes: grade point average and DAT-VR; separate symbols mark over-achievers and under-achievers.)

By using the above method under- and over-achievers were selected from the full range of academic aptitude.
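The regression rule in step 5 can be illustrated with a short sketch. This is only a minimal illustration under stated assumptions, not the project's actual computation: the function name `classify_achievers`, the argument names, and the toy scores are hypothetical, and the standard error of estimate is computed here with n - 2 degrees of freedom, which is one common convention that the source does not specify. In the study a separate equation was fitted for each sex within each school.

```python
import math


def classify_achievers(aptitude, gpa):
    """Label each student 'over', 'under', or 'normal' (hypothetical helper).

    A least-squares line predicting grade point average from an aptitude
    score is fitted; students whose actual GPA lies at least one standard
    error of estimate above (below) the prediction are labeled over-
    (under-) achievers. The n - 2 divisor is an assumption.
    """
    n = len(aptitude)
    mean_x = sum(aptitude) / n
    mean_y = sum(gpa) / n
    sxx = sum((x - mean_x) ** 2 for x in aptitude)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(aptitude, gpa))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual = actual GPA minus GPA predicted from aptitude.
    residuals = [y - (intercept + slope * x) for x, y in zip(aptitude, gpa)]
    see = math.sqrt(sum(r * r for r in residuals) / (n - 2))

    labels = []
    for r in residuals:
        if r >= see:
            labels.append("over")
        elif r <= -see:
            labels.append("under")
        else:
            labels.append("normal")
    return labels, see


# Toy data (not from the study): one student far above and one far below the line.
toy_aptitude = [1, 2, 3, 4, 5, 6, 7, 8]
toy_gpa = [1, 2, 6, 4, 5, 3, 7, 8]
toy_labels, toy_see = classify_achievers(toy_aptitude, toy_gpa)
print(toy_labels, round(toy_see, 3))
```

Because the cut is defined relative to the regression line rather than to raw grade point average, extreme cases can be drawn from any aptitude level, which is the property the text emphasizes.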
Approximately twelve percent of the sample was classified in one of the extreme groups.

Instrumentation

The Generalized Situational Choice Inventory was developed by reviewing studies of over- and under-achieving students to determine characteristic differences. These characteristics were incorporated in pairs of statements which typified need or non-need-achievement. Classification into need or non-need-achievement was based on face validity of the content of choice. Choices had to meet the following criteria: 1) be as free as possible of culture stereotypes; 2) allow the individual to project his choice into the future; and 3) be as purely as possible need-achievement or non-need-achievement. It was not necessary that the choice be concerned with only one of the three sub-factors provided it could be clearly designated need or non-need-achievement. The inventory consists of 200 item pairs.

The Preferred Job Characteristics Scale was developed essentially in the same manner as the inventory above. Previous studies indicate that occupational goals are positively correlated with need-achievement.² The scale consists of 64 items.

The Human Trait Inventory was developed by reviewing items from personality measures which previous research found to differentiate between under- and over-achieving students. The inventory consists of 120 items.

The Word Rating List was developed by using adjectives which described under- and over-achievers. The inventory consists of 110 items.

Procedures for the Present Investigation

Selection of the F Scale

The F scale was selected for development because it appears to bring into consideration more invalidating variables which influence test interpretation than any other single validation scale. It is postulated that an F scale can differentiate protocols of test-takers 1) who are uncooperative; 2) who cannot comprehend the test items; 3) who make clerical errors; and 4) who intentionally place themselves in a bad light.
In addition, Meehl and Hathaway present evidence that the F scale can differentiate between faked and non-faked protocols.³ (The rationale for selection of F is presented below in the discussion of high fake expectancy.)

² Ibid.
³ P. E. Meehl and S. R. Hathaway, "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:536-537.

F Scale Development

F scale items comprise those statements chosen with relatively low frequency by each sex for each of the four above inventories. The criterion for rarity of response for this study was placed at ten percent or less of the sample answering in one direction.

F items were determined first on a stratified random sample consisting of 132 males and 132 females. Stratification was based upon the identified over-achievers, under-achievers, and normals in the nine Michigan public high schools in Farquhar's motivational study. Under- and over-achievers were chosen in direct proportion to their representation in the pilot distributions. Both the over-achieving and under-achieving samples for both sexes consisted of sixteen subjects each. The subjects making up each of these groups represent approximately twelve percent of the total sample. This percentage was found by Farquhar to represent the frequency of membership in any of the extreme motivational groups in the 4200 sampling of the original study. The normal sample consisted of 100 males and 100 females.

An identical procedure was followed with a cross validation sample of equal numbers. Only those F items which were found to be in common between validation and cross validation groups were included in the final F scale. The completed F scale is based upon a sample of 264 males and 264 females.

Sex Differences in F Item Selection

Sex differences in F item selection (that is, avoidance in selection of the item) were graphically analyzed. F items which were selected by males only, by females only and by both sexes jointly were determined.
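The item-selection rule described under F Scale Development can be sketched as follows. This is a hedged illustration rather than the procedure as actually carried out: it assumes every item has exactly two response options coded 0 and 1, and the names `rare_responses`, `build_f_scale` and `f_score`, as well as the toy protocols, are hypothetical.

```python
def rare_responses(protocols, threshold=0.10):
    """Return (item index, response) pairs chosen by at most `threshold` of a sample.

    Assumes each protocol is a list of 0/1 responses of equal length.
    """
    n = len(protocols)
    limit = threshold * n
    rare = set()
    for item in range(len(protocols[0])):
        ones = sum(p[item] for p in protocols)
        if ones <= limit:
            rare.add((item, 1))      # answering 1 is the rarity response
        elif n - ones <= limit:
            rare.add((item, 0))      # answering 0 is the rarity response
    return rare


def build_f_scale(validation, cross_validation, threshold=0.10):
    """Keep only rarity responses common to both samples."""
    return rare_responses(validation, threshold) & rare_responses(cross_validation, threshold)


def f_score(protocol, f_scale):
    """Count how many rarity responses a single test-taker gave."""
    return sum(1 for item, response in f_scale if protocol[item] == response)


# Toy samples of ten protocols on three items (not data from the study).
validation = [[1, 1, 0]] + [[0, 1, 1]] * 4 + [[0, 0, 1]] * 5
cross_validation = [[1, 1, 1]] + [[0, 1, 1]] * 4 + [[0, 0, 0]] * 5
scale = build_f_scale(validation, cross_validation)
print(scale, f_score([1, 0, 0], scale))
```

In the toy data the response 0 on the third item is rare in the validation sample only, so it drops out of the scale, which mirrors the way cross validation reduced the item counts in the study.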
Reliability of the F Scale

Hoyt's analysis of variance technique was used in determining F scale reliability.⁴ Stratified random samples of 66 males and 66 females, treated separately, were used in determining F item reliability.

⁴ C. J. Hoyt, "Test Reliability Estimated by Analysis of Variance," Psychometrika, 1941, 6:153-160.

Validation of the F Scale

It was assumed that under-achievers could be differentiated from over-achievers by F item selection. Because the F scale is a measure of social conformity (due to the 90% criterion for selection of the item), the over-achiever was expected to select few F items. Conversely, the under-achiever was expected to select significantly more F items because of the non-conformity characteristics of his behavior.

The following procedure was used to validate the Generalized Situational Choice Inventory F scale:

1) GSCI items which differentiated between over- and under-achievers were determined for the 171 male over-achievers and for the 137 male under-achievers.

2) An item frequency distribution was constructed for each group. The point of overlap where under-achievers scored as over-achievers on GSCI items was determined by plotting the respective normal distribution curves. The point of overlap which was to be identified is illustrated in Figure III.

Figure III. Theorized Model of Selection of Misclassified Under-Achievers Differentiated by the GSCI. (Curves: under-achievers and over-achievers.)

3) Under-achievers who possessed GSCI scores as large as or greater than the overlap point were identified. These under-achievers scored as over-achievers on GSCI discriminating items. In addition, the over-achievers who selected less than the overlap point scored as under-achievers on GSCI discriminating items. The above groups were then labeled misclassified over-achievers and misclassified under-achievers, respectively.
4) Two random samples of equal numbers were selected from the remaining identified over-achievers and under-achievers. These groups were labeled properly classified over- and under-achievers.

5) An F item distribution for all four inventories was constructed for the misclassified and properly classified samples. The point of overlap where misclassified under-achievers scored as properly classified under-achievers was determined by plotting the respective normal distribution curves for the F items. An identical procedure was followed using misclassified over-achievers and properly classified over-achievers.

6) Replication of the above procedure was carried out by randomly assigning the misclassified and properly classified individuals into original and replicative samples. Two properly classified and two misclassified F item distributions were obtained. The common point of overlap between the original and replicative groups determined the critical F score. The selection procedure for determining the F item overlap score is illustrated in Figure IV.

Figure IV. Selection Procedure for Determining F Item Overlap Score. (Panels: Original Sample and Replicative Sample, each showing the distributions for misclassified under-achievers and properly classified under-achievers.)

7) To obtain evidence of F scale validity three approaches were examined for the effect of F on: 1) expectancy of response fake; 2) test reliability; and 3) test validity.

8) t-tests were then used to determine significant differences between F item means of the over-achieving misclassified and properly classified groups as well as of the under-achieving misclassified and properly classified samples.

9) It was hypothesized that the GSCI reliability would increase after the application of the F scale because homogeneity of test performance would be increased. The effectiveness of the F scale would be determined by its ability to remove unstable individuals.

10) The critical F score was then applied to a stratified random sample of 66 over-achievers, under-achievers and normals. Individuals possessing total F scores as large as or greater than the critical F score were then excluded from the sample.

11) Reliabilities of the GSCI discriminating items were then estimated before and after application of the F scale through Hoyt's analysis of variance technique.

12) It was hypothesized that the validity correlation of GSCI raw scores and grade point average would increase after the application of the F scale. Again, the effectiveness of the F scale would be determined by its ability to remove unstable individuals.

13) An identical procedure was followed using the 189 female over-achievers and the 173 female under-achievers.

14) An F item distribution was constructed using the cross validation sample of 132 males and 132 females.

Rationale of High Fake Expectancy

According to the findings disclosed in Farquhar's study, over-achievers tend to be highly stable, socially conforming individuals. Conversely, under-achievers tend to be more erratic, less conforming persons. In addition, under-achievers tend to avoid "risk" situations which might possess the potential of placing them in a bad light. It is likely then that the under-achiever does not want to appear in a bad light test-wise and is more erratic in test behavior. Therefore, the under-achiever would tend to select more non-conforming or F scale items than the over-achiever.

Summary

In this chapter the background, theory, design and instrumentation of Farquhar's motivational project were discussed. In addition, the design of the present investigation was outlined. The results of the study are discussed in Chapter IV.

CHAPTER IV

RESULTS OF THE INVESTIGATION

The outcomes of the present investigation are discussed in this chapter. The selection of items comprising the F scale, sex differences in F item selection, reliability and validation of the scale are presented.
Selection of the F Scale

Rarity responses (based upon a 10% or less criterion for selection of the item) were determined separately for a stratified random sampling of 132 males and 132 females. Thirty-five rarity items were selected by the males from the total test battery. Eighty-six rarity items were chosen by the females from the total battery. The cross validation stratified random sample of 132 males and 132 females selected from the total test battery forty-one and ninety-two rarity items, respectively.

Only those rarity items which were commonly selected by both the validation and the cross-validation samples comprise the F scale. In the final form the male F scale consists of twenty-five F items. The female F scale is composed of seventy-three F items. The number of rarity items appearing in each of the four tests for both sexes for validation and cross validation samples appears in Table VII. Approximately 90% of the items which were not supported under the cross validation process would have been in common agreement if the item selection criterion had been placed at 15%.

Table VII. Number of Rarity Items in Each Test

                                 Validation Sample    Cross Validation Sample    Final F Scale
Test                             Males    Females     Males    Females           Males    Females
GSCI                              10        27         11        34                6        24
Human Trait                       10        11         12        17                8        11
Word Rating List                  10        27         14        29                8        26
Preferred Job Characteristics      5        21          4        12                3        12
Total                             35        86         41        92               25        73

Sex Differences in F Item Selection

Sex differences in F item selection are illustrated in Table VIII.

Table VIII. Sex Differences in F Item Selection

Test                             Males    Females    Overlap
GSCI                               6        24          4
Word Rating List                   8        26          8
Human Trait                        8        11          5
Preferred Job Characteristics      3        12          3

Males selected five rarity items which were not chosen by females. Females selected fifty-three rarity items which were not chosen by males. Both males and females selected in common twenty F items.
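The item-selection rule described above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure's original implementation: the function names and the toy endorsement proportions are hypothetical, and each item is assumed to be summarized by the proportion of a sample choosing its rare response.

```python
def rarity_items(endorsement_rates, criterion=0.10):
    """Return the items whose response was chosen by `criterion` or less of a sample."""
    return {item for item, rate in endorsement_rates.items() if rate <= criterion}


def final_f_scale(validation_rates, cross_validation_rates, criterion=0.10):
    """Keep only rarity items common to the validation and cross-validation samples."""
    return (rarity_items(validation_rates, criterion)
            & rarity_items(cross_validation_rates, criterion))


# Hypothetical endorsement proportions for four items in two samples.
validation = {"item_1": 0.04, "item_2": 0.09, "item_3": 0.30, "item_4": 0.10}
cross_validation = {"item_1": 0.06, "item_2": 0.14, "item_3": 0.28, "item_4": 0.08}

f_items = final_f_scale(validation, cross_validation)

# Counts reported in the text: 25 male items, 73 female items, 20 in common.
male_only = 25 - 20      # rarity items chosen by males but not by females
female_only = 73 - 20    # rarity items chosen by females but not by males
```

In this toy data, raising the criterion to 0.15 would also admit item_2, which parallels the remark that a 15% criterion would have retained most of the items lost in cross validation.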
Distribution of F Items

An F item frequency distribution was constructed for the cross validation sample of 132 males and 132 females. Of a total of 25 possible F choices 37% of the males selected no F item. Eighty-seven percent of the sample chose four or fewer rarity responses and ninety-four percent selected six or fewer items. The highest number of F items selected by any one individual was sixteen. The male F item frequency distribution is illustrated in Figure V.

Figure V. Cross Validation F Item Frequency Distribution for Males (N = 132). (Horizontal axis: Number of F Items, 0 to 16. Mean number of F items: total sample, 2.14; normals, 2.10; under-achievers, 3.18; over-achievers, 1.37.)

Of a total of 73 possible F choices for the females eleven percent of the sample selected no F item. Eighty-four percent of the sample chose eight or fewer rarity responses and ninety-one percent selected ten or fewer items. The highest number of F items selected by any one individual was 28. The female F item frequency distribution is illustrated in Figure VI.

Figure VI. Cross Validation F Item Frequency Distribution for Females (N = 132). (Horizontal axis: Number of F Items, 0 to 20 or more. Mean number of F items: total sample, 5.12; normals only, 5.32; under-achievers only, 5.25; over-achievers only, 4.81.)

Reliability of the F Scale

Hoyt's analysis of variance technique was used in estimating F scale reliability. Reliability was based on a stratified random sample of 66 males and 66 females. For the males a reliability estimate of .729 was obtained. For the females a reliability estimate of .746 was determined.
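For readers unfamiliar with Hoyt's technique, the following Python sketch shows the usual form of the estimate: a persons-by-items analysis of variance in which reliability equals one minus the ratio of the residual mean square to the between-persons mean square. The function name and the toy response matrix are hypothetical, and the code is a sketch of the general method rather than a reconstruction of the computations behind the coefficients reported here.

```python
def hoyt_reliability(scores):
    """Hoyt's ANOVA reliability: 1 - MS_residual / MS_persons.

    `scores` is a list of rows, one row per person and one column per item.
    """
    n = len(scores)       # number of persons
    k = len(scores[0])    # number of items
    grand = sum(sum(row) for row in scores) / (n * k)
    person_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares into persons, items and residual.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_residual = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return 1.0 - ms_residual / ms_persons


# Hypothetical 0/1 responses (1 = rarity response chosen): five persons, four items.
toy = [
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
]
reliability = hoyt_reliability(toy)
```

The estimate obtained this way is algebraically the same as coefficient alpha computed from item and total-score variances, which gives a convenient cross-check on any implementation.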
Validation of the F Scale

To obtain evidence of F scale validity three approaches were examined for the effect of F on: 1) expectancy of response fake; 2) test reliability; and 3) test validity.

Effect of F on Expectancy of Response Fake

It was assumed that under-achievers could be differentiated from over-achievers by F item selection. Because the F scale is a measure of social conformity (due to the 90% criterion for selection of the item), the over-achiever was expected to select few F items. Conversely, the under-achiever was expected to select significantly more F items because of the non-conformity characteristics of his behavior.

GSCI Overlap

The GSCI was used to obtain evidence of the validity of the F scale. This instrument was used because it embodied to a greater extent than the other battery instruments the motivational theory of Farquhar's project.

Forty-five GSCI items differentiated at the 10% or better level of confidence after cross-validation between the 171 male over-achievers and the 137 male under-achievers. For the female sample (consisting of 189 over-achievers and 173 under-achievers) thirty items were established.

An item frequency distribution was constructed for each group for each sex. The point of overlap where under-achievers scored as over-achievers on GSCI discriminating items was determined by plotting the distribution curves. The overlap point was identified as thirty-one GSCI items for the males and nineteen items for the females. The overlap points are illustrated for each sex in Figure VII.

(Figure VII plots, for each sex, the distributions of under-achievers and over-achievers over the number of GSCI items. Horizontal axis: GSCI Items, 0 to 45 for males with the overlap point at 31, and 0 to 30 for females with the overlap point at 19. Legend: x = properly classified under-achievers; y = properly classified over-achievers; z = misclassified under-achievers; v = misclassified over-achievers. Subscript 1 marks the male panel and subscript 2 the female panel.)

Figure VII.
GSCI Overlap Points for Over- and Under-Achievers for Each Sex

Under-achievers who selected as many items as or more than the overlap point scored like over-achievers on GSCI discriminating items. In addition, over-achievers who selected as many items as or fewer than the overlap point scored like under-achievers. The above groups were then labeled misclassified under-achievers (z1 and z2) and misclassified over-achievers (v1 and v2), respectively. The male and female misclassified over-achievers totaled forty-three and thirty-seven individuals, respectively. Male and female misclassified under-achievers totaled forty-four and sixty-eight persons, respectively. However, complete test data were not available for all subjects. Consequently, the sample was reduced to thirty-three male and thirty-two female over-achievers and forty-one male and forty-nine female under-achievers. The sample reduction was added evidence of the misclassification of the discrepant achievement groups. In all other analyses the sample was reduced more frequently because of incomplete testing information for the under-achiever than for the over-achiever.

All of the groups were randomly divided into two equal samples for validation and cross-validation analyses.

F Item Overlap

An F item frequency distribution for all four inventories was constructed for the misclassified and properly classified male and female samples. The point of overlap where misclassified under-achievers scored on F items as properly classified under-achievers was determined by plotting the respective normal distribution curves. The point of overlap between the groups was identified as three F items for the males and six F items for the females. Both overlap points were identical after replication of the procedure. The same procedure was followed using misclassified and properly classified over-achievers.
The overlap point was identified as two F items for the males and four F items for the females, which again held after cross-validation.

A t-test was used to determine significant differences between F item means of the male and female over-achieving misclassified and properly classified groups. A similar procedure was followed using under-achieving misclassified and properly classified samples. Comparisons of the significant findings with other pertinent data are given in Tables IX and X.

The means for the eight groups were in the theoretically predicted direction: both male and female under-achievers tend to select more F items than over-achievers. Properly classified under-achievers scored significantly higher on F items than the properly classified over-achievers. The misclassified over-achievers, who actually scored as under-achievers on GSCI items, scored significantly higher on F items than the properly classified over-achievers. The misclassified under-achiever, who actually scored as an over-achiever on GSCI items, selected significantly fewer F items than the properly classified under-achiever. However, misclassified under-achievers selected significantly more rarity items than the properly classified over-achievers. Although there were no significant differences between means of the other groups, the direction of mean magnitude gives slight support to the underlying hypothesis.

Table IX. Comparisons of F Item Means, Mean Squares and Sample Number Between Male and Female Misclassified and Properly Classified Groups

                                       Males                       Females
Group                                  Mean    MS        N         Mean    MS         N
Misclassified Over-Achievers           2.61    13.2121   33        6.45    100.2812   32
Properly Classified Over-Achievers     1.03     2.0606   33        2.25      7.1875   32
Misclassified Under-Achievers          2.07     9.2439   41        4.14     35.1632   49
Properly Classified Under-Achievers    3.59    22.0243   41        7.33     96.4693   49
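The group means, mean squares and sample sizes in Table IX are enough to recompute a two-sample t statistic. The Python sketch below assumes, without confirmation from the source, that each tabled mean square is the within-group variance and that a pooled-variance t-test was used; the function name is hypothetical.

```python
import math


def pooled_t(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Pooled-variance t statistic and degrees of freedom for two independent groups."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean_a - mean_b) / std_error, df


# Males: properly classified under-achievers versus properly classified over-achievers,
# using the values printed in Table IX.
t_value, df = pooled_t(3.59, 22.0243, 41, 1.03, 2.0606, 33)
```

Under these assumptions the recomputed statistic has 72 degrees of freedom and falls close to the 3.0117 tabled for this comparison; the small gap is consistent with the means having been rounded to two decimals. Not every tabled comparison reproduces this closely from the rounded values.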
The misclassified over-achiever tended to select more rarity items than the misclassified under-achiever. In addition, there were no significant differences in F item selection between misclassified over-achievers and properly classified under-achievers. The magnitude of differences of F item selection between the groups would doubtless have been greater if the population from which the investigation sample was drawn had not been screened previously.

Table X. Comparisons of t-Values and Significance Levels of One-Tailed Tests Between Male and Female Misclassified and Properly Classified Groups

Groups Compared                                                               df    t-Value   Significance Level (Percent)
Males
  Properly Classified Under-Achievers and Properly Classified Over-Achievers  72    3.0117    .0005
  Misclassified Under-Achievers and Misclassified Over-Achievers              72     .6836    ----
  Misclassified Over-Achievers and Properly Classified Over-Achievers         64    2.2898    .01
  Misclassified Under-Achievers and Properly Classified Under-Achievers       80    1.7272    .025
  Misclassified Over-Achievers and Properly Classified Under-Achievers        72     .9800    ----
  Misclassified Under-Achievers and Properly Classified Over-Achievers        72    1.7931    .025
Females
  Properly Classified Under-Achievers and Properly Classified Over-Achievers  79    2.8166    .005
  Misclassified Under-Achievers and Misclassified Over-Achievers              79    1.2762    .12
  Misclassified Over-Achievers and Properly Classified Over-Achievers         62    2.2581    .02
  Misclassified Under-Achievers and Properly Classified Under-Achievers       96    1.9273    .03
  Misclassified Over-Achievers and Properly Classified Under-Achievers        79     .3832    ----
  Misclassified Under-Achievers and Properly Classified Over-Achievers        79    1.6725    .05

Protocols of manifestly uncooperative test-takers and individuals with erratic test performance were removed along with tests displaying obvious clerical errors before the present investigation was initiated.
Hence, through earlier screening, the individuals whom the F scale attempts to identify were removed. However, regardless of the previous screening, significant differences on F scale performance were obtained.

Effect of F on Test Reliability

It was hypothesized that the effectiveness of the F scale could be determined by its ability to remove unstable individuals who tend to lower instrument reliability by erratic test performance. Theoretically, reliability should increase with exclusion of unreliable subjects. However, the effect of homogeneity may operate also to reduce reliability. The question was asked which has the greater effect on reliability: erratic test performance or homogeneity? To test these two effects, a random sample of subjects equal in number to those identified by F as having high fake potential was excluded. The assumption was made that the correlation reduced by random selection should be greater than the correlation reduced by F.

The critical F score obtained through the above procedure was then applied to a male and female stratified random sample of 66 over-achievers, under-achievers and normals. Sixteen individuals possessing total F scores as large as or greater than the critical F score were then excluded from the sample. In addition, an equal-sized sample was randomly excluded to determine its effect on homogeneity.

Hoyt's analysis of variance technique for estimating internal consistency reliability was used to obtain an estimate of consistency of the GSCI discriminating items before and after application of the F scale. The effects on reliability after application of the F scale are summarized in Table XI.

Table XI. Effects on GSCI Internal Consistency Reliability Before and After Application of the Male and Female F Scale

                               Before Application of F    After Application of F
                               n       r                  n       r
Males
  Total Sample                 66      .85                50      .83
  Randomly Excluded Sample                                50      .85
Females
  Total Sample                 66      .78                44      .63
  Randomly Excluded Sample                                44      .69

Although no significant differences between reliability coefficients were found, the direction of change in the coefficients did not substantiate the hypothesis that instrument reliability should increase after application of the F scale.

Effect of F on Test Validity

The effect of the F scale on the validity coefficient between GSCI raw scores and standardized grade point averages was also determined. It was hypothesized that the Pearsonian validity coefficient between the two variables would increase after application of the F scale. The effects of F on validity for males and females are summarized in Tables XII and XIII.

Table XII. Effects on the Validity Coefficient Between GSCI Raw Scores and Standardized Grade Point Averages After Application of the Male and Female F Scale

                                           n      r       Level of Significance, ρ = 0 (Percent)
Males
  Correlation Before Application of F      66     .582    .005
  Correlation After Application of F       50     .501    .005
  Correlation After Random Exclusion       50     .564    .005
Females
  Correlation Before Application of F      66     .243    .03
  Correlation After Application of F       44     .394    .005
  Correlation After Random Exclusion       44     .322    .025

A linear regression line was plotted using GSCI raw scores and standardized grade point averages to locate placement of high F score males and females (see Figures VIII and IX). Eighteen percent of the males and thirty-eight percent of the females selecting a high number of rarity items fell at least one standard error of estimate below or above the regression line. Eighty-two percent of the high F males and forty-one percent of the high F females fell in the lower left quadrant of the regression plot. This area represents location of low achieving students.

Table XIII.
Significance of Difference Between Validity Correlation Coefficients Before and After Application of the Male and Female F Scale

                                           r After Application of F    r After Random Exclusion
Males
  Correlation Before Application of F      .59* (.27%)**
Females
  Correlation Before Application of F      .84* (.20%)**               .41* (.34%)**
  Correlation After Application of F                                   .43* (.33%)**

*  z Transformation Score
** Level of Significance

[Figures VIII and IX: linear regression plots of GSCI raw scores and standardized grade point averages, locating the high F score males and females.]

Summary

The outcomes of the present investigation were presented in this chapter.

Selection of the items comprising the final F scale was based upon commonly selected rarity responses between validation and cross validation samples. In the final form the male F scale consists of twenty-five F items. The female F scale is composed of seventy-three F items.

Sex differences in F item selection were determined. Males selected five rarity items which were not chosen by females. Females selected fifty-three rarity items which were not chosen by males. Both males and females selected in common twenty F items.

Hoyt's analysis of variance technique was used in determining F scale reliability. For the males a reliability coefficient of .729 was obtained. For the females a reliability coefficient of .746 was determined. For scaling purposes this is slightly less than desirable.

The critical F score for both males and females was determined by plotting respective F distribution curves for misclassified and properly classified over- and under-achievers. The point of overlap where misclassified under-achievers scored as properly classified under-achievers on F items was identified as three rarity responses for males and six for females after cross-validation.
To obtain evidence of F scale validity three approaches were examined for the effect of F on: 1) expectancy of response fake; 2) test reliability; and 3) test validity.

Under-achievers selected significantly more F items than over-achievers in both male and female samples. Consequently, the rationale of high fake expectancy was clearly substantiated.

The respective critical F scores were applied to a sample of males and females. Individuals possessing F scores as large as or greater than the critical score were excluded from the sample. Hoyt's analysis of variance technique for estimating internal consistency was used to obtain a reliability statement of the GSCI discriminating items before and after application of the F scale. For both male and female samples no significant differences in reliability coefficients were obtained.

The effects on validity correlation of GSCI raw scores and standardized grade point averages before and after application of the male and female F scale were determined. Before application of F the male and female validity coefficients were .582 and .243, respectively. After use of the F scale the male correlation decreased to .501 and the female validity coefficient increased to .394. However, no significant differences in correlations were obtained after F was applied. All validity correlations were significantly different from zero at the 3% or better level of confidence.

A linear regression was plotted using GSCI raw scores and standardized grade point averages to locate placement of high F score males and females. Eighteen percent of the males and thirty-eight percent of the females selecting a high number of F items fell at least one standard error of estimate below or above the regression line. Eighty-two percent of the high F males and forty-one percent of the high F females fell in the lower left quadrant of the regression plot. This area represents location of low achieving, low ability students.
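The comparison of validity coefficients before and after application of F can be illustrated with Fisher's z transformation. The Python sketch below assumes the standard large-sample formula for the difference between two correlations and treats the two samples as independent, which is only approximate here because the sample remaining after exclusion is a subset of the original one. The function names are hypothetical.

```python
import math


def fisher_z_difference(r_a, n_a, r_b, n_b):
    """z statistic for the difference between two correlations (Fisher transformation)."""
    z_a = math.atanh(r_a)   # Fisher z of the first coefficient
    z_b = math.atanh(r_b)   # Fisher z of the second coefficient
    std_error = math.sqrt(1.0 / (n_a - 3) + 1.0 / (n_b - 3))
    return (z_a - z_b) / std_error


def one_tailed_p(z):
    """Upper-tail probability of the standard normal distribution at |z|."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2.0))


# Males: r = .582 (n = 66) before F, r = .501 (n = 50) after F.
z_males = fisher_z_difference(0.582, 66, 0.501, 50)
p_males = one_tailed_p(z_males)

# Females: r = .394 (n = 44) after F, r = .243 (n = 66) before F.
z_females = fisher_z_difference(0.394, 44, 0.243, 66)
p_females = one_tailed_p(z_females)
```

With these inputs the statistics come out near .6 for males and .84 for females, neither of which approaches conventional significance, in line with the finding that the change in validity coefficients was not significant for either sex.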
The interpretation of the findings and the summary of the investigation are presented in Chapter V.

CHAPTER V
SUMMARY AND CONCLUSIONS

The Problem

This investigation was concerned with the development and validation of an F scale for an objective test battery of motivation.

Methodology

The present investigation used the administered test protocols of participants in Farquhar's motivational research project. Approximately 4200 eleventh grade Michigan public high school students comprised the population from which the sample was drawn for this study.

Instrumentation consisted of the Generalized Situational Choice Inventory, the Preferred Job Characteristics Scale, the Human Trait Inventory and the Word Rating List. The battery consists of 502 items.

Rarity responses (based upon a 10% or less criterion for selection of the item) were determined separately for a stratified random sampling of 132 males and 132 females. A cross-validation sample of equal numbers was also used. Items comprising the final F scale were based upon commonly selected rarity responses between validation and cross validation samples. In the final form the male F scale consists of twenty-five F items. The female F scale is composed of seventy-three F items.

Sex differences in F item selection were determined. Males selected five rarity items which were not chosen by females. Females selected fifty-three rarity items which were not chosen by males. Both males and females selected in common twenty F items.

Hoyt's analysis of variance technique was used in estimating F scale reliability. For the males a reliability coefficient of .729 was obtained. For the females a reliability coefficient of .746 was determined.

The critical F score for both males and females was determined by plotting respective F distribution curves for misclassified and properly classified over- and under-achievers.
The point of overlap where misclassified under-achievers scored as properly classified under-achievers on F items was identified as three rarity responses for males and six for females after cross validation.

To obtain evidence of F scale validity three approaches were examined for the effect of F on: 1) expectancy of response fake; 2) test reliability; and 3) test validity.

Under-achievers selected significantly more F items than over-achievers in both male and female samples. Consequently, the rationale of high fake expectancy was clearly substantiated.

The respective critical F scores were applied to a sample of males and females. Individuals possessing F scores as large as or greater than the critical score were excluded from the sample. Hoyt's analysis of variance technique for estimating internal consistency was used to obtain a reliability statement of the GSCI discriminating items before and after application of the F scale.

It was hypothesized that further evidence of the effectiveness of the F scale could be determined by its ability to remove unstable individuals who tend to lower instrument reliability by erratic test performance. Theoretically, reliability should increase with exclusion of unreliable subjects. However, the effect of homogeneity of test performance may operate also to reduce reliability. The question was raised as to which has the greater effect on reliability: erratic test performance or homogeneity of test performance. To test this question, a random sample of subjects equal in number to those identified by F as having high fake potential was excluded. The assumption was made that the internal consistency reliability coefficient reduced by random selection should be greater than the reliability coefficient reduced by F selection.
Although no significant differences between reliability coefficients were found, the direction of change in the coefficients did not substantiate the hypothesis that instrument reliability should increase after application of the F scale.

The effects on validity between GSCI raw scores and standardized grade point averages before and after application of the male and female F scale were determined. Before application of F the male and female validity coefficients were .582 and .243, respectively. After use of F the male correlation decreased to .501 and the female validity coefficient increased to .394. However, no significant differences in correlations were obtained after F was applied. All correlations were significantly different from zero at the 3% or better level of confidence.

A linear regression line was plotted using GSCI raw scores and standardized grade point averages to locate placement of high F score males and females. Eighteen percent of the males and thirty-eight percent of the females selecting a high number of rarity items fell at least one standard error of estimate below or above the regression line. Eighty-two percent of the high F males and forty-one percent of the high F females fell in the lower left quadrant of the regression plot. This area represents location of low achieving, low ability students.

Conclusions

The following conclusions are based upon the findings of the investigation:

1. Females selected approximately three times as many rarity items as males. Males displayed a greater controversiality in response to test battery items. Conversely, females were significantly in greater agreement than males. The F scale therefore represents a measure of the presence of or lack of social conformity.

2. Males selected five rarity items which were not chosen by females. Females selected fifty-three rarity items which were not chosen by males. Both males and females selected in common twenty F items.
The F scale therefore possesses the ability to tap an academic masculinity-femininity continuum.

3. The hypothesis that identified under-achievers would select more F items than identified over-achievers was significantly substantiated. The under-achiever selects significantly more F items because of the non-conforming, unstable characteristics of his behavior.

4. The F scale is able to differentiate significantly, for both males and females, properly classified under-achievers from properly classified over-achievers, misclassified over-achievers from properly classified over-achievers, misclassified under-achievers from properly classified under-achievers and misclassified under-achievers from properly classified over-achievers.

5. The hypothesis that GSCI reliability would increase after exclusion of high F score individuals was not substantiated. The assumption that correlation reduced by random selection should be greater than correlation reduced by F selection was not significantly supported.

6. A greater decrease of reliability coefficient would doubtless have occurred if the population from which the investigation sample was drawn had not been screened previously. Protocols of manifestly uncooperative test-takers and individuals with erratic test performance were removed along with tests displaying obvious clerical errors before the present investigation was initiated. Hence, through earlier screening, the very individuals whom the F scale attempts to identify were removed. However, regardless of the previous screening, significant differences between under- and over-achievers on F scale performance were obtained.

7. The hypothesis that the correlation between GSCI raw scores and standardized grade point averages would increase after application of the F scale was not significantly substantiated. The male validity coefficient decreased while the female coefficient increased appreciably.
    Although no significant differences between validity coefficients occurred, the direction of change in the female coefficient supported the hypothesis. The limited range of the male F scale (only 25 F items from the test battery) doubtless reduced the effectiveness of F in raising the male validity coefficient.

8. The reason for the lack of increase in the male validity coefficient was determined by plotting the regression line. Only eighteen percent of the high F males fell one standard error of estimate or more below or above the regression line. Thus, excluding the high F males removed a group of whom 82% were stable individuals, and the validity coefficient decreased because of restriction of range. Conversely, the female validity coefficient increased because thirty-eight percent of the high F females fell one standard error or more below or above the regression line. Thus, a larger group of unstable individuals was excluded.

9. The male F scale successfully identified 60% of the individuals falling in the lower left quadrant of the regression plot, the area representing low achieving males. Consequently, the F scale appears to be an effective instrument for identifying low achieving males.

10. Further investigation with the F scale should be conducted before the scale is actually employed in test battery interpretation. Although there is significant empirical evidence that the F scale can differentiate reliably between under- and over-achievers, complete validation of the instrument is lacking. For females, F appreciably but not significantly increases validity; for males the evidence is not so clear.

11. Re-evaluation of the F scale concept as used in the MMPI should be conducted. Lack of MMPI F scale validity casts serious doubt on its utility as a validation scale. In addition, the evidence presented in this study indicates that scoring in a rarity direction does not necessarily preclude normalcy or valid protocols.
    Evidence supports the conclusion that rarity of response is not a basis for a validation key. F appears to possess potential as a scale for discriminating among various subtle behavioral phenomena.

Implications for Further Research

Implications for further investigation include:

1. Increase the male F item selection criterion to a fifteen percent level of response frequency. With an increased range, the male F scale would doubtless influence test battery interpretation more decidedly.

2. Analyze factorially the structure of the items comprising the F scale.

3. Determine the potential of the F scale as a test of conformity. Subjects with high agreement scores (as opposed to the low agreement criterion of the F items) should be investigated to determine the relationship between frequency items and conformity characteristics.

4. Examine behavioral characteristics of individuals selecting a high frequency of rarity items.

5. Construct and validate a research masculinity-femininity scale using male and female F scale items.

6. Examine item response and location of individual scores within the various quadrants of a regression plot to determine commonality of variables impinging upon behavioral characteristics.

7. Develop and validate as research instruments a response bias scale, a K scale (items on which under-achievers score as over-achievers, etc.), and a lie scale for the motivational battery.

8. Construct an adolescent independent-dependent striving response bias scale by using test items which have implications for tapping the above respective variables.

9. Construct an adolescent power-striving response bias scale by using the above procedure. Test items which possess implications for tapping adolescent power-needs would be determined and placed in a power scale.

10. Conduct rigorous validation investigations of validity scales on response distortion.

BIBLIOGRAPHY

A. BOOKS

Edwards, A. L. Edwards Personal Preference Schedule. New York: The Psychological Corporation, 1959.
    27 pp.
Fricke, B. G. The Opinion, Attitude and Interest Survey. Minneapolis: Investors Diversified Services, 1955.
Hartshorne, Hugh and May, M. A. Studies in Deceit. New York: Macmillan, 1928. 248 pp.
Hathaway, S. R. Supplementary Manual for the MMPI. New York: The Psychological Corporation, 1946.
Hathaway, S. R. and McKinley, J. C. The Minnesota Multiphasic Personality Schedule. Minneapolis: University of Minnesota Press, 1942.
Hathaway, S. R. and McKinley, J. C. Manual for the MMPI. New York: The Psychological Corporation, 1946.
Kuder, G. F. Kuder Preference Record Vocational. Chicago: Science Research Associates, 1956. 35 pp.
Kuder, G. F. Kuder Preference Record Occupational. Chicago: Science Research Associates, 1959. 18 pp.
Maller, J. B. Character Sketches. New York: Bureau of Publications, Teachers College, Columbia University, 1932. 388 pp.
Maller, J. B. "Personality Tests." In J. M. Hunt, Personality and the Behavior Disorders. New York: Ronald Press, 1944.
Ruch, F. L. "A Technique for Detecting Attempts to Fake Performance on a Self-Inventory Type of Personality Test." In Quinn McNemar and M. A. Merrill, Studies in Personality. New York: McGraw-Hill, 1942.
Strong, E. K. Vocational Interest of Men and Women. Stanford: Stanford University Press, 1943. 746 pp.
Symonds, P. M. Diagnosing Personality and Conduct. New York: Appleton-Century, 1932. 602 pp.

B. PERIODICALS

Adams, C. R. "A New Measure of Personality," Journal of Applied Psychology, 1941, 25:141-151.
Allport, G. W. "A Test for Ascendance-Submission," Journal of Abnormal Psychology, 1928, 23:118-136.
Allport, G. W. "The Use of Personal Documents in Psychological Science," Social Science Research Council Bulletin, 1942, Number 42.
Benton, A. L. "The Interpretation of Questionnaire Items in a Personality Inventory," Archives of Psychology, 1935, Number 190.
Benton, A. L. "The MMPI in Clinical Practice," Journal of Nervous and Mental Disorders, 1945, 102:416-420.
Bernreuter, R. G.
    "Validity of the Personality Inventory," Personality Journal, 1933, 11:383-386.
Bernreuter, R. G. "Theory and Construction of the Personality Inventory," Journal of Social Psychology, 1933, 4:387-405.
Bernreuter, R. G. "The Present Status of Personality Trait Tests," Educational Research Supplement, 1940, 21:160-171.
Bills, Marion. "Selection of Casualty and Life Insurance Agents," Journal of Applied Psychology, 1941, 25:6-10.
Bordin, E. S. "A Theory of Vocational Interests as Dynamic Phenomena," Educational and Psychological Measurement, 1943, 3:49-65.
Cady, V. M. "The Estimation of Juvenile Incorrigibility," Journal of Delinquency Monographs, 1923, Number 2.
Cofer, C. N., Chance, J. and Judson, A. J. "A Study of Malingering on the MMPI," Journal of Psychology, 1949, 27:491-499.
Cottle, W. C. "Card Versus Booklet Forms of the MMPI," Journal of Applied Psychology, 1950, 34:255-259.
Cottle, W. C. "The MMPI: A Review," Kansas Studies in Education, 1953, 3:1-82.
Cronbach, L. J. "Response Sets and Test Validity," Educational and Psychological Measurement, 1946, 6:475-494.
Cronbach, L. J. "Further Evidence on Response Sets and Tests Designs," Educational and Psychological Measurement, 1950, 10:3-31.
Eisenberg, P. "Individual Interpretation of Psychoneurotic Inventory Items," Journal of Genetic Psychology, 1941, 25:19-40.
Eisenberg, P. and Wesman, A. "A Consistency in Responses and Logical Interpretation of Psychoneurotic Inventory Items," Journal of Educational Psychology, 1941, 32:321-338.
Frenkel-Brunswik, E. "Mechanisms of Self-Deception," Journal of Social Psychology, 1939, 10:409-420.
Fricke, B. G. "Conversion Hysterics and the MMPI," Journal of Clinical Psychology, 1956, 12:322-326.
Fricke, B. G. "A Response Bias Scale for the MMPI," Journal of Counseling Psychology, 1957, 4:149-153.
Fricke, B. G. "Subtle and Obvious Test Items and Response Set," Journal of Consulting Psychology, 1957, 21:250-252.
Gough, H. G. "Simulated Patterns on the MMPI," Journal of Abnormal and Social Psychology, 1947, 42:215.
Gough, H. G. "Factors Relating to the Academic Achievement of High School Students," Journal of Educational Psychology, 1949, 40:65-78.
Gough, H. G. "What Determines the Academic Achievement of High School Students," Journal of Educational Research, 1953, 46:321-331.
Gough, H. G. "The F Minus K Dissimulation Index for the MMPI," Journal of Consulting Psychology, 1950, 14:408-413.
Guilford, J. P. and Guilford, R. B. "Personality Factors S, E, and M and Their Measurement," Journal of Psychology, 1936, 2:109-127.
Hathaway, S. R. and McKinley, J. C. "A Multiphasic Personality Schedule: I. Construction of the Schedule," Journal of Psychology, 1940, 10:249-254.
Hathaway, S. R. and McKinley, J. C. "A Multiphasic Personality Schedule: III. The Measurement of Symptomatic Depression," Journal of Psychology, 1942, 14:73-84.
Horst, Paul. "The Prediction of Personal Adjustment," Social Science Research Council Bulletin, 1941, Number 48.
Hovey, H. B. "Detection of Circumvention in the MMPI," Journal of Clinical Psychology, 1948, 4:97.
Hoyt, C. J. "Test Reliability Estimated by Analysis of Variance," Psychometrika, 1941, 6:133-160.
Humm, D. G. and Humm, K. A. "Validity of the Humm-Wadsworth Temperament Scale: With Consideration of the Effects of Subjects' Response-Bias," Journal of Psychology, 1944, 18:55-64.
Humm, D. G., Storment, R. C. and Iorns, M. E. "Combination Scores for the Humm-Wadsworth Temperament Scale," Journal of Psychology, 1939, 7:227-253.
Humm, D. G. and Wadsworth, G. W. "The Humm-Wadsworth Temperament Scale," American Journal of Psychiatry, 1935, 92:163-200.
Hunt, H. F. "A Study of the Differential Diagnostic Efficiency of the MMPI," Journal of Consulting Psychology, 1948, 12:331-336.
Hunt, H. F. "The Effect of Deliberate Deception on MMPI Performance," Journal of Consulting Psychology, 1948, 12:396-402.
Hunt, W. A.
    "The Detection of Malingering: A Further Study," United States Naval Medical Bulletin, 1946, 46:249.
Hunt, W. A. and Older, H. J. "Detection of Malingering Through Psychometric Tests," United States Naval Medical Bulletin, 1943, 41:1318.
Kazan, A. T. and Sheinberg, I. M. "Clinical Note on the Significance of the Validity Score F in the MMPI," American Journal of Psychiatry, 1945, 102:181-183.
Kelly, E. L., Miles, C. C. and Terman, L. M. "Ability to Influence One's Score on a Typical Pencil and Paper Test of Personality," Character and Personality, 1936, 4:206-215.
Krumboltz, J. D. and Farquhar, W. W. "The Effect of Three Teaching Methods on Achievement and Motivational Outcomes in a How-to-Study Course," Psychological Monographs, 1957, 71: Number 14.
Laird, D. A. "Detecting Abnormal Behavior," Journal of Abnormal Psychology, 1926, 20:128-141.
Landis, C. and Katz, S. E. "The Validity of Certain Questions Which Purport to Measure Neurotic Tendencies," Journal of Applied Psychology, 1934, 18:343-356.
Levine, A. S. "A Technique for Developing Suppression Tests," Educational and Psychological Measurement, 1952, 12:313-315.
Maller, J. B. "The Effect of Signing One's Name," School and Society, 1930, 31:882-884.
McKinley, J. C. and Hathaway, S. R. "A Multiphasic Personality Schedule: V. Hysteria, Hypomania and Psychopathic Deviate," Journal of Applied Psychology, 1942, 14:73-84.
McKinley, J. C., Hathaway, S. R. and Meehl, P. E. "The MMPI: VI. The K Scale," Journal of Consulting Psychology, 1948, 12:20-31.
McNemar, Quinn. "The Mode of Operation of Suppressant Variables," American Journal of Psychology, 1945, 58:554-555.
McQuary, J. J. and Truax, W. E. "An Under Achievement Scale," Journal of Educational Research, 1955, 48:393-399.
Meehl, P. E. "An Investigation of General Normality Control Factor in Personality Testing," Psychological Monographs, 1945, 59: Number 4.
Meehl, P. E. "The Dynamics of Structured Personality Tests," Journal of Clinical Psychology, 1945, 1:296-303.
Meehl, P. E. and Hathaway, S. R. "The K Factor as a Suppressor Variable in the MMPI," Journal of Applied Psychology, 1946, 30:525-564.
Metfessel, M. "Personality Factors in Motion Picture Writing," Journal of Social and Abnormal Psychology, 1935, 30:333-347.
Middleton, George and Gutherie, G. M. "Personality Syndromes and Academic Achievement," Journal of Educational Psychology, 1959, 50: Number 2.
Mosier, C. I. "A Note on Item Analysis and the Criterion of Internal Consistency," Psychometrika, 1936, 1:275-282.
Olson, W. C. "The Waiver of Signature in Personal Reports," Journal of Applied Psychology, 1936, 20:442-450.
Ossipov, V. P. "Malingering: The Simulation of Psychosis," Bulletin of the Menninger Clinic, 1944, 8:39-42.
Rosen, E. "Self Appraisal and Perceived Desirability of MMPI Personality Traits," Journal of Counseling Psychology, 1956, 3:44-51.
Rosen, E. "Self Appraisal, Personal Desirability, and Perceived Social Desirability of Personality Traits," Journal of Abnormal and Social Psychology, 1956, 52:151-158.
Rosenzweig, Saul. "A Suggestion for Making Verbal Personality Tests More Valid," Psychological Review, 1934, 41:400-401.
Rosenzweig, Saul. "A Basis for the Improvement of Personality Tests with Special Reference to the M-F Battery," Journal of Abnormal and Social Psychology, 1938, 33:476-488.
Schmidt, H. O. "Test Profiles as a Diagnostic Aid: The MMPI," Journal of Applied Psychology, 1945, 29:115-131.
Schmidt, H. O. "Notes on the MMPI: The K Factor," Journal of Consulting Psychology, 1948, 12:337-342.
Schneck, J. M. "Clinical Evaluation of the F Scale on the MMPI," American Journal of Psychiatry, 1948, 104:440-442.
Shoben, E. J. "The Assessment of Parental Attitudes in Relation to Child Adjustment," Genetic Psychology Monographs, 1949, 39:101-148.
Spencer, D. "Frankness of Subjects on Personality Measures," Journal of Educational Psychology, 1938, 29:26-35.
Steinmetz, H. C. "Measuring Ability to Fake Occupation Interest," Journal of Applied Psychology, 1932, 16:123-230.
Sweetland, A. "Hypnotic Neurosis-Hypochondriasis and Depression," Journal of Genetic Psychology, 1948, 39:19-105.
Vernon, P. E. "The Attitude of the Subject in Personality Testing," Journal of Applied Psychology, 1934, 18:165-177.
Washburne, J. N. "A Test of Social Adjustment," Journal of Applied Psychology, 1935, 19:125-244.
Wherry, R. J. "Test Selection and Suppressor Variables," Psychometrika, 1946, 11:239-247.
Wiener, D. N. "Selecting Salesmen with Subtle-Obvious Keys for the MMPI," American Psychologist, 1948, 3:364.
Wiener, D. N. "Subtle and Obvious Keys for the MMPI," Journal of Consulting Psychology, 1948, 12:164-170.
Wiener, D. N. "A Control Factor in Social Adjustment," Journal of Abnormal and Social Psychology, 1951, 46:3-8.
Willoughby, R. R. and Morse, M. E. "Spontaneous Reactions to a Personality Inventory," American Journal of Orthopsychiatry, 1936, 6:562-575.

C. UNPUBLISHED MATERIALS

Arnold, D. A. "The Clinical Validity of the Humm-Wadsworth Temperament Scale in Psychiatric Diagnosis." Unpublished Doctor's dissertation, University of Minnesota, Minneapolis, 1942.
Farquhar, William W. "A Comprehensive Study of the Motivational Factors Underlying Achievement of Eleventh Grade High School Students." East Lansing: Approved Research Application to the Commissioner of Education, United States Office of Education, 1959. (Mimeographed.)
Fricke, B. G. "The Development of an Empirically Validated Personality Test Employing Configural Analysis for the Prediction of Academic Achievement." Unpublished Doctor's dissertation, University of Minnesota, 1954.
Hendrickson, G. "Attitudes and Interests of Teachers and Prospective Teachers." Paper read before Section Q, AAAS, Atlantic City, December 27, 1932.
Jeffery, M. E. "Some Factors Influencing Answers on the Multiphasic K Scale." Unpublished Doctor's dissertation, University of Minnesota, Minneapolis, 1946.

APPENDIX

MOTIVATIONAL TEST BATTERY F SCALE ITEMS

0 = Male Rarity Item
X = Female Rarity Item
* = Male and Female Rarity Item

Human Trait Inventory

Agree Direction

0-2      I like collecting flowers or growing house plants.
*-25     I have played that I am sick to get out of doing something.
*-52     Most of my school subjects are a complete waste of time.
*-e7     When I was a youngster I stole things.
X-79     I have played hooky from school.
0-81     There was a time in my life when I liked to play with dolls.
X-85     My parents object to the friends I choose.
*-87     I have been sent to the principal for misbehaving in class.
*-88     I have trouble with my muscles twitching or jumping.
X-97     One or more times a week I suddenly feel hot all over for no apparent reason.
X-112    I wish I were a child again.
0-115    I feel cross and grouchy without good reason.
X-120    I feel that I haven't any goals or purpose in life.
X-124    I would like to belong to a motorcycle club.

Word Rating List

Agree Direction

Teachers feel that I am:

::._3    dull
X-4      inefficient
*-11     unsuccessful
*-14     "blah"
X-19     uninterested
$-21     unreliable
X-37     childish
X-43     cold
X-44     below average
*-45     reckless
X-51     a goof off
X-53     lazy
X-62     unreasonable
X-64     a "wheel"
X-65     a "grind"
X-66     fool-hearty
X-69     retiring
X-79     a "brain"
*-90     outsider
X-92     a person who delays
X-93     indecisive
$-94     irresponsible
X-108    fault-finding
X-110    dominant
*-111    inaccurate
X-114    pushed

Generalized Situational Choice Inventory

Agree Direction

I would prefer to:

X-2      Do well in school.
X-9      Be quick, but often incorrect.
X-62     Receive grades which are like everyone else's.
X-107    Be known for what I could do.
$-114    Learn by defeating an inexperienced player.
25-137   Have decisions made for me.
X-144    Live a life of leisure.
X-176    Have an instructor who gave me an "A" and not care whether I learned anything or not.
X-180    Accept what someone else says even though I don't agree.
X-187    Date the smartest girl or boy in class.

Disagree Direction

0-17     Make something planned by somebody else.
*-20     Accomplish a task in a hurry, but less carefully.
X-50     Finish a job.
0-51     Play a game against inexperienced players and win.
X-77     Do a recognized but incomplete job.
X-88     Take a course from an instructor who only gives "C's".
*-102    Have an easier job which pays less.
X-118    Buy something on credit and pay for it as I use it.
X-119    Do what others think is right.
X-122    Be known as an expert.
X-139    Accomplish a difficult task fast.
X-157    Win a game from an inexperienced player.
X-170    Have lots of money.

Preferred Job Characteristics Inventory

I prefer:

X-1b     A job with short working hours.
*-7a     A job which requires little thinking.
*-8b     A job where I make few if any decisions.
X-14b    A job which requires little thinking.
*-15b    A job where I make few if any decisions.
X-17b    A job where I could not be fired.
X-25b    A job which permits me to take days off when I want.
X-39a    A job where I could not be fired.
X-46a    A job where I could not be fired.
X-49a    A job which requires little thinking.
X-57b    A job where I make few if any decisions.
X-64b    A job which requires little thinking.
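
The key above can be turned into a score by counting how many keyed responses a subject gives. The following is a minimal sketch of that count, not the scoring procedure used in the study. It assumes each response is recorded as an (inventory, item, direction) tuple and uses only ten Human Trait Inventory items from the key for illustration. The critical scores (three rarity responses for males, six for females) and the exclusion rule (a score as large as or greater than the critical score) are taken from the abstract. The names `KEYS`, `CRITICAL_SCORE`, `f_score`, and `is_high_f` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# A response is (inventory, item number, keyed direction).
# This record layout is an assumption made for illustration.
Response = Tuple[str, str, str]

# Illustrative subset of the Human Trait Inventory key (agree direction).
MALE_ONLY = {("HTI", "2", "agree"), ("HTI", "81", "agree"), ("HTI", "115", "agree")}
FEMALE_ONLY = {("HTI", "79", "agree"), ("HTI", "85", "agree"), ("HTI", "97", "agree")}
COMMON = {
    ("HTI", "25", "agree"),
    ("HTI", "52", "agree"),
    ("HTI", "87", "agree"),
    ("HTI", "88", "agree"),
}

# Each sex is scored on its own rarity items plus the items common to both.
KEYS: Dict[str, FrozenSet[Response]] = {
    "male": frozenset(MALE_ONLY | COMMON),
    "female": frozenset(FEMALE_ONLY | COMMON),
}

# Critical F scores reported in the abstract after cross-validation.
CRITICAL_SCORE: Dict[str, int] = {"male": 3, "female": 6}


def f_score(responses: Iterable[Response], sex: str) -> int:
    """Count the distinct responses that match the F key for the given sex."""
    return len(set(responses) & KEYS[sex])


def is_high_f(responses: Iterable[Response], sex: str) -> bool:
    """Flag a protocol whose F score is at or above the critical score."""
    return f_score(responses, sex) >= CRITICAL_SCORE[sex]


if __name__ == "__main__":
    example = [
        ("HTI", "2", "agree"),
        ("HTI", "25", "agree"),
        ("HTI", "81", "agree"),
        ("HTI", "79", "agree"),
    ]
    print(f_score(example, "male"), is_high_f(example, "male"))
    print(f_score(example, "female"), is_high_f(example, "female"))
```

Scored against the male key, the example protocol matches items 2, 25, and 81 and so reaches the male critical score; scored against the female key it matches only items 25 and 79. A full implementation would load all keyed items from the four inventories rather than this subset.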