MANIFEST STRUCTURE ANALYSIS OF SUPERVISORY TESTING

Clayton H. Rashleigh

A Thesis

Submitted to the School of Graduate Studies of Michigan State College of Agriculture and Applied Science in partial fulfillment of the requirements for the degree of

MASTER OF ARTS

Department of Psychology

1954

This is to certify that the thesis entitled "Manifest Structure Analysis of Supervisory Testing," presented by Clayton H. Rashleigh, has been accepted towards fulfillment of the requirements for the M.A. degree in Psychology.

Major professor
Date: May 28, 1954

ACKNOWLEDGMENTS

The author wishes to express his sincere thanks to Dr. Frank M. du Mas for generously making available his knowledge, energy, and pioneering techniques of analysis. He is also greatly indebted to Dr. Carl F. Frost for much concrete help, unfailing interest, and encouragement. Grateful acknowledgement is also due to Dr. G. M. Gilbert for patience and cooperation in many ways.

TABLE OF CONTENTS

INTRODUCTION
THE PROBLEM
PROCEDURE
    Subjects
    Criteria
    Apparatus
    Basic Data
    Method of Analysis
RESULTS AND DISCUSSION
SUMMARY AND CONCLUSIONS
BIBLIOGRAPHY

LIST OF FIGURES AND TABLES

FIGURE 1
FIGURE 2
TABLE I

INTRODUCTION

The purpose of the present study is to investigate a new technique of item validation and scale construction recently formulated by du Mas (4). The need for valid tests in industry and business, as well as in the armed services, has been emphasized by Super (14) and Lawshe (9).
Testing of personnel for selection, placement, or promotion in industry has become an increasingly important development in psychology in recent years. Super (14) calls testing big business, stating that one million Americans took sixty million tests in one year. Lawshe (9) comments that recent war years clearly demonstrated the effectiveness of personnel tests both in industry and in the military services.

Lawshe (9) sees a need for tailor-made tests in all areas, and remarks on a growing tendency toward selective scoring of commercially available standard tests for a specific situation. The important question, he says, is whether or not the test helps to identify the persons who are apt to be most successful on this particular job. In this connection, he states: "Whether it is a matter of test construction or the selection of significant items in commercially available tests, the problem of item validation is one and the same." (9, p. 17).

The usefulness of valid testing instruments has been illustrated by many studies. Wadsworth (17) has shown that test-selected employees proved satisfactory more often than non-test-selected employees and produced a smaller percent of problem employees. Strong (12) showed that 56% of insurance salesmen scoring A on an interest test had individual sales totals of $150,000, while only 6% of salesmen scoring C on the test achieved that figure. In the Army Aviation Testing Program, four percent of cadets with stanine score 9 were eliminated from primary flying school, while seventy-seven percent of the cadets with stanine score 1 were eliminated.

File and Remmers (5) found that of forty-six men selected as supervisors in a company, 80% scored above average on the How Supervise? test, while of fifteen men by-passed because they were judged lacking in ability, only 15% scored above average on that test.
Wonderlic (18) found that among representatives of a personal-finance company, 86% of those employed a year or more made above a critical score on the Personnel Test, while only 35% of those who were dismissed or left the company made above that score.

However, many tests do not achieve adequate validity. Also, many tests fail to stand up under cross-validation. That is, the results achieved with the first sample are not verified in a comparable but independent sample, using the same criterion. Super (14) feels that external evidence of validity, that is, verification against a criterion, is the only adequate basis for judging a test.

Validity immediately suggests the criterion problem. A question that Jenkins (8) asks is: "Validity for what?" The answer must be in terms of a good criterion. Clear and simple definitions of a good criterion are not plentiful. A criterion, in this context, might be described as a measurable or quantifiable standard of behavior in a given situation, or a measure of worker performance. A criterion may be simple, such as the number of pieces per hour by men on a certain type of machine, or it may be composite and multivariate. For example, a criterion might consist of a weighted combination of output, quality of work, ratings, and possibly other factors. Thus, criteria may be objective, as records of production, or subjective, as ratings of adaptability.

To attempt a consensus of expressed requirements, a good criterion should be: reliable, relevant, related to other criteria, suitable to the job analysis, available, acceptable to management, modifiable in terms of changes in the situation, and quantifiable.

Rush (10), in an elaborate factor analysis, concluded that the criterion of sales success is multidimensional rather than unitary, and hence the use of global measures of success or failure would seem undesirable, since this might obscure underlying relationships in validation studies.
He also concluded that the development of effective selection devices may be facilitated by a knowledge of the component elements of job criteria.

The findings of Taylor, Schneider, and Symonds (15) seem to disagree with the above conclusions. Their factor analysis of 13 graphic rating scales of salesmen yielded only one clear factor. They concluded that basic salary constituted management's considered judgement of the value of the man to the organization, expressed on a dollar continuum. Using basic salary as their criterion, they found a cross-validation coefficient in the .40's for their form of forced-choice tetrads and rating scales. The validity generalized to another group of salesmen in a different division of the company, the correlation being .52.

Super (14, p. 48) uses the terms standardization and validation interchangeably, "because the standardization of a vocational test implies collecting data which make possible validation." It seems apparent that most of the steps dealing with selection of tests and test construction have as their goal a test which is valid for its specific use.

Super (14) says that the minimum correlation coefficient, or validity coefficient, for psychological tests has been generally set at .45 for individual tests; but lower validity coefficients may be combined usefully in test batteries. Validity coefficients are not likely to exceed .70, according to Super, because of the unreliability of criteria. As an example, he cites the unreliability between supervisors' ratings, that is, lack of agreement between raters. His argument appears logically sound. But one might ask the question, what if the criterion were more reliable? In a highly competitive industrial situation, where incompetence cannot be tolerated, might not the supervisor's salary represent the carefully considered judgement of his value to the company, or even a relatively accurate estimate of his demonstrated value?
Would it not seem logical to consider supervisors a selected group who reach and maintain their position through special effort for special reward? Some evidence supports the assumption that motivation is generally higher in higher socio-economic levels, as suggested by Barnett, Handelsman, Stewart and Super (1).

Testing of supervisors offers special problems. Lawshe (9) comments that supervisory jobs vary tremendously. Gibb (6) states that there is no one leadership type of personality. Cleeton and Mason (2) agree that there is no general executive type. Super (14) points out that although a great deal of time and money is being spent on the application of psychological methods to the selection of executive personnel, little has been published on it in the psychological journals. He lists five current types of work in executive selection and evaluation: the development of custom-built batteries of tests such as the Cleeton-Mason Vocational Aptitude Examination; the validation of standard tests for this particular purpose, as in the University of Minnesota's College of Business Administration project; the development of single tests for executive interests or other traits, best illustrated by Strong's (11, 13) work with executives and public administrators; the clinical use of interviews and tests as commonly done by consulting psychologists; and the use of clinically evaluated situation tests as developed by the British War Officer Selection Boards and carried further by the U. S. Office of Strategic Services.

In the field of executive selection, Thompson (16) found positive results with a battery of standard tests administered to 15 superior and 10 average executives of a firm of consulting management engineers.
The tests included the Wonderlic Personnel Test, Michigan Vocabulary Profile Test, Cardall Test of Practical Judgement, Kuder Preference Record, Adams-Lepley Personal Audit, Beckman Revision of the Allport A-S Reaction Study, Guilford-Martin Personnel Inventory, and Rood I-E Test. The criterion consisted of performance records (not described) and ratings by partners (reliability not stated). Differences at or above the 5% level were found with the Wonderlic, Michigan Vocabulary Profile, Kuder, and the Adams-Lepley. All of the reported differences favored the superior executives, except that on the Kuder Social Service Scale. The results describe the successful management engineer executive as superior to less successful partners in mental ability, interests, firmness, and stability, and inferior in interest in social service. No cross-validation study was reported; therefore these results must be considered highly tentative, especially with such a small sample.

Harrell (7) reported on 42 overseers, in three different cotton mills, rated satisfactory or unsatisfactory by their superiors. With a critical I.Q. of 100 on the Otis Self-Administering Test of Mental Ability, only 70% of the unsatisfactory, but 100% of the satisfactory, achieved this I.Q. In view of the discussion of criteria above, this study appears open to criticism.

Lawshe (9) comments that there is little evidence of successful validity studies in the executive brackets. He attributes this to the difficulty of setting up criterion groups at this level, and partially also to failure to develop adequate measuring instruments. Cleeton and Mason (2) point out that, since successful executives generally score relatively high on a wide variety of ability tests, they would seem to be well-rounded personalities. Lawshe (9) suggests mental ability tests, temperament tests, interest tests, and, specifically, the Michigan Vocabulary Profile Test as most promising in this area.
Two important problems in testing generally seem to be: 1. choice of test, or construction of a new one, and 2. validation of the test in a specific situation.

For the development of a new vocational test, Super (14) suggests seven major steps: job analysis, selection of traits to test, selection of criteria of success, item construction, standardization, validation, and cross-validation. He points out that one or more of these steps may be slighted or omitted in special circumstances.

With reference to test construction, Super (14) stresses the importance of selecting a criterion early. He indicates that the criterion should be considered as soon as the characteristic to be tested has been isolated and selected on the basis of job analysis. This should also indicate the choice of the type of test to be constructed. Then should follow the problems of constructing apparatus and drawing or writing items, the first trial of the tentative form, further revision, collection of data on a larger group of subjects, analysis of the internal consistency, analysis of the scoring key, and another revision of the test. These problems need not be gone into in detail here. However, it should be apparent that conventional test construction is a very complex and time-consuming task. All the problems mentioned above occur before data are collected for standardization and the establishing of norms. The test then must be validated in a specific situation, and cross-validated on another group not included in this first validation group, but using the same external criterion in both groups.

This points up the complexity of test construction with conventional methods, and specifically the crucial function of item analysis, or validation of the items which make up the test. In conventional methods, test items are selected on the basis of values generated from theory or inference from properties of the stimulus items, according to du Mas (4).
The finished instrument often does not have sufficient validity for effective use, resulting in a great loss of time and effort. Research which is so expensive and time-consuming, and which may turn out a total loss, is hard to justify to management.

There is a need for a procedure that will consider the situation as a whole, a total dynamic field, including the personality. Such a procedure should consider evaluation of biographical data, personality factors, and test performance in tests of achievement and skills. The special need for this type of evaluation in industry exists in the selection of supervisors, as has been pointed out above. Until now, a scale composed of items of such a heterogeneous nature has received little attention; but this study addresses itself precisely to this point.

Many techniques have been suggested for item analysis in test or scale construction, according to du Mas (4). In most of these, each item is related individually to a variable (often the total score for all items in the scale); then those items having high correlations with the variable and low correlations with each other are selected, and a weight is often assigned to each item on the basis of the item-variable correlation. The correlational methods most often used are biserial, point biserial, tetrachoric, and phi.

Du Mas (4) holds, further, that these techniques are open to certain criticisms. The data seldom fit the assumptions upon which the various methods of item analysis are based. The methods most widely used in item analysis depend markedly upon agreement in difficulty of the items. The criteria of conventional item analysis for the retention of an item often make it necessary to discard perfect or near-perfect scale items. Tests constructed by these methods practically always seriously violate scale concepts and/or criteria.
Therefore he concludes that tests constructed from item analyses should not in general be regarded as scales, but rather as primitive, useful, and probably necessary antecedents to more adequate instruments constructed by more rational methods.

Manifest Structure Analysis is a new method of scaling introduced and described in detail by du Mas (4) for the purpose of extracting an ordered set of categories from a domain. The set of categories then can be utilized as a measuring instrument and not merely as a set of predictors. He defines Manifest Structure Analysis as the analysis of an ordered structure which is operationally extracted from an apparently chaotic domain by reference to the manifest relations existing between a set of categories and a continuum of magnitudes, or criterion scale. It is different from all other methods of scaling in that values of the continuum are manifest and are not generated as an inference from the stimulus items. Because of this, objects, items, or events which exhibit no phenomenal similarity, relationship, or order may be scaled.

[FIGURE 1. Association surface for the Segmental Model (after du Mas, 4, p. 87). Ordinate: criterion; abscissa: categories 1 through m.]

Practically, Manifest Structure Analysis utilizes an ordered criterion (e.g., income levels) as the ordinate, and categories (e.g., test scores) along the abscissa of a coordinate system. The plotting of test scores automatically scales them to the criterion scale, yielding an ordered set of categories which indicate, or reflect, the magnitude, intensity, or degree of the criterion at any given point on the criterion scale. Scores which do not manifestly discriminate are dropped. The underlying notion is that categories may be differentially associated with some manifest variable in such a way as to form an ordered structure. (See Fig. 1, p. 9a).
Thus, an ordered criterion of income may yield an ordered structure of scaled categories or items empirically obtained by Manifest Structure Analysis.

For the present study, data were made available by a client of an industrial psychological consultant of Michigan State College. The client, a manufacturer of a highly specialized product, had carried out a testing program of supervisors as part of a broader personnel evaluation program, under the direction of the consultant. The testing consisted of a battery of standard tests which had been found satisfactory in previous testing, and a personal information form. The testing was done in the offices of a professional psychologist who evaluated the results. The tests were administered and scored by a competent psychometrician. The testing session required about six hours for each subject.

THE PROBLEM

The problem of this study was to select a set of ordered and weighted items or categories from biographical data forms and standard test information by means of which we could predict a subject's potential value to the company. The general hypothesis we wished to test was: There is a set of categories from biographical and test data which forms an empirical analogue of the Segmental Model (See Fig. 1, p. 9a).

PROCEDURE

Subjects

Fifty-one supervisors or potential supervisors, all employees of the same manufacturing company, participated in a six-hour testing program. Since criterion data were available on only forty-two of the subjects, by reason of transfer or leaving the company, it was necessary to discard the data of nine subjects. Of the remaining forty-two, ten subjects were held out to serve as a cross-validation group. They were selected on the basis of range along the criterion scale only. That is, the second from the top, the second from the bottom, and then eight other subjects were selected so as to be representative of the major criterion intervals.
The number of subjects for the original sample was then 32. The number of subjects for the cross-validation group was 10.

Criterion

The industrial psychological consultant, mentioned above, obtained from the company the following information for possible use as criteria: a numerical job classification which gave a numerical value to the level of supervisory responsibility; annual income (coded); hourly income (coded); merit ratings for 1953 and 1954; income change from 1950 to 1954; and job tenure. All names of supervisors were also represented by code numbers.

Only the job classification and the coded hourly income appeared to reflect the various levels of supervisory function. The coded hourly income, which excluded bonus and overtime pay, was selected as the best available criterion. This was operationally defined as a coded, dollars-per-hour, manifest scale of value of the supervisor's performance in this company.

Apparatus

The du Mas Scaling Frame held 87 removable slats placed together so as to present a flat surface slanting away from the vertical, at an angle convenient for placing thumbtacks into the slats while standing in front of it. Its outside dimensions were about four and one-half feet in height and five feet in width. One hundred holes for thumbtacks were drilled down the length of each slat at one-half inch intervals. Since the slats were one-half inch wide, and held firmly together, the holes formed straight lines top-to-bottom, across, and diagonally. The effect might be visualized as a rectangular surface made up of one-half inch squares or cells with a hole for a thumbtack in the center of each cell. There were 87 cells in each row across the frame, and 100 cells in each column. Each slat, or column, could be removed and shifted to another position where it would again fit this pattern of cells.
The criterion scale was attached to the left-hand margin of the scaling frame so that each individual, represented by a name code number and an hourly income code value, coincided with a horizontal row of cells. The criterion values were ordered from highest to lowest, with the highest at the top row of cells.

The slats, or columns, were numbered to represent categories, such as test score intervals. Thus, if an individual scored within a given interval, the datum was entered into the appropriate cell where the individual's row and the category's column intersected.

Basic Data

The battery of standard tests consisted of Bernreuter's Personality Inventory; the Social Intelligence Test prepared by Moss, Hunt, and Omwake; Wonderlic's Personnel Test; Bennett's Test of Mechanical Comprehension, Form AA; the Minnesota Clerical Test, by Andrew, Paterson, and Longstaff; How Supervise?, by File and Remmers; the Michigan Vocabulary Profile Test, by Greene; the Kuder Preference Record; the Washburne S-A Inventory (Thaspic edition); and the Study of Values, by Allport, Vernon, and Lindzey. The Guilford-Zimmerman Temperament Survey and the Thurstone Temperament Schedule were used as alternate tests and therefore could not be utilized statistically. Records were not complete on the Wide Range Vocabulary Test, and therefore it could not be used.

The Personal History form was constructed by Harry G. Yudin, professional psychologist, in whose offices the testing was accomplished. The testing session was about six hours for each subject.

Method of Analysis

If the data were numerical values, they were divided into three intervals, each representing roughly one-third of the subjects. Thus, if the scores for all the subjects on one test ranged from 60 to 90, and if they appeared to be fairly evenly distributed, the three categories for this test became 0 to 70, 71 to 80, and 81 up. Each of these categories was numbered.
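The three-interval split just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesis chose its cut points by inspection, whereas the hypothetical helper `three_intervals` below places them at the scores that close the lowest and middle thirds of the ordered subjects. The function names and the sample scores are not from the original.

```python
def three_intervals(scores):
    """Return the inclusive upper bounds of the low and middle intervals.

    Assumption: the cut points are taken from the ordered scores so that
    each interval holds roughly one-third of the subjects.
    """
    ordered = sorted(scores)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least three scores to form three intervals")
    low_cut = ordered[n // 3 - 1]       # highest score in the lowest third
    mid_cut = ordered[2 * n // 3 - 1]   # highest score in the middle third
    return low_cut, mid_cut


def category_of(score, low_cut, mid_cut):
    """Return 1, 2, or 3 for the low, middle, or high interval."""
    if score <= low_cut:
        return 1
    if score <= mid_cut:
        return 2
    return 3


# Hypothetical, evenly spread scores from 61 to 90 (one subject per score).
low_cut, mid_cut = three_intervals(range(61, 91))
middle_subject = category_of(75, low_cut, mid_cut)
```

With evenly spread scores such as these, the cut points fall at 70 and 80, matching the "0 to 70, 71 to 80, and 81 up" example in the text.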
If a subject's score placed him in the 71 to 80 category, a thumbtack was entered in the cell where his row and the category column intersected in the scaling frame. Biographical data, if not numerical, were divided into 'yes' and 'no' categories, on the assumption that a 'yes' response might be characteristic of top and bottom criterion ranges, and therefore not discriminate, while the 'no' response would, in this instance, discriminate the middle criterion range. The biographical categories were also given category numbers. Thus, Marital Status might become the following numbered categories: Married, Widower, Separated, Single, Divorced.

There was a total of 293 categories representing test scores, sub-test scores, percentiles, numerical biographical data (e.g., age), and non-numerical biographical data (e.g., birthplace). Since the du Mas scaling frame held only 87 slats at a time, it was necessary to use extra slats, filling and evaluating 87 categories at a time. All the data were entered into their appropriate cells.

Certain categories were seen, when inspected individually, to discriminate some portion of the upper, lower, or middle range of the criterion scale. These categories were moved to the left side of the scaling frame by simply lifting the slat out of the frame and putting it back into the frame on the left side, after sliding the other slats to the right. Categories were rejected to the right side of the scaling frame if they were: multimodal, gappy, associated with a large part of the range, or not sufficiently associated with individuals in the sample, in accordance with the du Mas (4) specifications. Thirty-nine categories were thus selected. (See Table I, p. 16).

TABLE I
THE CATESCALE

[Table I gives, for each of the 39 categories, a category number, the category content with its interval, and a scale value. The row alignment did not survive reproduction; the legible entries are listed below in the order in which they appear.]

Category numbers (partly legible): 161, 220, 172, 178, 31, 57, 166, 153, 5, 120, 286.

Categories: Highest pay ever received; Job at time of test; Organizations belong to; Siblings; Number of previous jobs; Minn. Clerical Test, Numbers; Soc. Int. Test, Recog. Mental State; Soc. Int. Test, Humor; Washburne Social Adjust., Wishes, first 3; Number of books or other important last month, 0 - 1; Minn. Clerical, Names score; Personnel Test, %ile; Education 8th or below; Bennett Mech. Comprehension, Foremen %ile, 67 - up; Kuder Pref. Record, Clerical; Personnel Test; Weight; Mich. Vocabulary Prof., Human Rel.; Mich. Vocabulary Prof., Commercial; Bernreuter, Extrovert; Bernreuter, Emotional Stability, 0 - 34; Social Intellig. Test, Judgement, 0 - 20; How Supervise?, Shop Practice Sc., 0 - 12; How Supervise?, Company Policy, 0 - 12; Social Intelligence Test, Total; Psychologist's evaluation (rated by author), low, 1; Minn. Clerical, Names; Psych. eval. (rated on 7 point scale), average, 4; Soc. Intel. Test, College %ile; Personnel Test; Highest pay ever received; Siblings, Two; Social Intelligence Test, Humor; Washburne S-A Inv., one of first 3 wishes Social; Job at time of test; Washburne S-A Inv., Score "t"; Number of children; Scale of Values, Social score.

Interval entries legible but not paired with a category: $151 per week - up; Supervision; None; 0 - 1; 11 - 16; 86 - up; 0 - 110; 0 - 34; 0 - 40; below 115; 212 - up; 0 - 15; 186 - 200; 0 - 14; 0 - 12; 0 - 34; 0 - 90; 5 - up; 0 - 34; 0 - 12; 0 - 99 per wk; 0 - 10; Non-supervisory; 6 - 10; 1; 31 - 35.

Scale values, in descending order (38 legible): 4060.0, 2888.0, 2380.0, 2200.0, 2151.2, 1982.0, 1980.0, 1957.5, 1956.7, 1932.7, 1916.6, 1841.0, 1833.0, 1811.7, 1802.7, 1799.6, 1791.6, 1780.0, 1764.6, 1764.3, 1764.0, 1761.1, 1759.0, 1753.7, 1751.8, 1742.5, 1738.9, 1735.0, 1723.3, 1700.0, 1682.6, 1665.7, 1660.9, 1627.5, 1622.2, 1618.6, 1600.0, 1562.8.

* Catescale is divided into upper, middle, and lower thirds.

[FIGURE 2. Segmental Catescale. Upper panel: original sample (N = 32); lower panel: cross-validation sample (N = 10). Rows are individuals, identified by name code number and ordered by criterion value R from highest to lowest; columns are the 39 catescale categories with their category numbers and scale values; x's mark the categories with which each individual is associated; the right-hand column gives the predicted scores S.]

The scale value for each category was calculated by means of the du Mas (4) formula:

$$V = \frac{\sum R}{N}$$

where:
$V$: category scale value
$\sum R$: sum of the criterion scores (income) of all individuals associated with a particular category
$N$: number of R values associated with the category.

The selected categories were then ordered with regard to the magnitude of their scale values, in accordance with the du Mas (4) instructions. (See Fig. 2, p. 17). These scale values constituted a catescale. (See Table I).

A score was then calculated for each individual, by means of the du Mas (4) formula:

$$S = \frac{\sum V}{n}$$

where:
$S$: score from the catescale
$\sum V$: sum of catescale values of all categories with which an individual is associated
$n$: number of categories with which an individual is associated.

This operation attempted to predict the criterion score from the catescale extracted from the data. The product moment correlation was calculated between the criterion distribution R and the predicted score distribution S.

One supervisor did not appear in any category, and was therefore not scored. The proportion D of the sample for whom a score is determinate was therefore calculated by the du Mas formula:

$$D = \frac{N_S}{N_R}$$

where $N_R$ is the number of individuals in the sample that have an R value, and $N_S$ is the number of individuals in the sample for which a score is determinate.
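The three quantities defined above (the category scale value V as the mean criterion score of the individuals associated with a category, the individual score S as the mean scale value of the categories an individual falls in, and the proportion D of individuals for whom a score is determinate) can be sketched in Python as follows. The data layout, the function names, and the toy figures are assumptions made for illustration; they are not the thesis data, and the sketch is not du Mas's own procedure beyond these three formulas.

```python
def scale_values(criterion, membership):
    """V = (sum of R) / N for each category.

    criterion:  dict mapping subject id -> criterion value R
    membership: dict mapping category id -> set of associated subject ids
    """
    values = {}
    for category, subjects in membership.items():
        r_values = [criterion[s] for s in subjects]
        if r_values:  # a category with no associated subjects gets no value
            values[category] = sum(r_values) / len(r_values)
    return values


def predicted_scores(values, membership, subjects):
    """S = (sum of V) / n for each subject; None if the subject is in no category."""
    scores = {}
    for s in subjects:
        vs = [values[c] for c, members in membership.items()
              if s in members and c in values]
        scores[s] = sum(vs) / len(vs) if vs else None
    return scores


def determinate_proportion(scores):
    """D = N_S / N_R: share of subjects with an R value who received a score."""
    n_r = len(scores)
    n_s = sum(1 for v in scores.values() if v is not None)
    return n_s / n_r


# Toy, hypothetical data: four subjects and three categories.
criterion = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 1.5}
membership = {"high": {"a", "b"}, "mid": {"b", "c"}, "low": {"c"}}

values = scale_values(criterion, membership)
scores = predicted_scores(values, membership, criterion.keys())
d = determinate_proportion(scores)
```

In the toy data, subject "d" is associated with no category and so receives no score, which is the situation the proportion D is meant to report.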
Cross-validation was done by applying the scale values, or weights, of the catescale constructed from the first sample to the cross-validation sample of 10 subjects. A product moment correlation was calculated between the distribution of criterion values in this sample and the predicted score distribution.

RESULTS AND DISCUSSION

A Catescale (categories possessing scale values) consisting of 39 categories was operationally extracted from a total of 293 categories of biographical and standard test data. (See Table I, p. 16).

The Catescale values were used to predict the original criterion scale values of the original sample of 32 supervisors. The validity coefficient was r = .95.

The same Catescale values were used to predict the criterion scale values of a cross-validation group of 10 supervisors. The correlation between the predicted criterion values and the original criterion values of the cross-validation group resulted in a coefficient of r = .80.

The general hypothesis, that there is a set of categories from biographical and standard test data which forms an empirical analogue of the Segmental Model (Fig. 1, p. 9a), was clearly supported.

Reference to Figure 2, page 17, will reveal the Criterion, R, values in the third column from the left. Also, in the extreme right-hand column, the Predicted Scores, S, will be seen. It will also be noted that each individual is represented by a name code number in the column next to his criterion value column, and that the N of the original sample is 32. The Criterion, R, column is the original criterion scale, ordered from highest to lowest, of the original sample. The Predicted Scores, S, column contains the predicted criterion scores, or values, calculated from the Catescale values in the row along the top of Figure 2. A predicted score is the mean of the catescale values represented by x's in the individual's row.
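The cross-validation step can be sketched in Python: scale values obtained from the original sample are applied unchanged to a held-out sample, and the product moment correlation is computed between the held-out criterion values R and the predicted scores S. The function names, the data layout, and the toy figures are hypothetical, and the toy result says nothing about the coefficients reported in the thesis.

```python
import math


def pearson_r(xs, ys):
    """Product moment correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cross_validate(values, holdout_categories, holdout_criterion):
    """Score held-out subjects with existing scale values, then correlate R with S.

    values:             category id -> scale value V from the original sample
    holdout_categories: subject id -> set of category ids the subject falls in
    holdout_criterion:  subject id -> criterion value R
    """
    r_list, s_list = [], []
    for subject, categories in holdout_categories.items():
        vs = [values[c] for c in categories if c in values]
        if not vs:  # no determinate score for this subject
            continue
        s_list.append(sum(vs) / len(vs))
        r_list.append(holdout_criterion[subject])
    return pearson_r(r_list, s_list)


# Toy, hypothetical data: two scale values and three held-out subjects.
values = {"high": 2.5, "low": 1.0}
holdout_categories = {"x": {"high"}, "y": {"high", "low"}, "z": {"low"}}
holdout_criterion = {"x": 3.0, "y": 2.0, "z": 1.0}

r = cross_validate(values, holdout_categories, holdout_criterion)
```

The point of the design is that nothing is re-estimated on the held-out sample: the weights are fixed before the second group is scored, which is what distinguishes the cross-validation coefficient from the validity coefficient of the original sample.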
Considering the original sample, in the upper part of Figure 2, the validity coefficient has been expressed as the correlation between the distribution of the Criterion, R, values and the distribution of predicted criterion, S, values. As stated above, the validity coefficient was: r = .95.

Considering the cross-validation sample, in the lower part of Figure 2, the same column headings will be seen. Also, it will be noted that the Catescale values are identical with those in the original sample above. The N of this group is 10. The name code numbers reveal that these supervisors are not included in the original sample. The Catescale found for the first sample was applied to the cross-validation sample, and the predicted criterion values, S, were calculated in the same way, by taking the mean of the catescale values represented by x's in the rows of the individuals. Correlation between the criterion, R, distribution of the cross-validation group and its distribution of predicted criterion scores, S, yielded: r = .80.

The Catescale of 39 categories, with category numbers and scale values, is presented separately in Table I, page 16. The category numbers and scale values may also be identified in the second and third rows of Figure 2 (p. 17). In Table I (p. 16), the content of the categories of the Catescale is shown. It may be recalled that the categories were made, and numbered individually, by dividing numerical data (e.g., scores on a test) into three intervals and biographical data into appropriate categories (e.g., married, single). The 39 categories of Table I, with their scale values, represent the Catescale operationally extracted from 293 categories as described under Method of Analysis, beginning on page 14 of this thesis.

It may be noted in Table I, page 16, that the Catescale is divided into thirds. Apparently, it should be possible to read a general description of the upper third of the discriminating supervisory qualities of our catescale.
However, not all of the supervisors in the upper third of our sample are uniformly associated with the upper third of the categories. How these categories interact, what effect they may have on each other when associated together, or what combinations produce what results, requires a type of analysis or speculation beyond the scope of this study.

The categories themselves are of interest. Some were quite surprising and unexpected. Some might have been expected in the light of conventional theory. However, these categories were operationally and objectively extracted from the chaotic domain of all the data in conformance with the principles of manifest structure analysis, as presented by du Mas (4). The fact remains that these categories, or combinations of categories, appear to discriminate the various levels of the criterion scale with high validation and cross-validation.

Several modifications of procedure suggest themselves which might yield a special type of catescale for a specific purpose. For example, an intensive study might be made of a specific area, or of one test, or of one kind of test, for which du Mas (4) has described an intensive model.

In this study, items were selected for highest possible discrimination. Item selection could be somewhat more liberal in specific areas, or in all areas, so as to include a more complete description from the catescale, even though this would increase the variance and therefore decrease the validity somewhat. Dichotomizing all data, instead of dividing them into three intervals, is another possibility in dealing with categories.

It would be possible to make several interesting a posteriori interpretations of these categories. This suggests itself as a potentially rich source of new ideas and psychological insights. This, however, was not the purpose of this study.
The speed with which new catescales can be selected, and weights calculated, seems to offer possibilities for successive approximations to a best ordered structure, as du Mas (4) suggests. It would also appear quite possible to substitute different criteria in the same set of data.

Since this was the first empirical study of supervisors with Manifest Structure Analysis, the apparent utility of this method has by no means been fully explored. This is especially true in view of the scarcity of published evaluative results, and the even greater scarcity of positive findings, in the field of supervision, as pointed out in the introduction of this paper. Further research and greater experience with this method should reveal the most fruitful areas of application and the specific uses in which results would be most definitive.

SUMMARY AND CONCLUSIONS

The problem of this study was to select a set of ordered and weighted categories or items from biographical forms and standard test information, by means of which a subject's potential value to a company might be predicted.

A group of 32 supervisors or potential supervisors participated in a six hour testing program which included a personal information form and a battery of standard tests. A criterion scale of coded hourly income was obtained and defined as a coded, dollars-per-hour, manifest scale of value of the supervisor's performance in this company.

A Catescale (literally, scale-weighted categories) was extracted from the biographical and test data, arranged on a criterion continuum of coded hourly income. (See Fig. 2, p. 17). The validity of the Catescale in predicting the original criterion values was: r = .95 (See Table I, p. 16). A cross-validation group of 10 supervisors, selected only for range along the criterion continuum, was scored with the catescale values found on the original sample. Cross-validation correlated: r = .80 (See Fig. 2, p. 17).
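The thesis does not state how these coefficients were tested for significance. As a rough check only, and assuming the usual t transformation of a Pearson r with N - 2 degrees of freedom (an assumption, not the author's stated procedure), the two reported coefficients give:

$$t = \frac{r\sqrt{N-2}}{\sqrt{1-r^{2}}}$$

$$r = .95,\ N = 32:\quad t = \frac{.95\sqrt{30}}{\sqrt{1-.9025}} \approx \frac{5.20}{.312} \approx 16.7$$

$$r = .80,\ N = 10:\quad t = \frac{.80\sqrt{8}}{\sqrt{1-.64}} \approx \frac{2.26}{.60} \approx 3.77$$

The two-tailed one percent critical values of t are about 2.75 for 30 degrees of freedom and about 3.36 for 8 degrees of freedom, so under this assumption both coefficients exceed the one percent threshold. If the one unscored supervisor is excluded from the first correlation (N = 31), the first value changes only slightly, to about 16.4.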
Both validity and cross-validation coefficients were well beyond the one percent level of confidence.

The general hypothesis was: There is a set of categories from biographical and standard test data which forms an empirical analogue of the Segmental Model. The hypothesis was clearly supported.

The Segmental Model (Fig. 1, p. 9a) is one of several models presented by du Mas (4) for scale construction in Manifest Structure Analysis. It is a mathematical model representing a perfect correlation. It is represented as a scatter diagram with a criterion scale at the ordinate, the left side, and categories along the abscissa, at the top. The criterion scale is ordered with the highest value at the top. The model would represent a Pearson r of 1.

The important difference between conventional item analysis and manifest structure analysis is that categories (columns) are selected to meet the assumptions of Pearson r computation. This is done by inspecting each column (category with individuals plotted) separately, ordering the columns in terms of mean value, and computing Pearson r between the original criterion distribution and a predicted distribution. (See Fig. 2, p. 17). Category values, or catescale weights, are the means of the criterion scale values represented in the columns. Predicted criterion values are the means of the rows in terms of the column values. Data are represented in the scatter by x's. Justification for the procedure is cross-validation.

Practical conclusions: A new technique has been developed which in one case has extracted a catescale of 39 categories from 293 categories of biographical and test data, with extremely high validation and cross-validation. Practical advantages of this technique are: simplicity, speed, and low cost. Highly trained personnel would not be required in a situation where scoring tables and graphs could be set up.
Flexibility and extremely wide applicability, "wherever rating scales or tests are applicable," were indicated, in agreement with du Mas (3, p. 117). Analysis of case histories, interview forms, and other multivariate data-gathering instruments of psychosocial phenomena appears to be uniquely within the scope of this technique.

BIBLIOGRAPHY

1. Barnett, G. J., Handelsman, I., Stewart, L. H., and Super, D. E. The occupational level scale as a measure of drive. Psych. Monograph, 1952, 342.
2. Cleeton, G. U. and Mason, C. W. Executive ability. The Antioch Press, Yellow Springs, Ohio, 1946.
3. du Mas, F. M. Continua, catemensions, catescales. J. Clin. Psychol., 1951, 7, 112-117.
4. du Mas, F. M. Manifest structure analysis. Unpublished treatise.
5. File, Q. W. and Remmers, H. H. Studies in supervisory evaluation.
6. Gibb, C. A. The principles and traits of leadership. J. Abn. and Soc. Psychol., 1947, 272.
7. Harrell, W. Testing cotton mill supervisors. J. Appl. Psychol., 1940, 24, 31-35.
8. Jenkins, J. G. Validity for what? J. Consult. Psychol., 1946, 10, 93-98.
9. Lawshe, C. H., Jr. Principles of personnel testing. First Edition. McGraw Hill, Inc., N.Y., 1948.
10. Rush, C. H., Jr. A factorial study of sales criteria. Pers. Psychol., 1953, 6.
11. Strong, E. K., Jr. Interests of senior and junior public administrators. J. Appl. Psychol., 1946, 30, 55-71.
12. Strong, E. K., Jr. Vocational interests of men and women. Stanford Univ. Press, Palo Alto, 1943.
13. Strong, E. K., Jr. Vocational guidance of executives. J. Appl. Psychol., 1927, 11, 331-347.
14. Super, D. E. Appraising vocational fitness by means of psychological tests. Harper Bros., N.Y., 1949.
15. Taylor, E. K., Schneider, D. E., and Symonds, N. A. A short forced-choice evaluation form for salesmen. Pers. Psychol., 1953, 6.
16. Thompson, C. E. Selecting executives by psychological tests. Educ. Psychol. Measmt., 1947, 7, 773-778.
17. Wadsworth, G. W. Tests prove their worth in a utility. Pers. J., 1935, 14, 183-187.
18. Wonderlic, E. F., and Hovland, C. J. The personnel test; a restandardized abridgement of the Otis S-A test for business and industrial use. J. Appl. Psychol., 1939, 23, 685-702.