
MANIFEST STRUCTURE ANALYSIS
OF SUPERVISORY TESTING

Thesis for the Degree of M. A.
MICHIGAN STATE COLLEGE
Clayton H. Rashleigh

1954


This is to certify that the

thesis entitled

MANIFEST STRUCTURE ANALYSIS OF SUPERVISORY
TESTING
presented by

Clayton H. Rashleigh

has been accepted towards fulfillment
of the requirements for

M.A. degree in Psychology

Major professor

 

Date May 28, 1954

 


 

 

MANIFEST STRUCTURE ANALYSIS OF
SUPERVISORY TESTING

Clayton H. Rashleigh

A Thesis

Submitted to the School of Graduate Studies of Michigan
State College of Agriculture and Applied Science
in partial fulfillment of the requirements
for the degree of

MASTER OF ARTS
Department of Psychology

Year 1954

ACKNOWLEDGMENTS

The author wishes to express his sincere thanks to Dr. Frank M.
du Mas for generously making available his knowledge, energy, and
pioneering techniques of analysis.

He is also greatly indebted to Dr. Carl F. Frost for much concrete
help, unfailing interest, and encouragement.

Grateful acknowledgement is also due to Dr. G. M. Gilbert for
patience and cooperation in many ways.


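TABLE OF CONTENTS

                                                          Page
INTRODUCTION
THE PROBLEM . . . . . . . . . . . . . . . . . . . . . . .  11
PROCEDURE . . . . . . . . . . . . . . . . . . . . . . . .  12
     Subjects . . . . . . . . . . . . . . . . . . . . . .  12
     Criterion  . . . . . . . . . . . . . . . . . . . . .  12
     Apparatus  . . . . . . . . . . . . . . . . . . . . .  13
     Basic Data . . . . . . . . . . . . . . . . . . . . .  14
     Method of Analysis . . . . . . . . . . . . . . . . .  14
RESULTS AND DISCUSSION  . . . . . . . . . . . . . . . . .  20
SUMMARY AND CONCLUSIONS
BIBLIOGRAPHY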

LIST OF FIGURES AND TABLES

                                                          Page
FIGURE 1. Association surface for the Segmental Model . .  9a
FIGURE 2. Segmental Catescale . . . . . . . . . . . . . .  17
TABLE I. The Catescale  . . . . . . . . . . . . . . . . .  16

INTRODUCTION

The purpose of the present study is to investigate a new technique
of item validation and scale construction recently formulated by du Mas
(4). The need for valid tests in industry and business, as well as in
the armed services, has been emphasized by Super (14) and Lawshe (9).

Testing of personnel for selection, placement, or promotion in
industry has become an increasingly important development in psychology
in recent years. Super (14) calls testing big business, stating that one
million Americans took sixty million tests in one year. Lawshe (9)
comments that the recent war years clearly demonstrated the effectiveness
of personnel tests both in industry and in the military services.

Lawshe (9) sees a need for tailor-made tests in all areas, and
remarks on a growing tendency toward selective scoring of commercially
available standard tests for a specific situation. The important
question, he says, is whether or not the test helps to identify the
persons who are apt to be most successful on this particular job. In
this connection, he states: "Whether it is a matter of test construction
or the selection of significant items in commercially available tests,
the problem of item validation is one and the same" (9, p. 17).

The usefulness of valid testing instruments has been illustrated by
many studies. Wadsworth (17) has shown that test-selected employees
proved satisfactory more often than non-test-selected employees and
produced a smaller percentage of problem employees.

Strong (12) showed that 56% of insurance salesmen scoring A on an
interest test had individual sales totals of $150,000, while only 6% of
salesmen scoring C on the test achieved that figure.

In the Army Aviation Testing Program, four percent of cadets with
stanine score 9 were eliminated from primary flying school, while
seventy-seven percent of the cadets with stanine score 1 were eliminated.

File and Remmers (5) found that of forty-six men selected as
supervisors in a company, 80% scored above average on the How Supervise?
test, while of fifteen men by-passed because they were judged lacking in
ability, only 15% scored above average on that test.

Wonderlic (18) found that among representatives of a personal-finance
company, 86% of those employed a year or more made above a critical
score on the Personnel Test, while only 35% of those who were dismissed
or left the company made above that score.

However, many tests do not achieve adequate validity. Many tests also
fail to stand up under cross-validation; that is, the results achieved
with the first sample are not verified in a comparable but independent
sample using the same criterion. Super (14) holds that external evidence
of validity, that is, verification against a criterion, is the only
adequate basis for judging a test.

Validity immediately raises the criterion problem. A question that
Jenkins (8) asks is: "Validity for what?" The answer must be given in
terms of a good criterion. Clear and simple definitions of a good
criterion are not plentiful. A criterion, in this context, might be
described as a measurable or quantifiable standard of behavior in a
given situation, or a measure of worker performance. A criterion may
be simple, such as the number of pieces per hour produced by men on a
certain type of machine, or it may be composite and multivariate. For
example, a criterion might consist of a weighted combination of output,
quality of work, ratings, and possibly other factors. Thus, criteria may
be objective, as are records of production, or subjective, as are
ratings of adaptability. Summarizing the requirements expressed in the
literature, a good criterion should be reliable, relevant, related to
other criteria, suited to the job analysis, available, acceptable to
management, modifiable in terms of changes in the situation, and
quantifiable.

Rush (10), in an elaborate factor analysis, concluded that the
criterion of sales success is multidimensional rather than unitary;
hence the use of global measures of success or failure would seem
undesirable, since it might obscure underlying relationships in
validation studies. He also concluded that the development of effective
selection devices may be facilitated by a knowledge of the component
elements of job criteria.

The findings of Taylor, Schneider, and Symonds (15) seem to disagree
with the above conclusions. Their factor analysis of 13 graphic rating
scales of salesmen yielded only one clear factor. They concluded that
basic salary constituted management's considered judgement of the value
of the man to the organization, expressed on a dollar continuum. Using
basic salary as their criterion, they found a cross-validation
coefficient of .4? for their form of forced-choice tetrads and rating
scales. The validity generalized to another group of salesmen in a
different division of the company, the correlation being .52.

Super (14, p. 48) uses the terms standardization and validation
interchangeably, "because the standardization of a vocational test
implies collecting data which make possible validation." It seems
apparent that most of the steps dealing with the selection of tests and
with test construction have as their goal a test which is valid for its
specific use.

Super (14) says that the minimum correlation coefficient, or validity
coefficient, for psychological tests has generally been set at .45 for
individual tests, but that lower validity coefficients may be combined
usefully in test batteries. Validity coefficients are not likely to
exceed .70, according to Super, because of the unreliability of
criteria. As an example, he cites the unreliability of supervisors'
ratings, that is, the lack of agreement between raters. His argument
appears logically sound. But one might ask: what if the criterion were
more reliable?

In a highly competitive industrial situation, where incompetence
cannot be tolerated, might not the supervisor's salary represent the
carefully considered judgement of his value to the company, or even a
relatively accurate estimate of his demonstrated value? Would it not
seem logical to consider supervisors a selected group who reach and
maintain their position through special effort for special reward? Some
evidence, as suggested by Barnett, Handelsman, Stewart, and Super (1),
supports the assumption that motivation is generally higher at higher
socio-economic levels.

Testing of supervisors presents special problems. Lawshe (9) comments
that supervisory jobs vary tremendously. Gibb (6) states that there is
no one leadership type of personality. Cleeton and Mason (2) agree that
there is no general executive type. Super (14) points out that although
a great deal of time and money is being spent on the application of
psychological methods to the selection of executive personnel, little
has been published on it in the psychological journals. He lists five
current types of work in executive selection and evaluation: the
development of custom-built batteries of tests, such as the
Cleeton-Mason Vocational Aptitude Examination; the validation of
standard tests for this particular purpose, as in the University of
Minnesota's College of Business Administration project; the development
of single tests for executive interests or other traits, best
illustrated by Strong's (11, 13) work with executives and public
administrators; the clinical use of interviews and tests, as commonly
done by consulting psychologists; and the use of clinically evaluated
situation tests, as developed by the British War Office Selection Boards
and carried further by the U. S. Office of Strategic Services.

In the field of executive selection, Thompson (16) found positive
results with a battery of standard tests administered to 15 superior and
10 average executives of a firm of consulting management engineers. The
tests included the Wonderlic Personnel Test, Michigan Vocabulary Profile
Test, Cardall Test of Practical Judgement, Kuder Preference Record,
Adams-Lepley Personal Audit, Beckman Revision of the Allport A-S Reaction
Study, Guilford-Martin Personnel Inventory, and Rood I-E Test. The
criterion consisted of performance records (not described) and ratings
by partners (reliability not stated). Differences significant at the 5%
level or beyond were found with the Wonderlic, Michigan Vocabulary
Profile, Kuder, and Adams-Lepley. All of the reported differences
favored the superior executives, except that on the Kuder Social Service
Scale. The results describe the successful management engineer executive
as superior to less successful partners in mental ability, interests,
firmness, and stability, and inferior in interest in social service. No
cross-validation study was reported; these results must therefore be
considered highly tentative, especially with such a small sample.

Harrell (7) reported on 42 overseers in three different cotton mills,
rated satisfactory or unsatisfactory by their superiors. With a critical
I.Q. of 100 on the Otis Self-Administering Test of Mental Ability, only
70% of the unsatisfactory overseers, but 100% of the satisfactory ones,
achieved this I.Q. In view of the discussion of criteria above, this
study appears open to criticism.

Lawshe (9) comments that there is little evidence of successful validity
studies in the executive brackets. He attributes this to the difficulty
of setting up criterion groups at this level, and partially also to the
failure to develop adequate measuring instruments. Cleeton and Mason (2)
point out that, since successful executives generally score relatively
high on a wide variety of ability tests, they would seem to be
well-rounded personalities. Lawshe (9) suggests mental ability tests,
temperament tests, interest tests, and, specifically, the Michigan
Vocabulary Profile Test as most promising in this area.

Two important problems in testing generally seem to be: (1) choice of a
test, or construction of a new one, and (2) validation of the test in a
specific situation. For the development of a new vocational test, Super
(14) suggests seven major steps: job analysis, selection of traits to
test, selection of criteria of success, item construction,
standardization, validation, and cross-validation. He points out that
one or more of these steps may be slighted or omitted in special
circumstances.

With reference to test construction, Super (14) stresses the
importance of selecting a criterion early. He indicates that the
criterion should be considered as soon as the characteristic to be
tested has been isolated and selected on the basis of job analysis. This
should also indicate the choice of the type of test to be constructed.
Then follow the problems of constructing apparatus and drawing or
writing items, the first trial of the tentative form, further revision,
collection of data on a larger group of subjects, analysis of the
internal consistency, analysis of the scoring key, and another revision
of the test.

These problems need not be gone into in detail here. However, it
should be apparent that conventional test construction is a very complex
and time-consuming task. All the problems mentioned above occur before
data are collected for standardization and the establishing of norms.
The test must then be validated in a specific situation, and
cross-validated on another group not included in this first validation
group, using the same external criterion in both groups. This points up
the complexity of test construction with conventional methods, and
specifically the crucial function of item analysis, or validation of
the items which make up the test.

In conventional methods, according to du Mas (4), test items are
selected on the basis of values generated from theory or inferred from
properties of the stimulus items. The finished instrument often does
not have sufficient validity for effective use, resulting in a great
loss of time and effort. Research which is so expensive and
time-consuming, and which may turn out to be a total loss, is hard to
justify to management.

There is a need for a procedure that will consider the situation as a
whole, as a total dynamic field that includes the personality. Such a
test should consider the evaluation of biographical data, personality
factors, and performance on tests of achievement and skills. The special
need for this type of evaluation in industry exists in the selection of
supervisors, as has been pointed out above. Until now, a scale composed
of items of such a heterogeneous nature has received little attention;
this study addresses itself precisely to this point.

Many techniques have been suggested for item analysis in test or scale
construction, according to du Mas (4). In most of these, each item is
related individually to a variable (often the total score for all items
in the scale); then those items having high correlations with the
variable and low correlations with each other are selected, and a weight
is often assigned to each item on the basis of the item-variable
correlation. The correlational methods most often used are biserial,
point biserial, tetrachoric, and phi.
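
To make the conventional procedure concrete, the following Python sketch (an illustration of the item-analysis approach just described, not of du Mas's method) correlates each item scored 0/1 with the total score by the point-biserial coefficient and retains items that reach a cutoff. The function names, the 0.30 cutoff, and the response matrix are hypothetical, and the further step of requiring low inter-item correlations is omitted for brevity.

```python
from statistics import mean, pstdev


def point_biserial(item, total):
    """Point-biserial r between a 0/1 item and a continuous total score.

    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean totals
    of those passing and failing the item, s is the population standard
    deviation of the totals, and p, q are the proportions passing and failing.
    """
    passed = [t for i, t in zip(item, total) if i == 1]
    failed = [t for i, t in zip(item, total) if i == 0]
    s = pstdev(total)
    if not passed or not failed or s == 0:
        return 0.0  # item (or total) has no variance; r is undefined
    p = len(passed) / len(item)
    return (mean(passed) - mean(failed)) / s * (p * (1 - p)) ** 0.5


def select_items(responses, cutoff=0.30):
    """Indices of items whose item-total correlation reaches the cutoff."""
    totals = [sum(row) for row in responses]
    keep = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        if point_biserial(item, totals) >= cutoff:
            keep.append(j)
    return keep


# Hypothetical responses: 5 subjects (rows) by 3 items (columns), 1 = pass.
RESPONSES = [[1, 1, 0], [1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
print(select_items(RESPONSES))  # [0, 1]
```

Because each total here includes the item being analyzed, the item-total correlations are somewhat inflated; a corrected item-total correlation would leave the item out of the total.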

Du Mas (4) holds, further, that these techniques are open to certain
criticisms. The data seldom fit the assumptions upon which the various
methods of item analysis are based. The methods most widely used in item
analysis depend markedly upon agreement in difficulty of the items. The
criteria of conventional item analysis for the retention of an item
often make it necessary to discard perfect or near-perfect scale items.
Tests constructed by these methods practically always seriously violate
scale concepts and/or criteria. He therefore concludes that tests
constructed from item analyses should not in general be regarded as
scales, but rather as primitive, useful, and probably necessary
antecedents to more adequate instruments constructed by more rational
methods.

Manifest Structure Analysis is a new method of scaling introduced and
described in detail by du Mas (4) for the purpose of extracting an
ordered set of categories from a domain. The set of categories can then
be utilized as a measuring instrument and not merely as a set of
predictors. He defines Manifest Structure Analysis as the analysis of an
ordered structure which is operationally extracted from an apparently
chaotic domain by reference to the manifest relations existing between a
set of categories and a continuum of magnitudes, or criterion scale. It
differs from all other methods of scaling in that the values of the
continuum are manifest and are not generated as an inference from the
stimulus items. Because of this, objects, items, or events which exhibit
no phenomenal similarity, relationship, or order may be scaled.

Practically, Manifest Structure Analysis utilizes an ordered criterion
(e.g., income levels) as the ordinate and categories (e.g., test scores)
along the abscissa of a coordinate system. Plotting test scores in this
way automatically scales them to the criterion scale, yielding an ordered
set of categories which indicate, or reflect, the magnitude, intensity,
or degree of the criterion at any given point on the criterion scale.
Scores which do not manifestly discriminate are dropped. The underlying
notion is that categories may be differentially associated with some
manifest variable in such a way as to form an ordered structure (see
Fig. 1, p. 9a). Thus, an ordered criterion of income may yield an
ordered structure of scaled categories or items empirically obtained by
Manifest Structure Analysis.

SEGMENTAL MODEL
[Figure: association surface; ordinate: criterion scale; abscissa: categories 1, 2, 3, 4, 5, ..., i, ..., m]
FIGURE 1
Association surface for the Segmental Model
(After du Mas, 4, p. 87)

For the present study, data were made available by a client of an
industrial psychological consultant of Michigan State College. The
client, a manufacturer of a highly specialized product, had carried out
a testing program for supervisors as part of a broader personnel
evaluation program under the direction of the consultant. The testing
consisted of a battery of standard tests which had been found
satisfactory in previous testing, and a personal information form. The
testing was done in the offices of a professional psychologist, who
evaluated the results. The tests were administered and scored by a
competent psychometrician. The testing session required about six hours
for each subject.


THE PROBLEM

The problem of this study was to select a set of ordered and weighted
items or categories from biographical data forms and standard test
information by means of which we could predict a subject's potential
value to the company. The general hypothesis we wished to test was:
there is a set of categories from biographical and test data which forms
an empirical analogue of the Segmental Model (see Fig. 1, p. 9a).


PROCEDURE

Subjects

Fifty-one supervisors or potential supervisors, all employees of the
same manufacturing company, participated in a six-hour testing program.
Because criterion data were available on only forty-two of the subjects
(the others had transferred or left the company), it was necessary to
discard the data of nine subjects. Of the remaining forty-two, ten
subjects were held out to serve as a cross-validation group. They were
selected on the basis of range along the criterion scale only; that is,
the second from the top, the second from the bottom, and eight other
subjects were selected so as to be representative of the major
criterion intervals. The number of subjects in the original sample was
then 32; the number in the cross-validation group was 10.
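
The hold-out rule above fixes two members (second from the top and second from the bottom of the criterion ranking) but leaves the other eight to judgment. The Python sketch below assumes, as a hypothetical stand-in for that judgment, that the remaining eight are spread evenly over the interior ranks; the function name, record format, and criterion codes are illustrative only.

```python
def split_holdout(records, n_holdout=10):
    """Split (code, criterion) records into original and hold-out samples.

    Records are ranked from highest to lowest criterion. The hold-out takes
    the second-from-top and second-from-bottom ranks; the rest are spread
    evenly over the interior ranks (an assumption, not the thesis's rule).
    """
    ranked = sorted(records, key=lambda r: r[1], reverse=True)
    n = len(ranked)
    chosen = {1, n - 2}
    # Interior ranks still available (top and bottom stay in the original).
    candidates = [i for i in range(1, n - 1) if i not in chosen]
    extra = n_holdout - len(chosen)
    step = len(candidates) / extra
    for k in range(extra):
        chosen.add(candidates[int(k * step + step / 2)])
    holdout = [ranked[i] for i in sorted(chosen)]
    original = [ranked[i] for i in range(n) if i not in chosen]
    return original, holdout


# Hypothetical criterion codes for 42 subjects, all distinct.
RECORDS = [(f"S{i:02d}", 3000 - 10 * i) for i in range(42)]
original, holdout = split_holdout(RECORDS)
print(len(original), len(holdout))  # 32 10
```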

Criterion

The industrial psychological consultant mentioned above obtained from
the company the following information for possible use as criteria: a
numerical job classification, which gave a numerical value to the level
of supervisory responsibility; annual income (coded); hourly income
(coded); merit ratings for 1953 and 1954; income change from 1950 to
1954; and job tenure. All names of supervisors were also represented by
code numbers. Only the job classification and the coded hourly income
appeared to reflect the various levels of supervisory function.


The coded hourly income, which excluded bonus and overtime pay, was
selected as the best available criterion. It was operationally defined
as a coded, dollars-per-hour, manifest scale of the value of the
supervisor's performance in this company.

Apparatus

The du Mas Scaling Frame held 87 removable slats placed together so
as to present a flat surface slanting away from the vertical, at an
angle convenient for placing thumbtacks into the slats while standing in
front of it. Its outside dimensions were about four and one-half feet in
height and five feet in width. One hundred holes for thumbtacks were
drilled down the length of each slat at one-half inch intervals. Since
the slats were one-half inch wide and held firmly together, the holes
formed straight lines top-to-bottom, across, and diagonally. The effect
might be visualized as a rectangular surface made up of one-half inch
squares, or cells, with a hole for a thumbtack in the center of each
cell. There were 87 cells in each row across the frame, and 100 cells in
each column. Each slat, or column, could be removed and shifted to
another position where it would again fit this pattern of cells.
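
For readers who want to reproduce the procedure without the physical apparatus, the frame can be modeled as a grid of rows (individuals, ordered by criterion) and movable columns (slats). The class below is a hypothetical software analogue, not part of the original study; only the 87-slat and 100-row limits come from the description above.

```python
class ScalingFrame:
    """Hypothetical software analogue of the du Mas scaling frame."""

    MAX_SLATS = 87   # slats (category columns) the frame held at one time
    MAX_ROWS = 100   # thumbtack holes down each slat

    def __init__(self, individuals):
        # individuals: (code, criterion) pairs; row 0 is the highest value,
        # matching the criterion scale on the frame's left-hand margin.
        if len(individuals) > self.MAX_ROWS:
            raise ValueError("more individuals than rows on the frame")
        self.rows = sorted(individuals, key=lambda r: r[1], reverse=True)
        self.row_of = {code: r for r, (code, _) in enumerate(self.rows)}
        self.slats = []   # category numbers, left to right
        self.tacks = {}   # category number -> set of row indices

    def add_slat(self, category, members):
        """Insert a slat with a tack in each member's row."""
        if len(self.slats) >= self.MAX_SLATS:
            raise ValueError("frame is full; evaluate this batch first")
        self.slats.append(category)
        self.tacks[category] = {self.row_of[m] for m in members}

    def move_left(self, category):
        """Lift a slat out and reinsert it at the left edge."""
        self.slats.remove(category)
        self.slats.insert(0, category)

    def render(self):
        """One text line per row: code, criterion, then x or . per slat."""
        lines = []
        for r, (code, crit) in enumerate(self.rows):
            cells = "".join("x" if r in self.tacks[c] else "." for c in self.slats)
            lines.append(f"{code:>4} {crit:>6} {cells}")
        return "\n".join(lines)


frame = ScalingFrame([("A", 1.5), ("B", 2.0), ("C", 1.0)])
frame.add_slat(1, ["B"])
frame.add_slat(2, ["A", "C"])
frame.move_left(2)
print(frame.render())
```

Because the study's 293 categories exceed the frame's 87 slats, they have to be processed in batches; at 87 slats per load that is four loads (87, 87, 87, and 32).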

The criterion scale was attached to the left-hand margin of the
scaling frame so that each individual, represented by a name code number
and an hourly income code value, coincided with a horizontal row of
cells. The criterion values were ordered from highest to lowest, with
the highest at the top row of cells.


The slats, or columns, were numbered to represent categories, such
as test score intervals. Thus, if an individual scored within a given
interval, the datum was entered into the cell where the individual's row
and the category's column intersected.

Basic Data

The battery of standard tests consisted of Bernreuter's Personality
Inventory; the Social Intelligence Test, prepared by Moss, Hunt, and
Omwake; Wonderlic's Personnel Test; Bennett's Test of Mechanical
Comprehension, Form AA; the Minnesota Clerical Test, by Andrew, Paterson,
and Longstaff; How Supervise?, by File and Remmers; the Michigan
Vocabulary Profile Test, by Greene; the Kuder Preference Record; the
Washburne S-A Inventory (Thaspic edition); and the Study of Values, by
Allport, Vernon, and Lindzey.

The Guilford-Zimmerman Temperament Survey and the Thurstone
Temperament Schedule were used as alternate tests and therefore could
not be utilized statistically. Records were not complete on the Wide
Range Vocabulary Test, and therefore it could not be used.

The Personal History form was constructed by Harry G. Yudin,
professional psychologist, in whose offices the testing was
accomplished. The testing session was about six hours for each subject.

Method of Analysis
If the data were numerical values, they were divided into three
intervals, each representing roughly one-third of the subjects. Thus, if
the scores for all the subjects on one test ranged from 60 to 90 and
appeared to be fairly evenly distributed, the three categories for this
test became 0 to 70, 71 to 80, and 81 up. Each of these categories was
numbered. If a subject's score placed him in the 71 to 80 category, a
thumbtack was entered in the cell where his row and the category column
intersected in the scaling frame. Biographical data, if not numerical,
were divided into 'yes' and 'no' categories, on the assumption that a
'yes' response might be characteristic of the top and bottom criterion
ranges, and therefore not discriminate, while the 'no' response would,
in this instance, discriminate the middle criterion range. The
biographical categories were also given category numbers. Thus, Marital
Status might become the following numbered categories: Married, Widower,
Separated, Single, Divorced.
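
The interval rule can be sketched in Python as follows. The sketch assumes the cut points are taken at the sorted scores that close the lowest and middle thirds, which reproduces the 0 to 70, 71 to 80, 81 up example when scores are spread evenly from 60 to 90; the thesis may instead have set its cut points by inspection. The sample scores and the marital-status numbering are hypothetical.

```python
def third_cut_points(scores):
    """Upper bounds of the lower and middle intervals (roughly thirds)."""
    s = sorted(scores)
    n = len(s)
    if n < 3:
        raise ValueError("need at least three scores")
    return s[n // 3 - 1], s[2 * n // 3 - 1]


def interval_category(score, cuts):
    """0 = lower, 1 = middle, 2 = upper interval."""
    low_cut, mid_cut = cuts
    if score <= low_cut:
        return 0
    if score <= mid_cut:
        return 1
    return 2


# Non-numerical data: one numbered category (one slat) per response.
MARITAL_STATUS = ["Married", "Widower", "Separated", "Single", "Divorced"]
MARITAL_CATEGORY = {status: k for k, status in enumerate(MARITAL_STATUS, start=1)}

SCORES = [60, 63, 66, 70, 72, 75, 78, 80, 83, 86, 89, 90]
cuts = third_cut_points(SCORES)
print(cuts)                          # (70, 80)
print(interval_category(71, cuts))   # 1, the "71 to 80" category
```

With many tied scores the three intervals can come out unequal, which fits the text's "roughly one-third".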

There was a total of 293 categories, representing test scores, subtest
scores, percentiles, numerical biographical data (e.g., age), and
non-numerical biographical data (e.g., birthplace). Since the du Mas
scaling frame held only 87 slats at a time, it was necessary to use
extra slats, filling and evaluating 87 categories at a time. All the
data were entered into their appropriate cells.

Certain categories, when inspected individually, were seen to
discriminate some portion of the upper, lower, or middle range of the
criterion scale. These categories were moved to the left side of the
scaling frame by simply lifting the slat out of the frame and putting it
back into the frame on the left side, after sliding the other slats to
the right. Categories were rejected to the right side of the scaling
frame if they were multimodal, gappy, associated with a large part of the
range, or not sufficiently associated with individuals in the sample, in
accordance with the du Mas (4) specifications.

TABLE I
THE CATESCALE
[Table: 39 numbered categories, each listed with its category number, its definition (a biographical item or a test score interval), and its scale value. The categories include biographical items such as highest pay ever received, job at time of test, siblings, number of previous jobs, and number of children, and score intervals on the Minnesota Clerical Test, Social Intelligence Test, Wonderlic Personnel Test, Bennett Mechanical Comprehension, Kuder Preference Record, Michigan Vocabulary Profile, Bernreuter, How Supervise?, Washburne S-A Inventory, and Study of Values, together with the psychologist's evaluation. Scale values run from 4060.0 down to 1562.8.]
* Catescale is divided into upper, middle, and lower thirds.

FIGURE 2
Segmental Catescale
[Figure: upper panel, Original Sample; lower panel, Cross-Validation Sample. Rows are individuals (name code number and criterion value R), ordered from highest to lowest criterion; columns are catescale categories 1 to 39; an x marks an individual's association with a category; the right-hand column gives the predicted score S.]

Thirty-nine categories were thus selected (see Table I, p. 16). The
scale value for each category was calculated by means of the du Mas (4)
formula:

    $$V = \frac{\sum R}{N}$$

where:
    V = category scale value;
    ΣR = sum of the criterion scores (income) of all individuals
         associated with a particular category;
    N = number of R values associated with the category.

The selected categories were then ordered with regard to the magnitude
of their scale values, in accordance with the du Mas (4) instructions
(see Fig. 2, p. 17). These scale values constituted a catescale (see
Table I).
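
The scale-value formula and the ordering step amount to averaging the criterion over each category's members and sorting. A minimal Python sketch follows, assuming category membership is stored as sets of individual code numbers (a hypothetical data layout; the criterion figures are made up).

```python
def scale_values(criterion, membership):
    """V = (sum of R) / N for each category.

    criterion:  {code: R}, the criterion value of each individual
    membership: {category: set of codes associated with that category}
    """
    values = {}
    for category, members in membership.items():
        rs = [criterion[code] for code in members if code in criterion]
        if rs:  # a category with no R values has no scale value
            values[category] = sum(rs) / len(rs)
    return values


def catescale(criterion, membership):
    """Categories ordered from highest to lowest scale value."""
    values = scale_values(criterion, membership)
    return sorted(values.items(), key=lambda kv: kv[1], reverse=True)


CRITERION = {"A": 2.0, "B": 1.8, "C": 1.4, "D": 1.2}
MEMBERSHIP = {"hi": {"A", "B"}, "lo": {"C", "D"}, "mid": {"B", "C"}}
for category, v in catescale(CRITERION, MEMBERSHIP):
    print(category, round(v, 2))   # hi 1.9, then mid 1.6, then lo 1.3
```
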

A score was then calculated for each individual by means of the du Mas
(4) formula:

    $$S = \frac{\sum V}{n}$$

where:
    S = score from the catescale;
    ΣV = sum of the catescale values of all categories with which an
         individual is associated;
    n = number of categories with which an individual is associated.

This operation attempted to predict the criterion score from the
catescale extracted from the data.
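
The scoring rule is the row-wise counterpart of the scale-value rule: an individual's predicted score is the mean scale value of the categories he falls in, and an individual in no category receives no score. The sketch uses a hypothetical data layout (sets of code numbers per category) and made-up scale values.

```python
def catescale_score(code, membership, values):
    """S = (sum of V) / n over the categories containing `code`.

    Returns None when the individual is in no scaled category, i.e. when
    his score is indeterminate.
    """
    vs = [values[category] for category, members in membership.items()
          if code in members and category in values]
    return sum(vs) / len(vs) if vs else None


VALUES = {"hi": 1.9, "mid": 1.6, "lo": 1.3}
MEMBERSHIP = {"hi": {"A", "B"}, "lo": {"C", "D"}, "mid": {"B", "C"}}
print(catescale_score("B", MEMBERSHIP, VALUES))  # mean of 1.9 and 1.6
print(catescale_score("E", MEMBERSHIP, VALUES))  # None
```
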

The product-moment correlation was calculated between the criterion
distribution R and the predicted score distribution S.


One supervisor did not appear in any category and was therefore not
scored. The proportion D of the sample for whom a score is determinate
was therefore calculated by the du Mas formula:

    $$D = \frac{N_S}{N_R}$$

where N_R = the number of individuals in the sample that have an R value,
and N_S = the number of individuals in the sample for whom a score is
determinate.
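
The determinacy ratio is a simple count. In the sketch below (hypothetical names and codes), 32 individuals have criterion values and one of them falls in no category. This matches the situation described above if the unscored supervisor belonged to the original sample of 32, in which case D = 31/32, or about .97.

```python
def determinacy(criterion, membership):
    """D = N_S / N_R.

    N_R: individuals with a criterion value R.
    N_S: those individuals who fall in at least one category, and so
         receive a determinate score.
    """
    n_r = len(criterion)
    scored = {code for members in membership.values()
              for code in members if code in criterion}
    return len(scored) / n_r


CODES = [f"S{i:02d}" for i in range(32)]
CRITERION = {code: 1.0 for code in CODES}       # placeholder R values
MEMBERSHIP = {"any category": set(CODES[:31])}  # the last code is unscored
print(determinacy(CRITERION, MEMBERSHIP))  # 0.96875
```
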

Cross-validation was done by applying the scale values, or weights, of
the catescale constructed from the first sample to the cross-validation
sample of 10 subjects. A product-moment correlation was calculated
between the distribution of criterion values in this sample and the
predicted score distribution.
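
The cross-validation step keeps the scale values fixed from the original sample, scores the hold-out individuals with them, and correlates the predicted scores with the criterion values. A self-contained Python sketch of the product-moment correlation and of that step follows; the names and the three hold-out records are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def cross_validate(values, holdout_criterion, holdout_membership):
    """Correlate hold-out R with S computed from fixed scale values."""
    pairs = []
    for code, r in holdout_criterion.items():
        vs = [values[c] for c, members in holdout_membership.items()
              if code in members and c in values]
        if vs:  # individuals in no category are indeterminate; skip them
            pairs.append((r, sum(vs) / len(vs)))
    rs, ss = zip(*pairs)
    return pearson_r(rs, ss)


VALUES = {"hi": 1.9, "mid": 1.6, "lo": 1.3}  # fixed on the original sample
HOLDOUT_R = {"P": 2.1, "Q": 1.5, "R": 1.1}
HOLDOUT_MEMBERSHIP = {"hi": {"P"}, "mid": {"Q"}, "lo": {"R"}}
print(round(cross_validate(VALUES, HOLDOUT_R, HOLDOUT_MEMBERSHIP), 3))
```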


RESULTS AND DISCUSSION

A Catescale (categories possessing scale values), consisting of 39
categories, was operationally extracted from a total of 293 categories of
biographical and standard test data (see Table I, p. 16). The Catescale
values were used to predict the original criterion scale values of the
original sample of 32 supervisors. The validity coefficient was r = .95.

The same Catescale values were used to predict the criterion scale
values of a cross-validation group of 10 supervisors. The correlation
between the predicted criterion values and the original criterion values
of the cross-validation group was r = .80.

The general hypothesis, that there is a set of categories from biographical
and standard test data which forms an empirical analogue of the Segmental
Model (Fig. 1, p. 9a), was clearly supported.

Reference to Figure 2, page 17, will show the Criterion, R, values
in the third column from the left. In the extreme right-hand column,
the Predicted Scores, S, will be seen. It will also be noted that each
individual is represented by a name code number in the column next to his
criterion value column, and that the N of the original sample is 32.

The Criterion, R, column is the original criterion scale of the original
sample, ordered from highest to lowest. The Predicted Scores, S,
column contains the predicted criterion scores, or values, calculated
from the Catescale values in the row along the top of Figure 2. A predicted
score is the mean of the catescale values represented by x's in the
individual's row.


Considering the Original Sample, in the upper part of Figure 2, the validity
coefficient has been expressed as the correlation between the distribution
of the Criterion, R, values and the distribution of predicted criterion, S,
values. As stated above, the validity coefficient was r = .95.

Considering the cross-validation sample, in the lower part of Figure 2,
the same column headings will be seen. It will also be noted that the
Catescale values are identical with those in the original sample above. The
N of this group is 10. The name code numbers show that these supervisors
are not included in the original sample. The Catescale found for the first
sample was applied to the cross-validation sample, and the predicted
criterion values, S, were calculated in the same way, by taking the mean of the
catescale values represented by x's in each individual's row. Correlation
between the criterion, R, distribution of the cross-validation group
and its distribution of predicted criterion scores, S, yielded r = .80.

The Catescale of 39 categories, with category numbers and scale values,
is presented separately in Table I, page 16. The category numbers and scale
values may also be identified in the second and third rows of Figure 2 (p. 17).

In Table I (p. 16), the content of the categories of the Catescale is shown.
It may be recalled that the categories were made, and numbered individually,
by dividing numerical data (e.g., scores on a test) into three intervals
and biographical data into appropriate categories (e.g., married, single).
The 39 categories of Table I, with their scale values, represent the
Catescale operationally extracted from 293 categories as described under
Method of Analysis, beginning on page 14 of this thesis. It may be noted
in Table I, page 16, that the Catescale is divided into thirds.
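The thesis states that numerical data were divided into three intervals but does not give the cut rule here. The sketch below assumes equal-count thirds by rank (ties keep their input order) and turns each test score into a low, mid, or high category label; the `prefix` naming scheme and the scores are hypothetical.

```python
def trichotomize(scores, prefix):
    """Split numeric scores into low/mid/high categories by rank-based thirds.

    Assumption: equal-count thirds; tied scores keep their input order.
    """
    ranked = sorted(scores, key=scores.get)   # lowest score first
    n = len(ranked)
    labels = {}
    for i, person in enumerate(ranked):
        third = 3 * i // n                    # 0, 1 or 2
        labels[person] = f"{prefix}_{('low', 'mid', 'high')[third]}"
    return labels

# Hypothetical test scores for six individuals.
scores = {"A": 42, "B": 17, "C": 30, "D": 25, "E": 50, "F": 12}
labels = trichotomize(scores, "test1")
print(labels)
```

Changing the tuple to two labels and the multiplier to 2 would give the dichotomized alternative mentioned later in this discussion.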


Apparently, it should be possible to read from the upper third of our
catescale a general description of the discriminating supervisory qualities.
However, not all of the supervisors in the upper third of our sample are
uniformly associated with the upper third of the categories. How these
categories interact, what effect they may have on each other when associated
together, and what combinations produce what results are questions requiring
a type of analysis or speculation beyond the scope of this study.

The categories themselves are of interest. Some were quite surprising
and unexpected. Some might have been expected in the light of conventional
theory. However, these categories were operationally and objectively
extracted from the chaotic domain of all the data in conformance with the
principles of manifest structure analysis, as presented by du Mas (4).

The fact remains that these categories, or combinations of categories,
appear to discriminate among the various levels of the criterion scale
with high validity on both the original and the cross-validation samples.

Several modifications of procedure suggest themselves which might
yield a special type of catescale for a specific purpose. For example,
an intensive study might be made of a specific area, of one test, or
of one kind of test, for which du Mas (4) has described an intensive model.
In this study, items were selected for the highest possible discrimination.
Item selection could be somewhat more liberal in specific areas, or in all
areas, so as to include a more complete description in the catescale,
even though this would increase the variance and therefore decrease the
validity somewhat. Dichotomizing all data, instead of dividing them into
three intervals, is another possibility in dealing with categories.


It would be possible to make several interesting a posteriori
interpretations of these categories. This suggests itself as a potentially
rich source of new ideas and psychological insights. This, however, was
not the purpose of this study. The speed with which new catescales can be
selected, and weights calculated, seems to offer possibilities for successive
approximations to a best ordered structure, as du Mas (4) suggests.
It would also appear quite possible to substitute different criteria
in the same set of data.

Since this was the first empirical study of supervisors with Manifest
Structure Analysis, the apparent utility of this method has by no means
been fully explored. This is especially true in view of the scarcity of
published evaluative results, and the even greater scarcity of positive
findings, in the field of supervision, as pointed out in the introduction
of this paper. Further research and greater experience with this method
should reveal the most fruitful areas of application and the specific
uses in which results would be most definitive.


SUMMARY AND CONCLUSIONS

The problem of this study was to select a set of ordered and weighted
categories or items from biographical forms and standard test information,
by means of which a subject's potential value to a company might be
predicted. A group of 32 supervisors or potential supervisors participated in a
six-hour testing program which included a personal information form and a
battery of standard tests. A criterion scale of coded hourly income was
obtained and defined as a coded, dollars-per-hour, manifest scale of value
of the supervisor's performance in this company.

A Catescale (literally, scale-weighted categories) was extracted from
the biographical and test data, arranged on a criterion continuum of coded
hourly income (see Fig. 2, p. 17). The validity of the Catescale in
predicting the original criterion values was r = .95 (see Table I, p. 16).
A cross-validation group of 10 supervisors, selected only for range along
the criterion continuum, was scored with the catescale values found on the
original sample. Cross-validation yielded r = .80 (see Fig. 2, p. 17).
Both the validity and cross-validation coefficients were well beyond the
one percent level of confidence.
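The thesis does not say which significance test was used. One conventional check, assumed here, is the t test for a correlation coefficient, t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, compared against two-tailed .01 critical values taken from a standard t table (2.750 for 30 df, 3.355 for 8 df).

```python
import math

def t_for_r(r, n):
    """t statistic for testing rho = 0 from a sample r; df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Two-tailed .01 critical values of t, copied from a standard t table.
CRITICAL_01 = {30: 2.750, 8: 3.355}

for label, r, n in [("validity", 0.95, 32), ("cross-validation", 0.80, 10)]:
    t = t_for_r(r, n)
    df = n - 2
    print(f"{label}: t = {t:.2f} (df = {df}), beyond .01: {t > CRITICAL_01[df]}")
```

This prints t = 16.66 (df = 30) for the validity coefficient and t = 3.77 (df = 8) for the cross-validation coefficient; both exceed the tabled .01 values, which is consistent with the statement above.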

The general hypothesis was that there is a set of categories from
biographical and standard test data which forms an empirical analogue of the
Segmental Model. The hypothesis was clearly supported. The Segmental
Model (Fig. 1, p. 9a) is one of several models presented by du Mas (4) for
scale construction in Manifest Structure Analysis. It is a mathematical
model representing a perfect correlation. It is represented as a scatter
diagram with a criterion scale along the ordinate, on the left side, and
categories along the abscissa, at the top. The criterion scale is ordered with
the highest value at the top. The model would represent a Pearson r of 1.

The important difference between conventional item analysis and manifest
structure analysis is that categories (columns) are selected to meet
the assumptions of Pearson r computation by inspecting each column
(a category with its individuals plotted) separately, ordering the columns in
terms of mean value, and computing Pearson r between the original criterion
distribution and a predicted distribution (see Fig. 2, p. 17). Category
values, or catescale weights, are the means of the criterion scale values
represented in the columns. Predicted criterion values are the means of
the rows in terms of the column values. Data are represented in the
scatter by x's. Justification for the procedure is cross-validation.

Practical conclusions: A new technique has been developed which, in
one case, has extracted a catescale of 39 categories from 293 categories
of biographical and test data, with extremely high validation and
cross-validation. Practical advantages of this technique are simplicity, speed,
and low cost. Highly trained personnel would not be required in a situation
where scoring tables and graphs could be set up. Flexibility and
extremely wide applicability, "wherever rating scales or tests are
applicable," were indicated, in agreement with du Mas (3, p. 117). Analysis of
case histories, interview forms, and other multivariate data-gathering
instruments of psychosocial phenomena appears to be uniquely within the
scope of this technique.


BIBLIOGRAPHY

1. Barnett, G. J., Handelsman, I., Stewart, L. H., and Super, D. E. The
occupational level scale as a measure of drive. Psychol. Monogr.,
1952, 342.

2. Cleeton, G. U. and Mason, C. W. Executive ability. The Antioch Press,
Yellow Springs, Ohio, 1946.

3. du Mas, F. M. Continua, catemensions, catescales. J. Clin. Psychol.,
1951, 7, 112-117.

4. du Mas, F. M. Manifest structure analysis. Unpublished treatise.

5. File, Q. W. and Remmers, H. H. Studies in supervisory evaluation. J. …

6. Gibb, C. A. The principles and traits of leadership. J. Abn. and Soc.
Psychol., 1947, 272.

7. Harrell, W. Testing cotton mill supervisors. J. Appl. Psychol., 1940,
24, 31-35.

8. Jenkins, J. G. Validity for what? J. Consult. Psychol., 1946, 10, 93-98.

9. Lawshe, C. H., Jr. Principles of personnel testing. First Edition.
McGraw-Hill, N.Y., 1948.

10. Rush, C. H., Jr. A factorial study of sales criteria. Pers. Psychol.,
1953, 6.

11. Strong, E. K., Jr. Interests of senior and junior public administrators.
J. Appl. Psychol., 1946, 30, 55-71.

12. Strong, E. K., Jr. Vocational interests of men and women. Stanford
Univ. Press, Palo Alto, 1943.

13. Strong, E. K., Jr. Vocational guidance of executives. J. Appl. Psychol.,
1927, 11, 331-347.

14. Super, D. E. Appraising vocational fitness by means of psychological
tests. Harper Bros., N.Y., 1949.

15. Taylor, E. K., Schneider, D. E., and Symonds, N. A. A short forced-choice
evaluation form for salesmen. Pers. Psychol., 1953, 6.

16. Thompson, C. E. Selecting executives by psychological tests. Educ.
Psychol. Measmt., 1947, 7, 773-778.

17. Wadsworth, G. W. Tests prove their worth in a utility. Pers. J.,
1935, 14, 183-187.

18. Wonderlic, E. F., and Hovland, C. I. The Personnel Test: a
restandardized abridgment of the Otis S-A test for business and industrial
use. J. Appl. Psychol., 1939, 23, 685-702.
