THE COMPARATIVE EFFECTIVENESS
OF DIFFERENT ITEM ANALYSIS
TECHNIQUES IN INCREASING
CHANGE SCORE RELIABILITY

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
LINDA D. MITCHELL
1970


This is to certify that the

thesis entitled

THE COMPARATIVE EFFECTIVENESS OF DIFFERENT
ITEM ANALYSIS TECHNIQUES IN INCREASING

CHANGE SCORE RELIABILITY
presented by

Linda D. Mitchell

has been accepted towards fulfillment
of the requirements for

Ph.D. degree in Education

Major professor

Date July 1, 1970

ABSTRACT

THE COMPARATIVE EFFECTIVENESS OF DIFFERENT
ITEM ANALYSIS TECHNIQUES IN INCREASING
CHANGE SCORE RELIABILITY

By

Linda D. Mitchell

Four different procedures for selecting items to measure
individual change were studied to determine which would result in
sets of items with the highest change score reliability. The four
methods of item analysis used for these change items were: selec-
tion on the basis of change item score variance; selection on the
basis of pretest response frequency; selection on Saupe's correlation
between change item score and total score; and selection on
triserial correlation.

The study was specifically undertaken to determine whether
these methods of change item analysis could lead to the selection of
more reliable subsets of items than could be obtained by randomly
choosing items from a pool. Comparisons between the different
methods were also made. The sample used for item analysis and
cross-validation was a group of 263 students at Michigan State
University who had been tested on the Inventory of Beliefs as
freshmen in 1958, and again as juniors in 1961.

Half of this sample was assigned to an initial item analysis
group. On the basis of their responses the four item analyses were
conducted and subsets of 15, 30, 60, and 90 items were chosen by
each procedure from the original pool of 120 items. In addition, a
control procedure of random selection was used to select item
subsets. Items were scored on both a one-to-four scale and a
zero-one dichotomy. Item analyses were carried out separately for
these two different scoring procedures.

The items selected by the item analysis methods were then
scored for the cross-validation group. Change score reliabilities
were calculated based upon these responses. To obtain the best
descriptive comparison, all reliability estimates were computed for
the entire cross-validation group of 131 students. To test the
hypotheses of the study, the cross-validation group was divided into
smaller independent samples and change score reliabilities for item
subsets chosen by different methods were computed on these samples.
Hypotheses were tested by using a two-way analysis of variance with
Tukey post hoc comparisons for mean differences.

The results of the analysis showed that when the items were
scored on a one-to-four scale, three methods of item analysis
resulted in significantly higher change score reliability than did
random selection. Saupe's $r_{dD}$ was the most successful in producing
high change score reliability. Selection on the basis of pretest
frequency and change score variance were equally effective.

When the items were scored on a zero-one basis, three
methods of item analysis resulted in greater change score reliability
than did random selection. These were: selection on change
variance, selection on pretest frequency, and triserial correlation. In
this case, Saupe's correlation was not superior to random selection.
No significant differences were found between the three successful
methods of change item analysis.

 

THE COMPARATIVE EFFECTIVENESS OF DIFFERENT
ITEM ANALYSIS TECHNIQUES IN INCREASING

CHANGE SCORE RELIABILITY
By

Linda D. Mitchell

A THESIS

Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY
Department of Educational Psychology

1970

 

 

 

 

ACKNOWLEDGMENTS

The author expresses her sincere thanks to Dr. William A.
Mehrens, Chairman of the Guidance Committee, for his counsel and
assistance throughout her doctoral program and in the experimental
work and preparation of the manuscript for this study. The helpful
suggestions and editorial comments of members of the Guidance
Committee--Dr. Andrew Porter, Dr. Leroy Olson, and Dr. Norman
Bell--are gratefully acknowledged. Special thanks are also extended
to Dr. Irvin J. Lehmann, who generously provided access to the
data used in this study.

The financial support of an NDEA Title IV Fellowship
enabled the author to carry out her doctoral program of coursework
and research at Michigan State University.

TABLE OF CONTENTS

LIST OF TABLES

CHAPTER

I. THE PROBLEM
     Purpose of This Study
     Hypotheses
     Theoretical Rationale
     An Overview

II. REVIEW OF LITERATURE
     Summary

III. DESIGN OF THE STUDY
     The Sample
     The Instrument
     Design
     Item Analysis Procedures
     Testable Hypotheses
     Statistical Analysis
     Summary

IV. RESULTS
     Results for One-to-Four Scoring
     Testing Hypotheses for One-to-Four Scoring
     Results for Zero-One Scoring
     Testing Hypotheses for Zero-One Scoring
     Summary

V. SUMMARY AND CONCLUSIONS
     Summary
     Conclusions
     Discussion
     Implications for Future Research

BIBLIOGRAPHY

APPENDIX

LIST OF TABLES

Change score reliability coefficients computed for the total
cross-validation sample using the one-to-four scoring system

Change score reliability coefficients computed for independent
cross-validation samples using the one-to-four scoring system

Two-way analysis of variance for the effects of item analysis
method and number of items on change score reliability (with the
one-to-four scoring system)

Differences between reliability estimates for items chosen by
different item analysis methods (one-to-four scoring)

Change score reliability coefficients computed for the total
cross-validation sample using the zero-one scoring method

Change score reliability coefficients computed for independent
cross-validation samples using the zero-one scoring system

Two-way analysis of variance for the effects of item analysis
method and number of items on change score reliability (with
zero-one scoring)

Differences between mean change score reliabilities for items
chosen by different methods. (Scores are Fisher r-to-Z transforms.)

A.1   Listing of subscales in which each change item first appeared
      after item analysis with one-to-four scoring system

A.2   Listing of subscales in which each change item first appeared
      after item analysis with zero-one scoring

A.3   Percentage of item overlap for scales chosen by different item
      analysis methods -- 15 items, one-to-four scoring

A.4   Percentage of item overlap for scales chosen by different item
      analysis methods -- 30 items, one-to-four scoring

A.5   Percentage of item overlap for scales chosen by different item
      analysis methods -- 60 items, one-to-four scoring

A.6   Percentage of item overlap for scales chosen by different item
      analysis methods -- 90 items, one-to-four scoring

A.7   Percentage of item overlap for scales chosen by different item
      analysis methods -- 15 items, zero-one scoring

A.8   Percentage of item overlap for scales chosen by different item
      analysis methods -- 30 items, zero-one scoring

A.9   Percentage of item overlap for scales chosen by different item
      analysis methods -- 60 items, zero-one scoring

A.10  Percentage of item overlap for scales chosen by different item
      analysis methods -- 90 items, zero-one scoring

CHAPTER I

THE PROBLEM

A methodological problem frequently encountered by
researchers in education is how to obtain measures of growth or
change for subjects over a given period of time. One approach to
this problem has been to calculate the change score for each
individual, using the formula:

$$D = X - Y, \tag{1}$$

where $D$ is the change score, $Y$ is the score at time 1, and $X$ is the
score at time 2. $D$ has also been called a gain score, or discrepancy
score.

Researchers who have attempted to use such change scores,
however, have been plagued by one persistent psychometric problem.
These change scores are remarkably unreliable. Noted measure-
ment experts such as Lord, Horst, Webster, and Bereiter have long
recognized this problem (Harris, 1963). When the researcher is
primarily interested in measuring change for a group, this problem
of low reliability is not too serious; however, if he wishes to make
meaningful comparisons between individuals on the basis of their
growth or attitude change, then the lack of reliability becomes
crucial.

When the traditional formula for the reliability of change
scores is examined, two factors seem to be necessary to obtain high
change score reliability. This formula, as derived by Gulliksen (1950,
p. 353), is:

$$r_{DD} = \frac{\bar{r} - r_{XY}}{1 - r_{XY}}, \tag{2}$$

where $\bar{r}$ is the mean of $r_{XX}$ and $r_{YY}$. From this it appears that in
order to obtain a high value for $r_{DD}$, the internal consistency of the
test at time 1 ($r_{YY}$) and at time 2 ($r_{XX}$) should be high, but the
stability coefficient for the test over time ($r_{XY}$) should be somewhat
lower. Thus change score reliability can be increased if the
test-retest correlation can be reduced while test homogeneity (or internal
consistency) is maintained at a high level for each separate
administration of the test.
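
The trade-off described by Formula (2) can be illustrated numerically. The following is a minimal sketch, not part of the original study; the function name and the reliability values are hypothetical and chosen only to show how lowering the test-retest correlation raises change score reliability when internal consistency is held fixed.

```python
def change_score_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Formula (2): reliability of D = X - Y from the two internal
    consistencies and the test-retest (stability) correlation."""
    if r_xy >= 1.0:
        raise ValueError("r_xy must be below 1")
    r_bar = (r_xx + r_yy) / 2.0  # mean of the two internal consistency values
    return (r_bar - r_xy) / (1.0 - r_xy)


# Hypothetical values: internal consistency fixed at .90 on both occasions,
# stability coefficient lowered step by step.
for r_xy in (0.8, 0.6, 0.4):
    print(r_xy, round(change_score_reliability(0.9, 0.9, r_xy), 3))
```

With these illustrative inputs the computed reliability rises from about .50 to about .83 as the stability coefficient falls from .80 to .40.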

When the reliability of an instrument is unsatisfactory, a
common psychometric practice is to construct more items, since
longer tests are usually more reliable. The obvious drawback in
this procedure is that for many testing situations the number of items
must be kept to a minimum for practical considerations of time and
economy. When this is the case, item analysis techniques are usually
employed to select subsets of the most discriminating items from the
original pool so that the test can be shortened without seriously
reducing its reliability.

Ordinary item analysis procedures, usually based upon a
single test administration, are designed to improve test internal
consistency or to yield a test which correlates highly with some
criterion. Such methods are not guaranteed to work for change score
reliability. Theorists such as Bereiter (1963), Saupe (1966 and 1961),
and Lord (1968, p. 331) have suggested that a researcher who desires
to construct an instrument sensitive to individual change should use
item analysis techniques suited for that purpose.

Several new techniques for such item analyses have recently
appeared in the literature. One of these methods is based upon
observing the response changes to items over time (Gruber and
Weitman, 1962). Items for which there is a "moderate" change
frequency when a group is tested and retested at a later date should
be selected. This tends to eliminate those items for which the
group exhibited little change in response over time as well as items
for which the group displayed a universal change over time. In other
words, items for which there is a "moderate" rate of change will be
items for which there was variation between subjects in their
response changes.

A second method uses response frequency to items on the
pretest only. With this method the expected direction of change must
be known in advance (Gruber and Weitman, 1962). The experimenter
then selects items which had a low percentage of negative responses
on the pretest if a high percentage of negative responses is expected
on the posttest, or vice versa.

In a third method items are selected which have a high cor-
relation between item response change and total change score. This
correlation is determined from a formula derived by Saupe (1966),
which is equivalent to the Pearson Product Moment correlation value.

A fourth method of item analysis, which did not appear in the
literature reviewed for this area, was also employed in this study.
With this method items are selected if they have high triserial
correlation values when the correlation between total change score and
trichotomized change in item response is computed.

Because of the relative newness of these item analysis
methods there has been little empirical research to determine
whether or not they could effect increases in change score reliability.
Also, the comparative efficiency and effectiveness of these different
procedures are completely unknown. Such information is sorely
needed by researchers who face the problem of constructing
instruments to reliably measure growth or attitude change for individuals
over time (Lord, 1968, p. 331).

To provide further information on this topic, an empirical
study was designed to examine these various item analysis procedures
and their effects upon change score reliability.

Purpose of This Study

 

The purpose of this study was to determine whether use of
the item analysis methods previously discussed could increase the
reliability of change scores on a collegiate attitude survey. Four
specific questions of central importance to this issue were raised.

1. Which of these four item analysis methods would result in
   selecting a subset of items with the highest change score
   reliability?

2. Which correlational method would result in the higher
   estimate of change score reliability for selected subsets
   of items?

3. Would the response frequency method, based upon the
   variances of response changes from pretest to posttest,
   result in higher change score reliability than the method
   which uses only pretest response frequency?

4. Could reliability of change scores for items selected on the
   basis of pretest response frequency exceed the reliability
   of an equal number of randomly chosen items?

This fourth question was particularly interesting because of
its practical significance for test construction. In many attempts to
measure change the experimenter simply does not have time to
construct his instrument and run a complete item analysis on test-retest
data before he can gather his data. (This is especially true for
longitudinal studies.) Thus, if a method could be developed to
eliminate useless items on the basis of pretest characteristics alone, it
would be extremely helpful and time-saving for the researcher and
his subjects.

Hypotheses

 

On the basis of reliability and item analysis theory, four
general hypotheses were formulated in an attempt to answer the
questions under investigation in this study. These hypotheses were:

1. Use of Method III (computing the PPM correlation for total
   change score and item change response) would result in a
   subset of items with higher change score reliability than
   that of item subsets chosen by any other method or by
   random selection.

2. Method IV (computing the triserial correlation between
   change scores and change in item response) would result in
   a subset of items with higher change score reliability than
   could be obtained for items chosen by response frequency
   methods or by random selection.

3. Method I (selecting items which showed variance in changes
   in response frequency over time) would result in a subset
   of items with higher change score reliability than that of
   items selected by Method II (selection on the basis of
   pretest response only).

4. A subset of items could be selected by Method II (pretest
   response frequency) which would have higher change score
   reliability than a randomly selected subset of items.

Theoretical Rationale

 

The idea of attacking the unreliability of change scores at
the item level can be credited to Bereiter, who formulated the con-
cept of the change item. The change item was defined in this way:

A single item administered on two occasions yields an item
change score which is the difference between the item scores on
the two occasions. If the item is scored dichotomously, 1 or 0,
on each occasion, then the change item may take any of three
values, 1, 0, or -1. (Bereiter, 1963, p. 10)

This definition can be expanded to include items which have more
than 0 or 1 as a possible score on each occasion, such as those
found on many attitude scales. Change items may thus be scored
for both direction and amount of change, and change item scores may
be summed, like ordinary item scores, to get a total change score,

$$D = \sum_i d_i, \tag{3}$$

where

$$d_i = x_i - y_i. \tag{4}$$

In this definition $y_i$ is the individual's response to item $i$ at time 1,
and $x_i$ is his response to item $i$ at time 2.
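
As an illustration of Formulas (3) and (4), the sketch below computes change item scores and total change scores from two response matrices. The function name and the small one-to-four data set are hypothetical and serve only to make the definitions concrete.

```python
def change_item_scores(pretest, posttest):
    """Return (d, totals): d[p][i] = x_i - y_i for person p (posttest minus
    pretest), and totals[p] = sum of that person's change item scores."""
    d = [[x - y for x, y in zip(post_row, pre_row)]
         for pre_row, post_row in zip(pretest, posttest)]
    totals = [sum(row) for row in d]
    return d, totals


# Hypothetical responses: 3 people x 4 items on a one-to-four scale.
pretest = [[1, 2, 4, 3], [2, 2, 1, 4], [3, 1, 2, 2]]   # y_i, time 1
posttest = [[2, 2, 3, 3], [4, 2, 1, 3], [3, 3, 2, 3]]  # x_i, time 2

d, totals = change_item_scores(pretest, posttest)
print(d)       # [[1, 0, -1, 0], [2, 0, 0, -1], [0, 2, 0, 1]]
print(totals)  # [0, 1, 3]
```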

Bereiter believed that item analysis procedures could be
carried out on the change items to improve change score reliability.
Furthermore, he maintained that change score reliability could be
adequately defined using a classical definition of reliability. Using
change item scores the formula becomes:

$$r_{DD} = 1 - \frac{\sum_i S^2_{d_i}}{\sum_i S^2_{d_i} + \sum_{i \neq j} C_{d_i d_j}}, \tag{5}$$

where $S^2_{d_i}$ is the variance of a change item and $C_{d_i d_j}$ is the
covariance for the change item scores of items $i$ and $j$. Bereiter then
hypothesized that increases in change score reliability could be
attained by selecting items in such a way as to maximize the change
item covariances.
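
A short sketch of Formula (5) may make the role of the covariances clearer. It relies on the fact that the variance of the total change score equals the sum of the change item variances plus all of the covariances between different items, so the denominator can be computed directly from $D$. The function name and the two tiny data sets are hypothetical.

```python
from statistics import pvariance


def bereiter_reliability(change_items):
    """Formula (5): 1 - (sum of change item variances) / (variance of D).

    change_items holds one row per person with a change score d_i per item.
    """
    k = len(change_items[0])
    item_var_sum = sum(pvariance([row[i] for row in change_items])
                       for i in range(k))
    total_var = pvariance([sum(row) for row in change_items])
    if total_var == 0:
        raise ValueError("total change score has zero variance")
    return 1.0 - item_var_sum / total_var


# Two hypothetical two-item sets with identical item variances.
covarying = [[1, 1], [0, 0], [-1, -1], [0, 0]]    # items change together
independent = [[1, 0], [0, 1], [-1, 0], [0, -1]]  # zero covariance

print(bereiter_reliability(covarying))
print(bereiter_reliability(independent))
```

In this illustration the item variances are the same in both sets; only the covariance differs, and the reliability estimate drops from .50 to zero when the change items do not covary.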

Clearly the two methods of item analysis which use
correlations between total change score and item change score as the
indices for selecting items are directly based on this line of thought.
Change items which have high correlations with total change score
must have high intercorrelations with each other.
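
The selection rule implied here can be sketched as follows: compute the product moment correlation between each change item and the total change score, then keep the items with the largest values. This is only an illustrative sketch; the helper names, the tie handling, and the treatment of a constant item as uncorrelated are assumptions, not details taken from the study.

```python
from math import sqrt


def pearson_r(a, b):
    """Product moment correlation; returns 0.0 if either input is constant
    (an assumption made so that unchanging items are simply ranked last)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def select_by_item_total_r(change_items, n_keep):
    """Rank items by r between item change d_i and total change D and
    return the indices of the n_keep highest, plus all correlations."""
    totals = [sum(row) for row in change_items]
    k = len(change_items[0])
    r = [pearson_r([row[i] for row in change_items], totals) for i in range(k)]
    ranked = sorted(range(k), key=lambda i: r[i], reverse=True)
    return sorted(ranked[:n_keep]), r


# Hypothetical change items: items 0 and 1 change together, item 2 does not.
change_items = [[1, 1, 0], [0, 0, 1], [-1, -1, 0], [0, 0, -1]]
keep, r = select_by_item_total_r(change_items, 2)
print(keep)  # [0, 1]
```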

Further consideration of the two correlational indices for
change item analysis reveals that they correspond directly to two
popular indices often used for selection of regular dichotomous items.
The familiar point-biserial correlation coefficient for dichotomous
items is actually a Pearson Product Moment correlation (Magnusson,
1966, p. 199). In addition, the triserial correlation is derived
using the same assumptions as the well-known biserial correlation,
and the formulae for these two statistics are identical, except for
the inclusion of the parameters for the third category in the
triserial expression. Expressions for both biserial and triserial
correlations can be derived from the general expression for the
multiserial correlation coefficient given by Jaspen (1946). (A more
complete discussion of this topic follows in the literature review in
Chapter II.) These similarities should help to answer the question:
Which correlational method of change item analysis will result in
greater change score reliability? Lord (1968, p. 344) pointed out
that when there are ability differences between item analysis and
cross-validation groups, the biserial correlation, which is unaffected
by the factor of item difficulty, might be better for selecting items
with high reliability across groups; however, when the groups are
similar, the point biserial method might produce a more reliable
test. In this experiment subjects from the same population were
randomly assigned to item analysis and cross-validation groups.
Since the two groups could be expected to be fairly similar, it was
hypothesized that the PPM method (the point biserial method) would
result in more reliable change scores than would the triserial index.

The rationale for selecting items on the basis of response
frequency is apparent from Formula (5). One way to increase
reliability is to increase the item covariance/variance ratio.
Assuming item intercorrelations remain constant, this can be
accomplished by selecting items which have large variances. As
individual item variances are increased, item covariances must also
increase, but the total of the item covariances will increase at a
faster rate than the total of the item variance. The change items
with the greatest variances will be those with moderate "difficulty"
levels or frequencies of response change. This is clearly illustrated
if two extreme cases for change items are considered. An item for
which there was no change in response between testings will have a
mean change score of 0 and a change variance of 0. Likewise an
item for which there was a universal shift for the group from positive
to negative response will have a mean change score of 1 and a change
variance of 0. Such items can contribute nothing to reliability
(Shoemaker, 1969); however, items which have moderate frequencies of
response changes will have variances which are larger and can
reflect differences in individual changes which are necessary for high
change item covariances and, consequently, high change score
reliability.
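
The two extreme cases can be checked with a few lines of code. The sketch below ranks change items by their variance and keeps the largest; the function name and the four-person data set are hypothetical, and the ranking rule is an illustrative reading of variance-based selection rather than the exact procedure used in the study.

```python
from statistics import pvariance


def select_by_change_variance(change_items, n_keep):
    """Return the indices of the n_keep change items with the largest
    variance, together with the variance of every item."""
    k = len(change_items[0])
    variances = [pvariance([row[i] for row in change_items]) for i in range(k)]
    ranked = sorted(range(k), key=lambda i: variances[i], reverse=True)
    return sorted(ranked[:n_keep]), variances


# Hypothetical change items for 4 people: item 0 never changes, item 1
# shifts by one point for everyone, item 2 changes for half of the group.
change_items = [[0, 1, 1], [0, 1, 0], [0, 1, 1], [0, 1, 0]]
keep, variances = select_by_change_variance(change_items, 1)
print(keep)  # [2]
```

Both the unchanging item and the universally shifting item have zero change variance, so only the item with a moderate frequency of change survives selection.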

If the direction of change cannot be predicted, it is
impossible to choose items on the basis of pretest response frequency to
insure that they will have adequate response change. If the direction
of the change can be predicted, then the items can be chosen which
are likely to have response shifts that will produce the desired rate
of change. For example, if a shift toward a positive "Agree"
response is expected over time, then items which initially have a
high proportion of "Disagree" responses will be likely to have a
moderate frequency of response changes over time. (Obviously the
researcher must hope that a total shift to the "Agree" response does
not occur.) This procedure is more risky than selecting items when
the actual response changes and their variance can be computed from
a complete set of test-retest data.
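
A pretest-only selection rule of this kind might be sketched as follows. The decision rule shown (keep the items with the most room to move in the expected direction) is an assumption made for illustration, as are the function name, the `expected_shift` parameter, and the data; the passage above does not specify the exact rule.

```python
def select_by_pretest_frequency(pretest, n_keep, expected_shift="agree"):
    """Pick items whose pretest responses leave the most room for the
    expected change.

    pretest holds rows of 0/1 responses (1 = "Agree"). If a shift toward
    "Agree" is expected, items with the lowest pretest proportion of
    "Agree" are kept; if a shift toward "Disagree" is expected, items
    with the highest proportion are kept.
    """
    n, k = len(pretest), len(pretest[0])
    p_agree = [sum(row[i] for row in pretest) / n for i in range(k)]
    room = p_agree if expected_shift == "disagree" else [1 - p for p in p_agree]
    ranked = sorted(range(k), key=lambda i: room[i], reverse=True)
    return sorted(ranked[:n_keep]), p_agree


# Hypothetical pretest data: 4 people x 3 items.
pretest = [[1, 0, 0], [1, 0, 1], [1, 0, 0], [1, 1, 1]]
keep, p_agree = select_by_pretest_frequency(pretest, 2, expected_shift="agree")
print(p_agree)  # [1.0, 0.25, 0.5]
print(keep)     # [1, 2]
```

Item 0, which everyone already endorses on the pretest, cannot shift further toward "Agree" and is therefore dropped.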

It was generally expected that correlational methods would
be superior to response frequency methods for item selection
because the correlational procedures depend upon both item variances
and their intercorrelations, while the frequency methods fail to
consider how an individual item covaries with others in the item pool.

An Overview

 

Further discussion of theoretical works and empirical
research studies which are related to the problem of selecting items
to measure change and the reliability of change scores is presented
in the Review of Literature, Chapter II. An empirical study designed
to compare several different change item analysis methods is
described in Chapter III. In Chapter IV the method of statistical
analysis used to test the hypotheses of this study and the results of
that analysis are presented. The conclusions from this study,
discussion of the results, and some implications for future research in
this area have been summarized in the fifth and final chapter.

CHAPTER II

REVIEW OF LITERATURE

Whenever the problem of measuring change is considered,
the researcher must be careful to specify whether he wishes to
evaluate mean change for a group or to study relative changes
between individuals within a group. The need for this distinction has
been pointed out by Lord (1963), Webster (1963), and Tucker et al.
(1966). If individual differences are the main interest, then the
researcher must be concerned with the reliability of his observations
of change (Webster, 1963).

Traditionally difference scores have been regarded as so
unreliable that Gulliksen (1950, p. 354) urged that standardized test
publishers should warn their users of this fact and actually report
difference score reliability in their technical manuals. Lord (1958)
urged that counselors should make very cautious interpretations when
advising individuals on the basis of difference scores.

Concern over the reliability of difference scores led to the
development of several different expressions for its estimation.
One well-known expression for this reliability was given by Gulliksen
(1950, p. 353):

$$r_{DD} = \frac{\bar{r} - r_{XY}}{1 - r_{XY}}. \tag{6}$$

The value for $\bar{r}$ is found by computing the mean of $r_{XX}$ and $r_{YY}$.
Lord (1963) cautioned users of this formula to remember that it
requires the assumption that $S^2_X = S^2_Y$.

The difference scores used in Gulliksen's formula were
usually computed by subtracting an examinee's score on Test A from
his score on Test B when A and B are composed of different test
items. Change scores, however, are usually difference scores
computed when the same test is administered to an individual on two
separate occasions. For this reason Webster (1963) indicated that
Formula (6) may be unsatisfactory for computing change score
reliability. He noted that this formula derivation rests upon the
assumption that errors of measurement are completely uncorrelated,
but maintained that this assumption may be unrealistic when the same
form of a test is administered twice to an individual. By substituting
change scores into the familiar formula for Kuder-Richardson 20, he
derived an expression for change score reliability which does not
require this questionable assumption. Furthermore, it does not
require that the test have equal variances at time 1 and time 2. This
formula for change score reliability uses data at the item level and
is written:

$$r_{DD} = \frac{k}{k-1}\left[1 - \frac{\bar{X} + \bar{Y} - 2f - \sum (\bar{x} - \bar{y})^2}{S^2_D}\right]. \tag{7}$$

In this expression, $f$ is the mean number of items scored 1 on both
occasions; $\bar{X}$ is the mean of scores at time 1; $\bar{Y}$, the mean of the scores
at time 2; $\bar{x}$ is the mean score of all individuals in the group on item
$x$; $\bar{y}$ is the mean for item $y$; the sum is taken over the $k$ item
pairs; and $k$ is the number of items.
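
For items scored 0 or 1, the sum of the change item variances can be written in terms of total score means, item means, and the count of items answered 1 on both occasions, which is what allows KR-20 to be applied to change items without forming each change item explicitly. The sketch below checks that shortcut numerically. It assumes that $f$ is the mean number of items scored 1 on both occasions; the function name and data are hypothetical, and the code is an illustration of the algebra rather than a transcription of Webster's own computing formula.

```python
from statistics import pvariance


def webster_reliability(pre, post):
    """KR-20 on change items for 0/1 data, using the shortcut
    sum of var(d_i) = mean(X) + mean(Y) - 2f - sum((xbar_i - ybar_i)^2).
    Returns (reliability, shortcut variance sum)."""
    n, k = len(pre), len(pre[0])
    mean_pre = sum(sum(row) for row in pre) / n    # mean total, one testing
    mean_post = sum(sum(row) for row in post) / n  # mean total, other testing
    # f: mean number of items scored 1 on both occasions (assumption)
    f = sum(a * b for r1, r2 in zip(pre, post) for a, b in zip(r1, r2)) / n
    sq_mean_diff = sum(
        (sum(row[i] for row in post) / n - sum(row[i] for row in pre) / n) ** 2
        for i in range(k)
    )
    item_var_sum = mean_pre + mean_post - 2 * f - sq_mean_diff
    var_d = pvariance([sum(b) - sum(a) for a, b in zip(pre, post)])
    return k / (k - 1) * (1 - item_var_sum / var_d), item_var_sum


# Hypothetical 0/1 responses: 4 people x 3 items at each testing.
pre = [[1, 0, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0]]
post = [[1, 1, 0], [1, 1, 1], [1, 0, 0], [0, 0, 1]]

r_dd, shortcut = webster_reliability(pre, post)
direct = sum(pvariance([b[i] - a[i] for a, b in zip(pre, post)])
             for i in range(3))
print(shortcut, direct)  # the two variance sums agree
```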

Bereiter (1963) used the general expression of Cronbach's
coefficient alpha and, by substituting change item scores for traditional
item scores, he defined change score reliability as:

$$r_{DD} = 1 - \frac{\sum_i S^2_{d_i}}{\sum_i S^2_{d_i} + \sum_{i \neq j} C_{d_i d_j}}, \tag{8}$$

where $S^2_{d_i}$ is the variance of the change item scores and $C_{d_i d_j}$ is their
covariance. This expression can be shown to be equivalent to
Webster's derivation (Formula 7) except for the absence of the
factor $k/(k-1)$ in the derivation by Bereiter. This computational
formula for change score reliability also uses data at the item level.
In addition, it has the advantage of removing the restriction of
dichotomously scored test items.

Based upon this formula Bereiter developed a plan for
manipulating change score reliability at the item level. He suggested
that item analysis techniques could be used to select items which had
large change item covariances. When the change item covariances
are large for a set of items, there is large variability between
subjects on their changes in response to these items. Such variance in
change scores results in a lower stability coefficient, $r_{XY}$, and
consequently a higher estimate of $r_{DD}$. It should be noted that this
relationship exists regardless of which formula is used to calculate
$r_{DD}$.

In an empirical study using an attitude questionnaire for
college students, Webster and Bereiter (1963) reported that they were
able to effect large gains in change score reliability when they
employed such an item selection technique. Bereiter, however,
makes little mention of the actual index or decision rule used in this
item selection.

Horst (1966, p. 387) indicated that most item analysis
procedures for raising reliability fall into one of two main categories:
correlational and counting procedures. (Counting procedures use
response frequency data.) It is obvious that this categorization
scheme can be extended to the realm of change item analysis as
well. This classification will be used in the remaining discussion
of item analysis procedures designed for selecting items to give
reliable change measures.

In 1962, Gruber and Weitman studied changes in item
responses to an achievement test. They wanted to measure students'
retention of subject matter over time using a pretest-posttest
design. Although their goal was not to improve reliability of change
scores per se, they suggested that changes in response frequency
to items might be useful as a basis for item selection in the
measurement of change. In their study the researchers employed two methods
of item selection. The first method was based upon observing shifts
in response frequency toward a specified, optimal level of difficulty
from initial testing to retesting. In the second procedure items were
selected on the basis of their pretest response level only. The
researchers emphasized the necessity of knowing in advance the
direction in which response change is likely to occur over time.
The results of this study indicated that it was possible to improve
discrimination on the posttest by selecting items on the basis of
response shifts. Selecting items on the basis of their pretest
difficulty level did not significantly improve the discrimination between
subjects on the posttest. However, the authors felt that this could
have been due to the limitations of ceiling effect on their instrument
and the small sample size rather than to ineffectiveness of the method
itself. No data for the change score reliability were reported, although
these estimates could have been easily computed. Thus it is not
known if these item selection techniques could have improved the
reliability estimate for changes in retention.

The first correlational method of change item analysis was
derived by Saupe (1966). Based upon Gulliksen's formula for the
correlation between a component element and a composite, Saupe's
formula for the correlation of a change item with total change score
is:

$$r_{dD} = \frac{C_{xX} + C_{yY} - C_{xY} - C_{yX}}{\sqrt{S^2_x + S^2_y - 2C_{xy}}\;\sqrt{S^2_X + S^2_Y - 2C_{XY}}}, \tag{9}$$

where $x$ and $y$ denote item scores, $X$ and $Y$ are total scores, and $C$
is covariance.
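
Formula (9) expresses the correlation between an item change score and the total change score entirely in terms of variances and covariances of the original item and total scores. The following sketch checks numerically that it agrees with the product moment correlation computed directly on $d = x - y$ and $D = X - Y$. The helper names and the five-person data set are hypothetical.

```python
from math import sqrt


def cov(a, b):
    """Population covariance of two equally long score lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / n


def saupe_r(x, y, X, Y):
    """Formula (9): correlation of item change (x - y) with total
    change (X - Y), built from the original-score (co)variances."""
    numerator = cov(x, X) + cov(y, Y) - cov(x, Y) - cov(y, X)
    var_d = cov(x, x) + cov(y, y) - 2 * cov(x, y)
    var_D = cov(X, X) + cov(Y, Y) - 2 * cov(X, Y)
    return numerator / sqrt(var_d * var_D)


# Hypothetical data for 5 people: item scores (x at time 2, y at time 1)
# and total scores (X at time 2, Y at time 1).
x = [3, 4, 2, 4, 1]
y = [2, 2, 3, 1, 1]
X = [60, 75, 50, 80, 40]
Y = [55, 60, 58, 50, 42]

d = [a - b for a, b in zip(x, y)]
D = [a - b for a, b in zip(X, Y)]
direct = cov(d, D) / sqrt(cov(d, d) * cov(D, D))
print(saupe_r(x, y, X, Y), direct)  # the two values agree
```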

Lord (1968, p. 331) urged that empirical studies be under-
taken in this area and stressed the need for development of still other
item analysis procedures for change items.

The primary reason that most traditional item analysis
methods cannot be used with change scores is the nature of
the change item itself. As defined by Bereiter (1963), the change
item, $d_i = x_i - y_i$, can have at least three values, 0, 1, or -1.
This rules out the possibility of using the biserial correlation which
is frequently employed as an index for item selection. (The biserial
correlation, point biserial correlation, tetrachoric and phi coefficient
all require dichotomously scored items.) Jaspen (1946) developed
a formula for the triserial correlation. This was intended to serve
as a computational formula for a correlation between two variables
when both were assumed to have underlying normal, continuous
distributions, but when one distribution had been artificially divided
into three categories. This expression is a direct counterpart to
the biserial correlation used for computing correlations between a
continuous variable and a variable classified into two artificial
categories.

Jenkins (1956) presented a simplified version of Jaspen' S

formula for triserial r:

 

r _ Mhyh+Mm(yl yh) 'Mlyl (10)
tris - +( _ )2 +
O- y h yI yh y 1
ph pm pl

where M = mean, y = curve ordinate, and 0": s.d. of scores.

The letters h, m, and 1 represent high, medium, and low categories.

If the item change scores of 1, 0, and -1 are used to
designate the divisions of high, medium, and low, it is seen that
$r_{tris}$ could be written as an index for change item analysis:

$$ r_{tris} = \frac{M_1 y_1 + M_0 (y_{-1} - y_1) - M_{-1} y_{-1}}{\sigma \left[ \dfrac{y_1^2}{p_1} + \dfrac{(y_{-1} - y_1)^2}{p_0} + \dfrac{y_{-1}^2}{p_{-1}} \right]} \qquad (11) $$

Triserial r, however, had never been used as an index for item
selection, despite its availability.
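A minimal Python sketch of Formula (11) is given below. It rests on assumptions not spelled out in the text: p is taken to be the proportion of examinees in each change category, $y_1$ the normal ordinate at the cut between the 0 and 1 categories, and $y_{-1}$ the ordinate at the cut between the -1 and 0 categories. The simulated data and all names are hypothetical.

```python
import random
from statistics import NormalDist, mean, pstdev


def triserial_r(d, D):
    """Formula (11): d holds change item scores in {1, 0, -1}; D holds total
    change scores. All three categories must be non-empty."""
    n = len(d)
    normal = NormalDist()
    groups = {k: [t for s, t in zip(d, D) if s == k] for k in (1, 0, -1)}
    p = {k: len(g) / n for k, g in groups.items()}   # category proportions
    M = {k: mean(g) for k, g in groups.items()}      # mean total score per category
    # Normal ordinates at the two cut points (assumed interpretation of y).
    y_low = normal.pdf(normal.inv_cdf(p[-1]))        # cut between -1 and 0
    y_high = normal.pdf(normal.inv_cdf(1 - p[1]))    # cut between 0 and 1
    numerator = M[1] * y_high + M[0] * (y_low - y_high) - M[-1] * y_low
    denominator = pstdev(D) * (
        y_high ** 2 / p[1] + (y_low - y_high) ** 2 / p[0] + y_low ** 2 / p[-1]
    )
    return numerator / denominator


# Simulated check: a latent normal variable correlated 0.6 with the total
# score is cut into three categories at arbitrary points.
random.seed(0)
rho = 0.6
latent, total = [], []
for _ in range(20000):
    z = random.gauss(0, 1)
    latent.append(z)
    total.append(rho * z + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1))
d_sim = [1 if z > 0.8 else (-1 if z < -0.5 else 0) for z in latent]

estimate = triserial_r(d_sim, total)
```

Under the normality assumption stated for the triserial correlation, the estimate should fall near the latent correlation used to generate the data.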

Summary

To summarize this review of literature on item analysis
methods for change items and change score reliability, several key
points should be noted. First, in recent years there has been great
interest in the problems of measuring change and considerable con-
cern over the lack of reliability for change scores. This low reli-
ability made it extremely difficult to predict change for an individual,
or to make counseling or placement decisions based on change
scores.

Several different formulae for change score reliability were
discussed in this chapter. It seems best to conclude that when the
same form of a test is used for both initial and final testing, then
Bereiter's Formula (8) or Webster's Formula (7) for change score
reliability is preferable to the traditional formula derived by
Gulliksen (Formula 6).

Regardless of which formula is used for $r_{DD'}$, several
researchers have suggested that low change score reliability can
perhaps be improved through item analysis procedures. Three tech-
niques have been suggested in the literature. These are: selection
of items on the basis of observed shifts in response; selection of
items on the basis of pretest response frequency when the direction
of expected change can be predicted; and use of a correlational index
based upon correlation between change item score and total change
score.

The development of a formula for triserial correlation was
also presented and this statistic was suggested as a fourth possible
index for item selection in the measurement of change.

Reports on studies comparing these various change item
analysis methods are "conspicuous by their absence" in this review.
It is readily apparent that empirical investigation of these methods
is essential to determine if they can be successfully used to improve
change score reliability. It was toward this end that the study of
change item analysis techniques, described in Chapter III, was

undertaken.

CHAPTER III

DESIGN OF THE STUDY

An empirical study was designed to compare the change
score reliability for subsets of items selected by four different
change item analysis procedures. The four procedures compared
were Saupe's item-total score correlation, $r_{dD}$; triserial
correlation between change item and change score; selection of items
having high variance in change scores; and selection of items on the
basis of pretest response frequency. In addition, a control method,
selecting items randomly from the original item pool, was used.

The Sample

 

In the fall of 1958, the first-term freshman class at Michigan
State University was tested on a variety of achievement, aptitude,
attitude, and personality measures. All freshmen who met the
following criteria were included in the population: (1) the student
must have been a first-time freshman, not a past dropout or a transfer
from another university; (2) the student must have been a native-born
American.

In 1961, a sample was drawn from this original population.
This group consisted of students who were still enrolled in the
university at that time. These students, then juniors at MSU, were
retested on the same measures. The test-retest data from 263
students in this sample were used for this item analysis experiment.

The Instrument

 

The instrument selected for use in this study was the
Inventory of Beliefs, Form I. This attitude survey was developed
by the Cooperative Study of Evaluation in General Education under
the sponsorship of the American Council on Education Committee on
Measurement and Evaluation. The scale was designed to measure
an individual's tendency to subscribe to stereotypic beliefs (Lehmann
and Dressel, 1963).

Items on this inventory were taken from an original pool of
one thousand items, composed by a panel of counselors and evaluation
officers from twenty colleges which participated in the Cooperative
Study. The 120 statements which were selected for the final
scale were written in the form of "pseudo-rational cliches"
(American Council on Education, 1953).

Some sample items from this inventory are:

"No world organization should have the right to tell Americans
what they can or cannot do."

"We would be better off if there were fewer psychoanalysts
probing and delving into the human mind."

"When things seem black, a person should not complain,
for it may be God's will."

"Most Negroes would become overbearing and disagreeable
if not kept in their place."

There were four possible responses to each item: Strongly Agree,
Agree, Disagree, and Strongly Disagree.

Two separate scoring schemes were used in this study.
The scoring instructions from the Instructor's Manual award the
examinee one point for each Disagree or Strongly Disagree
response. The second scoring scheme used in the item analysis
study awarded one point for a response of Strongly Agree; two points
for Agree; three points for Disagree; and four points for Strongly
Disagree. Lehmann and Dressel (1963, p. 27) characterized the
higher scorer as "mature, flexible, adaptive, and democratic in his
relationships with others; a low scorer is immature, rigid in outlook,
compulsive, and authoritarian in his relationships with others."
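The two scoring schemes can be sketched in Python as follows. The function names are hypothetical; the point values are those given above. The sketch also lists the change values each scheme can produce for a single item, which shows why only the zero-one scheme yields the three-valued change item.

```python
RESPONSES = ("Strongly Agree", "Agree", "Disagree", "Strongly Disagree")


def score_zero_one(response):
    """One point for Disagree or Strongly Disagree, otherwise zero."""
    return 1 if response in ("Disagree", "Strongly Disagree") else 0


def score_one_to_four(response):
    """Strongly Agree = 1, Agree = 2, Disagree = 3, Strongly Disagree = 4."""
    return RESPONSES.index(response) + 1


def possible_changes(score):
    """All differences obtainable between two administrations of one item."""
    values = [score(r) for r in RESPONSES]
    return sorted({a - b for a in values for b in values})


zero_one_changes = possible_changes(score_zero_one)        # [-1, 0, 1]
one_to_four_changes = possible_changes(score_one_to_four)  # [-3, -2, -1, 0, 1, 2, 3]
```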

The reliability coefficients reported for this scale in the
MSU study ranged from .68 to .95, with a median value of .86
(Lehmann and Dressel, 1963).

The Inventory of Beliefs was considered appropriate for use
in this change item analysis study for the following reasons:

1. The scale was designed expressly for the purpose of
measuring the attainment of educational objectives. Thus
change scores over time were expected to be fairly large
and could be meaningfully interpreted.

2. The instrument was professionally developed. Considerable
effort went into construction and item analysis. Reliability
and validity for this scale had been demonstrated (Dressel
and Mayhew, 1952).

3. The internal consistency reliabilities reported were high,
but test-retest reliability coefficients after a lapse of time
were lower, thus indicating a fairly wide range of individual
differences in attitude change.

Design

An item analysis, cross-validation design was employed in
this study. The sample was randomly split into two groups. The
item analysis group consisted of 132 students; the cross-validation
group was composed of 131 students.
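The random split into an item analysis group and a cross-validation group might be carried out as in the following sketch. The student identifiers, the seed, and the function name are hypothetical; only the group sizes come from the study.

```python
import random


def split_sample(ids, n_first, seed=None):
    """Shuffle the identifiers and cut them into two disjoint groups."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_first], shuffled[n_first:]


# 263 students: 132 for item analysis, the remaining 131 for cross-validation.
analysis_group, cross_validation_group = split_sample(range(263), 132, seed=1970)
```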

Items from the 1958 and 1961 test administrations for these
students were scored by both the zero-one scoring method and the
one-to-four method described earlier. Item change scores were
computed in accordance with Formula (4), and total change scores
were computed for each student.

The data from the item analysis group, scored on the zero-one
basis, were submitted to four different item analysis procedures
and a control procedure of random selection. Data scored with the
one-to-four system were submitted to three item analysis procedures
and random selection. (It was necessary to omit the triserial
correlation method because it was only appropriate for dichotomously
scored items.) Subsets of 15, 30, 60, and 90 items were chosen
under each procedure. These item subsets were used for computing
reliability estimates for change scores on the cross-validation group
data. The actual computation formula for the change score reliability
was obtained by substituting Bereiter's definition of change score
reliability (Formula 8) into Webster's expression for the Kuder-
Richardson 20 for change scores (Formula 7) to get a change score
version of Cronbach's coefficient alpha (Cronbach, 1951).

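$$ r_{DD'} = \frac{k}{k-1}\left[ 1 - \frac{\sum_i S_{d_i}^2}{\sum_i S_{d_i}^2 + \sum_{i \neq j} C_{d_i d_j}} \right] \qquad (12) $$

where k is the number of items in the subset.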
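The computation can be sketched in Python by applying the standard form of coefficient alpha to a matrix of change item scores. This is an assumption about the general shape of the calculation, not a transcription of the study's own program, and the data and names are hypothetical.

```python
from statistics import pvariance


def change_score_alpha(d):
    """Coefficient alpha for change items.

    d[s][i] is the change score of student s on item i. The variance of the
    total change score equals the sum of the item variances plus the sum of
    all inter-item covariances, so it is computed directly from the totals.
    """
    k = len(d[0])
    item_variances = [pvariance([row[i] for row in d]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in d])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical case: three items that change identically for every student,
# which is the limiting case of perfect internal consistency.
identical_items = [[1, 1, 1], [0, 0, 0], [-1, -1, -1], [1, 1, 1]]
alpha_identical = change_score_alpha(identical_items)
```

When the items move together perfectly, as in this example, the coefficient reaches its upper limit of 1; weaker covariation among the change items lowers it.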


Item Analysis Procedures

 

Method I was an item analysis procedure based on the
variance of the change item scores. After the change item scores,
$d_i$, were computed, the mean change score $\bar{d}_i$ and the change
score variance $S_{d_i}^2$ were found for each item. Items with the
largest values for $S_{d_i}^2$ were selected. On this basis subsets of
15, 30, 60, and 90 items were chosen from the original set of 120 items.

Method II required that items for the subsets be chosen
on the basis of pretest response frequency. With this method it was
necessary to take into account the expected direction of the change.
Because the Inventory of Beliefs had been developed to measure
attainment of objectives of higher education, it seemed reasonable
to predict that students' scores would increase over time. (Data
from the Lehmann and Dressel study upheld this prediction.) Item
means, $\bar{x}_i$, were computed for each item on the pretest (the measure
taken in 1958, when the students were freshmen). Items with the
lowest mean scores were selected into the 15, 30, 60, and 90 item
subsets.
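Methods I and II both reduce to ranking items on a single statistic and keeping the first k. A minimal Python sketch, with hypothetical data and function names, is:

```python
from statistics import mean, pvariance


def select_by_change_variance(d, k):
    """Method I: indices of the k items with the largest change score variance.
    d[s][i] is the change score of student s on item i."""
    n_items = len(d[0])
    variances = [pvariance([row[i] for row in d]) for i in range(n_items)]
    return sorted(range(n_items), key=lambda i: variances[i], reverse=True)[:k]


def select_by_pretest_mean(pretest, k):
    """Method II: indices of the k items with the lowest pretest mean.
    pretest[s][i] is the pretest score of student s on item i."""
    n_items = len(pretest[0])
    means = [mean(row[i] for row in pretest) for i in range(n_items)]
    return sorted(range(n_items), key=lambda i: means[i])[:k]


# Hypothetical data for four students and three items.
change_scores = [[1, 0, 0], [-1, 0, 1], [1, 0, 0], [-1, 0, 1]]
pretest_scores = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [1, 0, 0]]

by_variance = select_by_change_variance(change_scores, 2)  # [0, 2]
by_pretest = select_by_pretest_mean(pretest_scores, 2)     # [1, 2]
```

Selecting the lowest pretest means reflects the expectation that scores rise over time: items that few students already pass leave the most room for change.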

Method III was a correlational item analysis procedure for
which the index of item selection was the expression derived by
Saupe (Formula 9):

$$ r_{dD} = \frac{C_{xX} + C_{yY} - C_{xY} - C_{yX}}{\sqrt{S_x^2 + S_y^2 - 2C_{xy}}\;\sqrt{S_X^2 + S_Y^2 - 2C_{XY}}} $$

Items which had the greatest correlations with total change score
were selected into the test subsets.
For Method IV the triserial correlation coefficients
between change item and total change score were computed according
to Formula (11):

$$ r_{tris} = \frac{M_1 y_1 + M_0 (y_{-1} - y_1) - M_{-1} y_{-1}}{\sigma \left[ \dfrac{y_1^2}{p_1} + \dfrac{(y_{-1} - y_1)^2}{p_0} + \dfrac{y_{-1}^2}{p_{-1}} \right]} $$

Items with the highest positive values for $r_{tris}$ were selected into
the test subsets. This method, of course, was only applied to the
data that had been scored on a zero-one basis on the original tests.

The control method consisted of randomly selecting subsets
of 15, 30, 60, and 90 items for comparison with those which had been
chosen by the systematic item analysis procedures.

Testable Hypotheses

The specific hypotheses tested in this study were:

1. The mean change score reliability for items chosen by
Method III (Saupe's correlation between change item and
total change score) would be greater than the mean
reliability for item subsets chosen by any other item
analysis method or by the control method of random
selection.

2. Mean change score reliability for subsets of items chosen
by Method IV (triserial correlation) would be greater than
the mean reliability for the subsets of items chosen by
either the response frequency methods or by random
selection.

3. Mean change score reliability for the subsets of items
selected by Method I (using change item variance) would be
greater than the mean reliability of item subsets chosen by
pretest response frequency or by random selection.

4. Mean change score reliability for the subsets of items
selected by Method II (using pretest response frequency)
would be greater than the mean reliability of subsets of

randomly selected items.

Statistical Analysis

 

Two procedures were used to compare the change score
reliability coefficients computed on the cross-validation sample.
Using the first method, all reliability estimates for each subset of
items were computed on the whole cross-validation sample of 131
students. This was to provide the best overall descriptive comparison.

To test the statistical significance of the observed differences
in the change score reliability coefficients, the cross-validation
group was divided into smaller independent samples, one sample for
each item analysis method. These samples were
then randomly assigned to the item analysis procedures and change
score reliabilities were computed. Fisher r-to-Z transformations
of the reliability coefficients were used, and the values were analyzed
with a two-way analysis of variance. (One main effect was method
of item analysis; the other was number of items in the subset.)
Tukey's test for an honestly significant difference (Kirk, 1968, p. 88)
was used to test the significance of the differences between the means
of the reliability estimates.
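The Fisher r-to-Z step can be sketched in Python as follows. The transformation is the inverse hyperbolic tangent; the coefficients and names in the example are hypothetical and do not come from the study.

```python
from math import atanh


def fisher_z(r):
    """Fisher r-to-Z transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return atanh(r)


def mean_fisher_z(coefficients):
    """Mean of the transformed coefficients for one item analysis method."""
    zs = [fisher_z(r) for r in coefficients]
    return sum(zs) / len(zs)


# Hypothetical reliabilities for two methods at four subset sizes.
method_a = [0.50, 0.60, 0.70, 0.80]
method_b = [0.30, 0.45, 0.65, 0.78]

# The post hoc comparison works on differences between these means.
difference = mean_fisher_z(method_a) - mean_fisher_z(method_b)
```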

Summary

Test-retest data were obtained from a sample of 263 Michigan
State University students in their freshman and junior years on
an attitude survey called the Inventory of Beliefs. (These data had
been collected as part of a longitudinal study conducted by Lehmann
and Dressel from 1958 to 1962.)

The data were scored by two different methods: a zero-one
scoring method and a one-to-four scaling method. Item change
scores were computed for all 120 items on the questionnaire.

Data from half of the sample were subjected to four different
item analysis procedures and a control procedure of random selection.
The item analysis procedures used for items scored zero-one were:
Saupe's correlation index, triserial correlation, selection for large
change variance, and selection on the basis of pretest response
frequency. All of these same procedures were used for the data
scored on a one-to-four scale, except for triserial correlation.
Subsets of 15, 30, 60, and 90 items were chosen by each method.

Change score reliabilities for these subsets of items were
computed using change score data from the cross-validation group.
A change score reliability version of coefficient alpha was used. A
two-way analysis of variance and a Tukey post hoc comparison test
were used to test for differences in change score reliability for the
items chosen by different methods.

 


CHAPTER IV

RESULTS

Results for One-to-Four Scoring

 

 

When the items of the Inventory of Beliefs were scored on
a one-to-four scale, three methods of change item analysis and a
control method of random selection were employed to select subsets
of 15, 30, 60, and 90 items. The three methods of change item
analysis were: selection on pretest response frequency, selection
on item change score variance, and Saupe's correlation between
change item score and total change score. Detailed results of the
item analyses are presented in the Appendix.

After the subsets of items had been selected, using data
from the 132 students in the item analysis group, the change score
reliability for each item subset was computed using the item
responses of the 131 students in the cross-validation group. These
reliability coefficients are presented in Table 4.1.

From the results presented in Table 4.1, it is apparent that
Saupe's method of change item analysis consistently resulted in more
reliable subsets of items than did either of the other two item analysis
methods or the control method of random selection. There was
little difference between the reliability coefficients of item subsets
chosen by the two response frequency methods (Method I and
Method II); however, both of these methods resulted in higher
reliability of change scores than did the control method for subsets
of 15, 30, 60, and 90 items.

 

TABLE 4.1.--Change score reliability coefficients computed for the
            total cross-validation sample using the one-to-four
            scoring system.

                                       Number of Items
Item Analysis Method                15     30     60     90

Method I (Change Variance)         .50    .61    .75    .83
Method II (Pretest Frequency)      .50    .65    .78    .83
Method III (Saupe's r_dD)          .63    .70    .80    .85
Method IV (Random)                 .30    .49    .70    .80

 

 

 

 

 

Another point that should be noted from the data presented
in Table 4.1 is that the differences between reliability coefficients
were greater when fewer items were selected from the original pool.
At the 90-item level the reliability values ranged only from .85 for
Method III (Saupe's) to .80 for the control. At the 15-item level,
however, the range was from .63 for Saupe's method to .30 for the
control.

To test the statistical significance of the differences between
change score reliability estimates obtained for item subsets chosen
by the different methods, the cross-validation sample was divided
into four random subsamples with 32 students in each group. Each
of these samples was then randomly assigned to a different item
analysis method. The reliability coefficients for 15, 30, 60, and 90
items chosen by a method were then computed using the data from
the small group which had been assigned to it. Thus reliability
estimates obtained under different item analysis methods were
calculated for independent samples to meet the assumptions of the
analysis of variance model. These change score reliability
coefficients are reported in Table 4.2.

TABLE 4.2.--Change score reliability coefficients computed for
            independent cross-validation samples using the one-
            to-four scoring system.

                                       Number of Items
Item Analysis Method                15     30     60     90

Method I (Change Variance)         .48    .61    .75    .84
  Sample 1
Method II (Pretest Frequency)      .56    .67    .76    .80
  Sample 2
Method III (Saupe's r_dD)          .76    .80    .85    .89
  Sample 3
Method IV (Random)                 .36    .42    .64    .76
  Sample 4

 

 

 

 

 

Fisher r-to-Z transformations of the values in Table 4.2
were used as the dependent variables in a two-way analysis of
variance (fixed effects model) with one observation per cell (Winer,
1962, p. 217). In this analysis, item analysis method was one
independent factor with four levels; number of items was the second
factor with four repeated measures on each sample. Because there
was only one replication per cell, a Tukey one-degree-of-freedom
test for nonadditivity (Winer, 1962, p. 218) was conducted to test for
the confounding effects of an interaction in the error term prior to
running the two-way ANOVA. No significant interaction effect was
detected at the alpha level of .05.

TABLE 4.3.--Two-way analysis of variance for the effects of item
            analysis method and number of items on change score
            reliability (with the one-to-four scoring system).

                          Sums of
Source of Variance        Squares    d.f.    M.S.    F Ratio

Item Analysis Method        .622       3     .207    51.75**
Number of Items             .742       3     .247    61.75**
Residual                    .039       9     .004
Total                      1.403      15

**Significant at alpha = .01.

 


Results of the analysis of variance are presented in
Table 4.3. The main effect for number of items was significant at
the alpha level of .01, using a conservative F test with 1 and 3
degrees of freedom. The main effect for item analysis method was
significant at the alpha level of .01, using an F test with 3 and 3
degrees of freedom (Greenhouse and Geisser, 1959).

 

Testing Hypotheses for
One-to-Four Scoring

The hypotheses tested were:

1. The mean change score reliability for item subsets chosen
by Method III (Saupe's $r_{dD}$) would be greater than the mean
reliabilities for subsets of items chosen by any other item
analysis method or by the control method of random selection.

2. Mean change score reliability for subsets of items chosen
by Method I, using change item variance, would be greater
than the mean reliability of item subsets chosen by pretest
response frequency or by random selection.

3. Mean change score reliability for the subsets of items
selected by Method II, using pretest response frequency,
would be greater than the mean reliability of subsets of
randomly selected items.


To test these hypotheses, post hoc comparisons were made
to determine the significance of the differences between the mean
reliability values obtained under the different item analysis methods.
A multiple comparison test for making a series of pairwise
comparisons, developed by Tukey, was employed (Kirk, 1968, p. 88). An
HSD (honestly significant difference) value was computed in
accordance with the formula:

$$ HSD = q_{\alpha,\nu} \sqrt{\frac{MS_{error}}{n}} \qquad (13) $$

where n is the number of levels or treatments, q is a value obtained
from the tabled distribution of the studentized range statistic, and $\nu$
is the number of degrees of freedom associated with the error term.
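Formula (13) and the pairwise comparisons it supports can be sketched in Python as below. The q value must be read from a table of the studentized range statistic; the q, mean square, and method means used here are hypothetical round numbers chosen only to make the arithmetic easy to follow.

```python
from itertools import combinations
from math import sqrt


def tukey_hsd(q, ms_error, n):
    """Formula (13): HSD = q * sqrt(MS_error / n)."""
    return q * sqrt(ms_error / n)


def significant_pairs(means, hsd):
    """Pairs of methods whose mean difference exceeds the HSD value."""
    return [
        (a, b)
        for a, b in combinations(sorted(means), 2)
        if abs(means[a] - means[b]) > hsd
    ]


# Hypothetical inputs: q = 4.0, MS_error = 0.04, n = 4 gives HSD = 0.4.
hsd_value = tukey_hsd(q=4.0, ms_error=0.04, n=4)

# Hypothetical mean Fisher Z values for three methods.
method_means = {"A": 1.0, "B": 0.8, "C": 0.5}
pairs = significant_pairs(method_means, hsd_value)  # only A vs. C exceeds 0.4
```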

Differences between the mean reliabilities for item subsets
selected by the various methods of item analysis are presented in
Table 4.4. (Reliability estimates were converted to Fisher r-to-Z
transformations for testing.)

As Table 4.4 shows, the first hypothesis is supported.
Change score reliability for subsets of items chosen by Saupe's
method is significantly greater than that of subsets of items selected
by any other method.

The first part of the second hypothesis is not supported.
The reliability for sets of change items selected for their variance
is not better than reliability for items chosen on the basis of pretest
response frequency; it is, however, significantly greater than the
reliability of the randomly selected subsets of items.

TABLE 4.4.--Differences between reliability estimates for items
            chosen by different item analysis methods (one-to-
            four scoring).

                        Change      Pretest      Saupe
                        Variance    Frequency    r_dD

Change Variance                       .018       .328*
Pretest Frequency                                .310*
Saupe r_dD
Random                   .226*        .244*      .554**

 *Significant at alpha = .05, HSD = .218.
**Significant at alpha = .01, HSD = .389.

The third hypothesis is also supported by the data. Items
chosen on the basis of pretest response frequency have higher
change score reliability than an equal number of randomly
chosen items.

Results for Zero-One Scoring

 

 

When the items on the attitude survey were scored on a
zero-one basis, it was possible to introduce a fifth method of item
selection (triserial correlation) in addition to the three item analysis
methods used for one-to-four scoring and random selection. The
change score reliabilities for the 15, 30, 60, and 90 item subsets
were computed using the responses of the entire cross-validation
sample. These change score reliability estimates are presented in
Table 4.5. The differences between the methods of item analysis
were much less pronounced under this scoring system. In general,
however, all four methods of change item analysis consistently
resulted in higher estimates of change score reliability than did the
technique of random selection. The greatest differences, again,
were observed when fewer items were selected from the original pool.

TABLE 4.5.--Change score reliability coefficients computed for the
            total cross-validation sample using the zero-one
            scoring method.

                                       Number of Items
Item Analysis Method                15     30     60     90

Method I (Change Variance)         .52    .56    .68    .72
Method II (Pretest Frequency)      .36    .52    .67    .72
Method III (Saupe's r_dD)          .33    .49    .68    .74
Method IV (Triserial r)            .37    .56    .68    .75
Method V (Random)                  .21    .48    .57    .67

 

 

 

 

 

 

 


To test the statistical significance of the differences between
the change score reliability estimates of items selected by the various
item analysis methods, the cross-validation group was divided into
five independent samples with 26 students per sample. Each sample
was then randomly assigned to be used for calculating the reliabilities
for 15, 30, 60, and 90 items chosen by a particular item analysis
method. The change score reliability coefficients obtained on these
independent samples are reported in Table 4.6.

TABLE 4.6.--Change score reliability coefficients computed for
            independent cross-validation samples using the
            zero-one scoring system.

                                       Number of Items
Item Analysis Method                15     30     60     90

Method I (Change Variance)         .50    .67    .72    .76
  Sample 1
Method II (Pretest Frequency)      .48    .60    .68    .75
  Sample 2
Method III (Saupe's r_dD)          .32    .44    .65    .74
  Sample 3
Method IV (Triserial r)            .45    .62    .71    .75
  Sample 4
Method V (Random)                  .11    .35    .60    .74
  Sample 5

 

 

 

 

 


A two-way analysis of variance was performed using Fisher
r-to-Z transformations of the reliability coefficients in Table 4.6.
Prior to running the ANOVA, a Tukey one-degree-of-freedom test
for nonadditivity was conducted to detect the significance of an
interaction effect. No significant interaction effect was found at the alpha
level of .05. Results of the two-way ANOVA are presented in Table 4.7.

TABLE 4.7.--Two-way analysis of variance for the effects of item
            analysis method and number of items on change score
            reliability (with zero-one scoring).

                          Sums of
Source of Variance        Squares    d.f.    M.S.    F Ratio

Item Analysis Method        .219       4     .055     7.857*
Number of Items             .914       3     .305    43.570**
Residual                    .085      12     .007
Total                      1.218      19

 *Significant at alpha = .05.
**Significant at alpha = .01.

Using the conservative F test with 4 and 4 degrees of freedom,
the main effect of item analysis method was significant at the
alpha level of .05. The effect of number of items was significant at the
alpha level of .01, using a conservative F test with 1 and 4 degrees
of freedom.
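The sums of squares for a two-way analysis of variance with one observation per cell can be sketched in Python as follows. The sketch applies the computation to the Fisher Z transforms of the coefficients in Table 4.6; because those coefficients are printed to only two decimals, the results should land close to, but not exactly on, the entries of Table 4.7. The function and key names are hypothetical.

```python
from math import atanh


def two_way_anova_one_per_cell(table):
    """Sums of squares, degrees of freedom, and F ratios for a rows-by-columns
    layout with a single observation in each cell."""
    rows, cols = len(table), len(table[0])
    grand = sum(sum(r) for r in table) / (rows * cols)
    row_means = [sum(r) / cols for r in table]
    col_means = [sum(r[j] for r in table) / rows for j in range(cols)]
    ss_rows = cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = rows * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in table for v in r)
    ss_resid = ss_total - ss_rows - ss_cols
    df_rows, df_cols = rows - 1, cols - 1
    df_resid = df_rows * df_cols
    ms_resid = ss_resid / df_resid
    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols,
        "ss_resid": ss_resid, "ss_total": ss_total,
        "df_rows": df_rows, "df_cols": df_cols, "df_resid": df_resid,
        "f_rows": (ss_rows / df_rows) / ms_resid,
        "f_cols": (ss_cols / df_cols) / ms_resid,
    }


# Rows: Methods I to V; columns: 15, 30, 60, and 90 items (Table 4.6).
table_4_6 = [
    [0.50, 0.67, 0.72, 0.76],
    [0.48, 0.60, 0.68, 0.75],
    [0.32, 0.44, 0.65, 0.74],
    [0.45, 0.62, 0.71, 0.75],
    [0.11, 0.35, 0.60, 0.74],
]
z_table = [[atanh(r) for r in row] for row in table_4_6]
anova = two_way_anova_one_per_cell(z_table)
```

Here the row effect corresponds to item analysis method and the column effect to number of items; the residual serves as the error term because there is no replication within cells.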

 

 

 

Testing Hypotheses for
Zero-One Scoring

When the items were scored on a zero-one system there
were four hypotheses of interest.

1. The mean change score reliability for items chosen by
Method III (Saupe's correlation) would be greater than the
mean reliability of item subsets chosen by any other item
analysis method or by random selection.

2. Mean change score reliability for subsets of items chosen
by Method IV (triserial correlation) would be greater than
the mean reliability for the subsets of items chosen by the
response frequency methods or by random selection.

3. Mean change score reliability for the subsets of items
selected by Method I (using change variance) would be
greater than the mean reliability of item subsets chosen by
frequency of pretest responses or by random selection.

4. Mean change score reliability for the subsets of items
selected by Method II (using pretest response frequency)
would be greater than the mean reliability of subsets of
randomly selected items.

A post hoc comparison test for differences between means
was employed to test these hypotheses. Tukey's test for an honestly
significant difference was used for making the pairwise comparisons
between means. Results of these comparisons are reported in
Table 4.8. Three methods of item analysis were significantly better
than random selection in producing reliable change scales. These
were: selecting on change item variance, selecting on pretest
response frequency, and triserial correlation. Saupe's correlational
method did not produce results that were significantly better than
random selection. Only one significant difference was found between
the item analysis methods themselves. Selection of items on the
basis of change variance was found to yield higher mean change
reliability than selection on the basis of Saupe's $r_{dD}$.

TABLE 4.8.--Differences between mean change score reliabilities
for items chosen by different methods. (Scores are
Fisher r-to-Z transforms.)

                     Change     Pretest     Saupe's    Triserial
                     Variance   Frequency   rdD        r

Change Variance
Pretest Frequency     .057
Saupe's rdD           .178       .121
Triserial r           .041      -.016       -.137
Random                .285*      .228*       .125       .244*

*Significant at alpha = .05, HSD = .201.
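The decision rule behind the asterisks in Table 4.8 can be sketched in a few lines. The example below is only an illustration: the dictionary layout and the names `fisher_z`, `differences`, and `significant` are assumptions introduced here, while the numeric values and the HSD of .201 are copied from the table. It flags only those differences whose absolute value exceeds the tabled HSD and does not reproduce any other test the author may have applied.

```python
import math


def fisher_z(r):
    """Fisher r-to-Z transform, the scale on which Table 4.8 is reported."""
    return math.atanh(r)


# Differences between mean transformed reliabilities, copied from Table 4.8.
# Keys are (column method, row method); the layout is an illustrative choice.
differences = {
    ("Change Variance", "Pretest Frequency"): 0.057,
    ("Change Variance", "Saupe's rdD"): 0.178,
    ("Pretest Frequency", "Saupe's rdD"): 0.121,
    ("Change Variance", "Triserial r"): 0.041,
    ("Pretest Frequency", "Triserial r"): -0.016,
    ("Saupe's rdD", "Triserial r"): -0.137,
    ("Change Variance", "Random"): 0.285,
    ("Pretest Frequency", "Random"): 0.228,
    ("Saupe's rdD", "Random"): 0.125,
    ("Triserial r", "Random"): 0.244,
}

HSD = 0.201  # honestly significant difference at alpha = .05, from the table

# A pairwise difference is flagged when its absolute value exceeds the HSD.
significant = sorted(pair for pair, d in differences.items() if abs(d) > HSD)

for pair in significant:
    print(pair, differences[pair])
```

Under this rule the flagged pairs are exactly the three starred entries in the Random row.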

 


Thus the first hypothesis is not supported. Saupe's method
of change item analysis is not superior to other item analysis
methods, nor is it better than random selection, in choosing items
for reliable change scales.

The second hypothesis is partially supported. Triserial
correlation is better than random selection, but is not superior to
response frequency methods for selecting change items.

The third hypothesis is also only partially supported. Using
change item variance as an index for item selection is better
than random selection. This method, however, is not significantly
better than selecting items on the basis of pretest response frequency.

The fourth hypothesis is upheld. Items chosen on the basis
of pretest response frequency have significantly higher change
score reliabilities than items which are randomly chosen.

Summary

Results of the two-way analysis of variance and Tukey
post hoc comparisons showed that for the one-to-four item scoring
system:

1. Saupe's rdD was superior to random selection and to both
selection for change score variance and selection on pretest
response frequency.

 


2. Both pretest response frequency and the change variance
methods were better than random selection for choosing
reliable change item subscales.

3. Selection on change item variance was no better than
selection on pretest response frequency in providing reliable
change scales.

When the items were scored dichotomously, the results
showed that:

1. The methods of triserial correlation, selection for change
variance, and selection on pretest response frequency
were all superior to random selection of items for change
scales.

2. There were no significant differences between these successful
methods of item analysis.

3. Saupe's rdD was not significantly better than random
selection of items for measuring change.

   
  


 

 

CHAPTER V

SUMMARY AND CONCLUSIONS

Summary

In recent years researchers have become increasingly
interested in the problems of measuring change. Low change
score reliability has presented a particularly challenging problem
to researchers in this area. Bereiter (1963) suggested that item
analysis techniques could be applied to change items in an attempt
to improve change score reliability.

A review of the literature revealed that several techniques
for change item analysis were available; however, there was a
dearth of empirical research to demonstrate the effectiveness of
these procedures or to compare their ability to increase change
score reliability. The four methods of item analysis suitable for
change items were: selection on the basis of change item score
variance; selection on the basis of pretest response frequency;
selection on Saupe's correlation between change item score and total
score; and selection on triserial correlation. (The latter method
was restricted to the case where items were dichotomously scored
on each occasion.)

An empirical study was undertaken to determine whether
these methods of change item analysis could lead to the selection of
more reliable subsets of items than could be obtained by random
selection. Comparisons between the various methods were also
made. The sample used for item analysis and cross-validation was
a group of 263 students at Michigan State University who had been
tested on the Inventory of Beliefs as freshmen in 1958, and who were
retested on this attitude survey in 1961.

Half of this sample were assigned to an initial item analysis
group. On the basis of their responses the four item analysis pro-
cedures were carried out and subsets of 15, 30, 60, and 90 items
were selected by each procedure from the original pool of 120 items.
In addition, a control procedure of random selection was used
to choose item subsets.
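The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as actually programmed for the study: the function names, the data layout (one row of item responses per student), and the toy data are hypothetical. Only the change variance method and the random control are shown, and "large change variance" is taken to mean ranking items from highest to lowest variance of the change item score (posttest response minus pretest response).

```python
import random
from statistics import pvariance


def change_scores(pre, post):
    """Change item scores: posttest response minus pretest response."""
    return [[b - a for a, b in zip(pre_row, post_row)]
            for pre_row, post_row in zip(pre, post)]


def select_by_change_variance(pre, post, k):
    """Return indices of the k items with the largest change score variance."""
    changes = change_scores(pre, post)
    n_items = len(changes[0])
    variances = [pvariance([row[j] for row in changes]) for j in range(n_items)]
    ranked = sorted(range(n_items), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def select_at_random(n_items, k, seed=0):
    """Control procedure: draw k item indices at random without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_items), k))


# Toy data: 4 students, 3 items, responses on a one-to-four scale.
pre = [[1, 2, 3], [2, 2, 1], [3, 1, 2], [4, 2, 2]]
post = [[1, 4, 3], [2, 1, 2], [3, 3, 2], [4, 2, 3]]

chosen = select_by_change_variance(pre, post, k=2)
control = select_at_random(n_items=3, k=2)
print(chosen, control)
```

In the study itself the same idea would apply to a pool of 120 items with subsets of 15, 30, 60, and 90 items.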

The items selected by the item analysis procedures were
then scored for the cross-validation group. Two sets of change score
reliabilities were calculated from these responses. First, all
reliability estimates were computed for the entire group of 131
students. Secondly, the cross-validation group was divided into
smaller independent samples and reliabilities for item subsets chosen
by different methods were computed on independent samples.
Hypotheses were tested by using a two-way analysis of variance with
post hoc comparisons.
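The reliability formula used in the study is not restated in this chapter. As a rough illustration of how a change score reliability might be estimated for a selected subset, the sketch below assumes an internal consistency estimate, coefficient alpha (Cronbach, 1951), computed on change item scores. That choice, the function name `coefficient_alpha`, and the toy data are assumptions made for this example only.

```python
from statistics import pvariance


def coefficient_alpha(item_scores):
    """Coefficient alpha for a persons-by-items matrix of (change) item scores."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in item_scores])
                 for j in range(n_items)]
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total change score has zero variance")
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Toy change item scores (posttest minus pretest) for 4 students on 3 items.
changes = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [-1, 0, -1]]
alpha = coefficient_alpha(changes)
print(round(alpha, 3))
```

Applied to the cross-validation group, such a function would be called once per selected subset, using only the columns for the items in that subset.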

The results of the analysis showed that when the items were
scored on a one-to-four scale, the three methods of item analysis
used resulted in significantly higher change score reliability than
did random selection. Saupe's rdD was the most successful in
producing high change score reliability. Selection on pretest response
frequency and selection on change score variance were equally effective.

When the items were scored on a zero-one basis, three
methods of item analysis resulted in greater change score reliability
than did random selection: selection on change variance, selection
on pretest response frequency, and triserial correlation. One
method (Saupe's rdD) proved to be no better than random selection.
No significant differences were found between the three methods
which were successful in improving change score reliability; they
were equally effective.

Conclusions

 

From this comparative study of change item analysis tech-
niques, several conclusions can be drawn.
1. It is possible to produce more reliable instruments for
measuring individual change if items are selected through
the systematic item analysis procedures suggested in this
study.

2. When a wide range of item responses is permitted (such as
when items are scored on a one-to-four or one-to-five
scale), the methods recommended for use are Saupe's rdD,
selection on the basis of large change variance, and selection
on the basis of pretest response frequency.

3. When the range of item responses is restricted to a
dichotomy, the recommended methods are selection on the
basis of large change variance, selection on pretest response
frequency, and triserial correlation.

Discussion

 

It should be noted that the differences between the item
analysis methods and between item analysis methods and random
selection were more dramatic when a smaller number of items was
chosen. This probably indicates that change item analysis
techniques would be most useful when a small portion of the items is
chosen from a larger original pool. This observation appears, however,
to contradict the fact that no significant interaction was found between
number of items selected and the method of selection. In view of
this, it is likely that a Type II error occurred in the Tukey
one-degree-of-freedom test for interaction. Even if this were the
case, the results of the analysis of variance can be accepted with
confidence, since the presence of an undetected interaction would
have resulted in an overestimate of error variance and, hence, a
more conservative statistical test for the main effects of number of
items and method of selection.

Another point that should not be overlooked is that, statistically
speaking, most of the items on this scale functioned effectively.
Fewer than ten items were found which had negative triserial r or
rdD values. It is not unreasonable to speculate that if there had
been a higher percentage of poor items on the test, the item analysis
techniques might have worked even better, and/or differences
between the techniques might have been more apparent.

Theoretically, it is not surprising that differences between
change item analysis techniques seemed to be greater when the
one-to-four scoring scheme was used. When there was a greater possible
range of response, there was a greater possible range for the
variances of the change scores and, consequently, a greater possible
range for change item covariances. Thus a better distinction could
be made between "good" and "bad" items. Under the dichotomous
scoring system, the items tended to appear more similar with
regard to their change variances and intercorrelations.
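The arithmetic behind this argument can be made concrete. The sketch below is an illustration only; the function name `max_change_variance` is hypothetical. It relies on the standard result that a variable confined to an interval of width w has variance at most (w/2) squared, reached when half the observations sit at each end of the interval.

```python
def max_change_variance(low, high):
    """Upper bound on the variance of a change item score (post minus pre).

    With responses scored from `low` to `high`, a change score lies between
    -(high - low) and +(high - low), an interval of width 2 * (high - low).
    A bounded variable has variance at most (width / 2) ** 2.
    """
    span = high - low
    return float(span ** 2)


dichotomous = max_change_variance(0, 1)   # zero-one scoring
one_to_four = max_change_variance(1, 4)   # one-to-four scoring
print(dichotomous, one_to_four)
```

The ceiling on change variance is therefore nine times higher under one-to-four scoring than under zero-one scoring, which leaves far more room for items to differ from one another on the selection indices.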

 

  
   
    
   
     


From a practitioner's viewpoint, several of the findings
of this study can be applied to the area of constructing instruments
to measure change. First, it appears that change item analysis can
be a profitable approach to solving the change score reliability
problem. Certainly researchers should consider using these tech-
niques when constructing new instruments to measure change or
when forced to shorten an already existing scale.

Second, the use of a multiple-response format for items
seems to allow a more sensitive observation of change and makes
the selection of a method of change item analysis an important
consideration.

A third point, having great significance for the longitudinal
researcher, is that considerable time and expense might be saved
by selecting items on the basis of pretest response alone. It should
be remembered, however, that this can only be done when the
direction of change can be predicted in advance.

Implications for Future Research

 

This study has been somewhat of a "pioneer exploration"
into the area of change item analysis. It has revealed, nonetheless,
that empirical research on techniques for selecting items to measure
change can be a profitable venture yielding useful information for
test construction.

Only four possible methods of change item analysis were
compared in this study. As other methods are developed, they
should be systematically compared with these techniques. Two
possible new methods which could be considered are factor analysis
and multiserial correlation. Change item scores could be subjected
to factor analysis and chosen on the basis of their factor loadings,
just as regular items are often selected. This requires, however,
a much larger sample size than was employed in this study. Lange
(1969) found that factor loadings for 40 items were unstable with a
sample size less than 400. Another method which could prove useful
would be the use of the general multiserial correlation developed
by Jaspen (1946). The triserial r employed in this study was a
simplified version of Jaspen's multiserial correlation formula.
The general formula could be expanded to render correlation coefficients
for total change score and change item responses scored on a
one-to-four or a one-to-five scale.

Also, before the findings of this study can be generally
accepted, replication is needed using other populations and other
instruments.

 


An actual study of the content of the change items themselves
was not within the scope of this investigation. It remains a very
important but, as yet, unexplored area. Cox (1965) warned test
constructors to remember that item selection on statistical criteria
alone might change the nature of the test by eliminating items
designed to measure a specific objective. He proposed a method to
use in conjunction with statistical item analysis to ensure that items
were maintained in the test to cover all vital objectives of the
evaluation. Such selection of items to measure objectives could be
practiced equally well with change items now that feasible statistical
item selection techniques are available.

A final implication of this study goes beyond the area of
constructing instruments to the broader area of measuring change.
This study has demonstrated that researchers need no longer fear to
undertake a study of individual change because of the seemingly
insurmountable problem of low change score reliability. Through item
analysis, instruments can be developed which will provide reliable
measures of individual change.

 

   

 

BIBLIOGRAPHY

American Council on Education Committee on Measurement and
Evaluation. Instructor's manual for the inventory of
beliefs. Washington, D.C., 1953.

Bereiter, Carl M. Some persisting dilemmas in the measurement
of change. Chapter 1 in Harris, Chester W. (Ed.)
Problems in measuring change. Madison, Wisconsin:
University of Wisconsin Press, 1963.

Cox, Richard C. Item selection techniques and evaluation of
instructional objectives. Journal of Educational Measurement,
1965, 2, 181-185.

Cronbach, Lee J. Coefficient alpha and the internal structure of
tests. Psychometrika, 1951, 16, 297-334.

Dressel, Paul L., and Mayhew, Lewis B. General education
explorations in evaluation. Washington, D.C.: American
Council on Education, 1954.

Greenhouse, S. W., and Geisser, S. On methods in the analysis
of profile data. Psychometrika, 1959, 24, 92-112.

Gruber, H. E., and Weitman, M. Item analysis and the measurement
of change. Journal of Educational Research, 1962,
6, 287-289.

Gulliksen, Harold. Theory of mental tests. New York: Wiley,
1950.

Horst, Paul. Multivariate models for evaluating change. Chapter 6
in Harris, Chester W. (Ed.) Problems in measuring
change. Madison, Wisconsin: University of Wisconsin
Press, 1963.

 


Horst, Paul. Psychological measurement and prediction. Belmont,
California: Wadsworth, 1966.

Jaspen, Nathan. Serial correlation. Psychometrika, 1946, 11,
23-30.

Jenkins, William L. Triserial r--a neglected statistic. Journal
of Applied Psychology, 1956, 40, 63-64.

Kirk, Roger E. Experimental design procedures for the behavioral
sciences. Belmont, California: Wadsworth, 1968.

Lange, Allan L. An empirical study of sampling error in factor
analysis. Unpublished doctoral dissertation, Michigan
State University, 1969.

Lehmann, Irvin J., and Dressel, Paul L. Changes in critical
thinking, attitudes, and values associated with college
attendance. Final Report of Cooperative Research Project
No. 1646. East Lansing: Michigan State University, 1963.

Lord, Frederick M. The utilization of unreliable difference scores.
Journal of Educational Psychology, 1958, 49, 150-152.

Lord, Frederick M. Elementary models for measuring change.
Chapter 2 in Harris, Chester W. (Ed.) Problems in
measuring change. Madison, Wisconsin: University of
Wisconsin Press, 1963.

Lord, Frederick M., and Novick, Melvin R. Statistical theories
of mental test scores. Reading, Mass.: Addison Wesley,
1968.

Magnusson, David. Test theory. Reading, Mass.: Addison Wesley,
1967.

Saupe, Joe L. Technical considerations in measurement. Appendix
in Dressel, Paul L. (Ed.) Evaluation in higher education.
Boston, Mass.: Houghton Mifflin, 1961.

Saupe, Joe L. Selecting items to measure change. Journal of
Educational Measurement, 1966, 3, 223-228.

 

 

   
   
  


Shoemaker, David M. Note on the attenuating effect of zero-variance
items on KR-20. Journal of Educational Measurement,
1970, 6, 255-256.

Tucker, Ledyard; Damarin, Fred; and Messick, Samuel. A base-free
measure of change. Psychometrika, 1966, 31, 457-473.

Webster, Harold, and Bereiter, Carl. The reliability of changes
measured by mental test scores. Chapter 3 in Harris,
Chester W. (Ed.) Problems in measuring change. Madison,
Wisconsin: University of Wisconsin Press, 1963.

Winer, B. J. Statistical principles in experimental design. New
York: McGraw-Hill, 1962.

 


 

   
    
   
 
 
   
    

 

APPENDIX

 


The 120 items on the Inventory of Beliefs, Form 1, used in
this study are listed here. Following the listing of items, Tables A.1
and A.2 are presented to indicate the subscales in which each item
appeared. Percentages of overlap of items between the scales
selected by different item analysis methods are reported in Tables A.3
through A.10.

1. If you want a thing done right, you have to do it yourself.

2. There are times when a father, as head of the family, must
tell the other family members what they can and cannot do.

3. Lowering tariffs to admit more foreign goods into this country
lowers our standard of living.

4. Literature should not question the basic moral concepts of
society.

5. Reviewers and critics of art, music and literature decide what
they like and then force their tastes on the public.

6. Why study the past, when there are so many problems of the
present to be solved.

7. Business men and manufacturers are more important to society
than artists or musicians.

8. There is little chance for a person to advance in business or
industry unless he knows the right people.


 

     

9. Man has an inherent guide to right and wrong--his conscience.

10. The main thing about good music is lovely melody.

11. It is only natural and right for each person to think that his
family is better than any other.

12. All objective data gathered by unbiased persons indicate that the
world and universe are without order.

13. Any man can find a job if he really wants to work.

14. We are finding out today that liberals really are soft-headed,
gullible, and potentially dangerous.

15. A man can learn as well by striking out on his own as he can by
following the advice of others.

16. The predictions of economists about the future of business are
no better than guesses.

17. Being a successful wife and mother is more a matter of instinct
than of training.

18. A person often has to get mad in order to push others into action.

19. There is only one real standard in judging art works--each to
his own taste.

20. Business enterprise, free from government interference, has
given us our high standard of living.

21. Nobody can make a million dollars without hurting other people.

22. Anything we do for a good cause is justified.

23. Public resistance to modern art proves that there is something
wrong with it.

24. Sending letters and telegrams to congressmen is mostly a waste
of time.

25. Many social problems would be solved if we did not have so many
immoral and inferior people.


     

26. Art which does not tell a human story is empty.

27. You can't do business on friendship: profits are profits; and
good intentions are not evidence in a law court.

28. A person has troubles of his own; he can't afford to worry about
other people.

29. Books and movies should start dealing with entertaining or
uplifting themes instead of the present unpleasant, immoral, or
tragic ones.

30. Children should be made to obey since you have to control them
firmly during their formative years.

31. The minds of many youth are being poisoned by bad books.

32. Speak softly, but carry a big stick.

33. Ministers in churches should not preach about economic and
political problems.

34. Each man is on his own in life and must determine his own
destiny.

35. New machines should be taxed to support the workers they
displace.

36. The successful merchant can't allow sentiment to affect his
business decisions.

37. Ministers who preach socialistic ideas are a disgrace to the
church.

38. Labor unions don't appreciate all the advantages which business
and industries have given them.

39. It's only natural that a person should take advantage of every
opportunity to promote his own welfare.

40. We should impose a strong censorship on the morality of books
and movies.

41. The poor will always be with us.

 

   
 
  
   


   

   

 

42. A person who is incapable of real anger must also be lacking in
moral conviction.

43. If we allow more immigrants into this country, we will lower
our standard of culture.

44. People who live in the slums have no sense of respectability.

45. We acquire the highest form of freedom when our wishes conform
to the will of society.

46. Modern paintings look like something dreamed up in a horrible
nightmare.

47. Voting determines whether or not a country is democratic.

48. The government is more interested in winning elections than in
the welfare of the people.

49. Feeble-minded people should be sterilized.

50. In our society, a person's first duty is to protect from harm
himself and those dear to him.

51. Those who can, do; those who can't, teach.

52. The best government is one which governs least.

53. History shows that every great nation was destroyed when its
people became soft and its morals lax.

54. Philosophers on the whole act as if they were superior to ordinary
people.

55. A woman who is a wife and mother should not try to work outside
the home.

56. We would be better off if people would talk less and work more.

57. In some elections there is not much point in voting because the
outcome is fairly certain.

58. The old masters were the only artists who really knew how to
draw and paint.

 

 

   

 

 

59. Most intellectuals would be lost if they had to make a living in
the realistic world of business.

60. You cannot lead a truly happy life without strong moral and
religious convictions.

61. If we didn't have strict immigration laws, our country would be
flooded with foreigners.

62. When things seem black, a person should not complain, for it
may be God's will.

63. Miracles have always taken place whenever the need for them
has been great enough.

64. Science is infringing upon religion when it attempts to delve into
the origin of life itself.

65. A person has to stand up for his rights or people will take
advantage of him.

66. A lot of teachers, these days, have radical ideas which need to
be carefully watched.

67. Now that America is the leading country in the world, it's only
natural that other countries should try to be like us.

68. Most Negroes would become overbearing and disagreeable if not
kept in their place.

69. Foreign films emphasize sex more than American films do.

70. Our rising divorce rate is a sign that we should return to the
values which our grandparents held.

71. Army training will be good for most modern youth because of
the strict discipline they will get.

72. When operas are sung in this country they ought to be translated
into English.

73. People who say they're religious but don't go to church are just
hypocrites.

 

   

 

74. What the country needs, more than laws or politics, is a few
fearless and devoted leaders in whom the people can have faith.

75. Pride in craftsmanship and in doing an honest day's work is a
rare thing these days.

76. The United States may not have had much experience in international
dealings but it is the only nation to which the world can
turn for leadership.

77. In practical situations, theory is of very little help.

78. No task is too great or too difficult when we know that God is
on our side.

79. A sexual pervert is an insult to humanity and should be punished
severely.

80. A lot of science is just using big words to describe things which
many people already know through common sense.

81. Manual labor and unskilled jobs seem to fit the Negro mentality
and ability better than more skilled or responsible work.

82. A person gets what's coming to him in this life if he doesn't
believe in God.

83. Public officials may try to be honest but they are caught in a web
of influence which tends to corrupt them.

84. Science makes progress only when it attempts to solve urgent
practical problems.

85. Most things in life are governed by forces over which we have
no control.

86. Young people today are in general more immoral and irresponsible
than young people of previous generations.

87. Americans may tend to be materialistic, but at least they aren't
cynical and decadent like most Europeans.

88. The many different kinds of children in school these days force
teachers to make a lot of rules and regulations so that things
will run smoothly.


 

89. Jews will marry out of their own religious group whenever they
have the chance.

90. The worst danger to real Americanism during the last 50 years
has come from foreign ideas and agitators.

91. Europeans criticise the United States for its materialism but
such criticism is only to cover up their realization that American
culture is far superior to their own.

92. The scientist that really counts is the one who turns theories
into practical use.

93. No one can really feel safe when scientists continue to explore
whatever they wish without any social or moral restraint.

94. Nudist colonies are a threat to the moral life of a nation.

95. One trouble with Jewish businessmen is that they stick together
and prevent other people from having a fair chance in competition.

96. No world organization should have the right to tell Americans
what they can or cannot do.

97. There is a source of knowledge that is not dependent upon
observation.

98. Despite the material advantages of today, family life now is not
as wholesome as it used to be.

99. The United States doesn't have to depend on the rest of the world
in order to be strong and self-sufficient.

100. Foreigners usually have peculiar and annoying habits.

101. Parents know as much about how to teach children as public
school teachers.

102. The best assurance of peace is for the United States to have the
strongest army, navy, air force, and the most atom bombs.

103. Some day machinery will do nearly all of man's work, and we
can live in leisure.

 


  

 

104. There are too many people in this world who do nothing but
think about the opposite sex.

105. Modern people are superficial and tend to lack the finer
qualities of manhood and womanhood.

106. Members of religious sects who refuse to salute the flag should
be punished for their lack of patriotism.

107. Political parties are run by insiders who are not concerned with
the public welfare.

108. As young people grow up they ought to get over their radical
ideas.

109. Negroes have their rights, but it is best to keep them in their
own districts and schools and to prevent too much contact with
whites.

110. The twentieth century has not had leaders with the vision and
capacity of the founders of this country.

111. There are a lot of things in this world that will never be
explained by science.

112. Sexual relations between brother and sister are contrary to
natural law.

113. There may be a few exceptions, but in general Jews are pretty
much alike.

114. The world will get so bad that some of these times God will
destroy it.

115. Children should learn to respect and obey their teachers.

116. Other countries don't appreciate as much as they should all the
help that America has given them.

117. We would be better off if there were fewer psychoanalysts
probing and delving into the human mind.

118. American free enterprise is the greatest bulwark of democracy.

 


 

119. If a person is honest, works hard, and trusts in God, he will
reap material as well as spiritual rewards.

120. One will learn more in the school of hard knocks than he ever
can from a textbook.

  
  
 
  


 

 

TABLE A.1.--Listing of subscales in which each change item first
appeared after item analysis with one-to-four scoring system.*

                      Item Analysis Method
Item      I           II          III         IV
Number    Change      Pretest     Saupe r     Random
          Variance

1         --          60          90          60
2         --          15          --          15
3         15          60          90          90
4         30          90          --          30
5         15          90          30          --
6         30          --          90          90
7         --          --          60          60
8         --          --          --          30
9         30          30          --          15
10        90          60          60          --
11        60          60          60          90
12        --          --          --          30
13        90          15          --          60
14        90          --          30          90
15        60          60          60          60
16        --          90          --          30
17        60          90          --          15
18        60          90          --          60

 

 

 

 

 

*The numbers in the table indicate the scale length when the
item first appeared. If an item is included in a scale of 15 items, it
is obviously included in all scales using the same procedure which
are of greater length.

 

   


TABLE A.1.--Continued.

                      Item Analysis Method
Item      I           II          III         IV
Number    Change      Pretest     Saupe r     Random
          Variance

19        30          15          30          90
20        30          60          30          --
21        60          90          90          --
22        60          90          --          90
23        --          --          90          60
24        60          --          90          30
25        15          60          15          --
26        60          90          --          60
27        15          60          60          90
28        --          --          90          90
29        30          30          60          --
30        60          15          90          90
31        30          30          60          60
32        30          60          --          30
33        15          90          --          15
34        15          15          --          60
35        90          --          30          --
36        60          90          90          90
37        90          90          30          --
38        15          15          --          90
39        --          30          60          60
40        30          60          90          30
41        --          15          --          15

 

 

 

 


 

 

TABLE A.1.--Continued.

                      Item Analysis Method
Item      I           II          III         IV
Number    Change      Pretest     Saupe r     Random
          Variance

42        60          90          --          60
43        90          --          60          --
44        --          --          90          90
45        15          60          30          90
46        30          60          90          60
47        15          90          90          60
48        60          --          --          30
49        60          60          --          15
50        --          15          60          60
51        30          --          --          15
52        15          90          --          90
53        90          15          30          90
54        90          60          60          --
55        60          15          60          60
56        90          30          60          30
57        --          --          90          60
58        --          --          90          90
59        90          90          90          90
60        30          30          --          15
61        90          30          90          --
62        90          30          60          90
63        90          60          15          60
64        30          90          60          30

 

 

 

 

 

 

 

    

 


 

 

 

TABLE A.1.--Continued.

                      Item Analysis Method
Item      I           II          III         IV
Number    Change      Pretest     Saupe r     Random
          Variance

65        --          30          --          15
66        60          90          60          60
67        90          60          30          --
68        90          --          90          90
69        15          30          15          90
70        90          60          60          --
71        60          15          60          60
72        90          60          60          30
73        30          90          15          15
74        60          90          --          60
75        60          60          90          90
76        60          60          60          --
77        --          90          30          90
78        60          15          30          --
79        60          60          90          60
80        90          --          15          --
81        90          --          30          15
82        60          90          60          60
83        90          60          90          --
84        --          --          60          --
85        60          --          --          60
86        60          --          60          --
87        --          90          15          60

 

 

 

 

   


 

    

 

 

 

TABLE A.1.--Continued.

                      Item Analysis Method
Item      I           II          III         IV
Number    Change      Pretest     Saupe r     Random
          Variance

88        90          30          15          --
89        --          --          90          15
90        60          60          --          60
91        90          60          15          --
92        15          15          15          60
93        --          90          15          --
94        --          60          90          60
95        90          90          15          --
96        60          90          90          30
97        60          60          --          15
98        --          60          60          60
99        --          --          90          --
100       --          --          90          --
101       90          --          --          60
102       15          --          15          90
103       60          90          90          60
104       --          60          60          30
105       30          --          30          15
106       --          90          30          90
107       --          90          60          --
108       --          60          90          --
109       --          --          15          90
110       90          --          90          90

 

 

 

 

 


   

 

 

 

TABLE A.1.--Continued.

                      Item Analysis Method
Item      I           II          III         IV
Number    Change      Pretest     Saupe r     Random
          Variance

111       90          30          --          30
112       15          30          --          15
113       60          90          15          --
114       90          90          15          --
115       --          15          90          90
116       90          30          30          --
117       90          --          60          90
118       60          30          90          90
119       90          15          60          30
120       15          60          60          60

 

 

 

 

 


 

 

TABLE A.2.--Listing of subscales in which each change item first
appeared after item analysis with zero-one scoring.*

                            Item Analysis Method
Item      I           II          III         IV          V
Number    Change      Pretest     Saupe r     Triserial r Random
          Variance

1         60          60          90          90          60
2         --          15          --          --          15
3         15          60          --          --          90
4         30          --          15          60          30
5         15          90          90          15          --
6         --          --          90          60          90
7         90          --          15          15          60
8         90          --          --          --          30
9         60          15          --          --          15
10        30          60          60          60          --
11        30          60          --          --          90
12        --          --          --          --          30
13        --          15          90          90          60
14        --          --          90          90          90
15        30          90          --          --          60
16        --          --          --          --          30
17        30          90          90          --          15
18        60          60          --          --          60

 

 

 

 

 

 

*The numbers in the table indicate the scale length when the
item first appeared. If an item is included in a scale of 15 items, it
is obviously included in all scales using the same procedure which
are of greater length.


 

TABLE A.2.--Continued.

                            Item Analysis Method
Item      I           II          III         IV          V
Number    Change      Pretest     Saupe r     Triserial r Random
          Variance

19        60          30          30          30          90
20        30          30          90          90          --
21        30          90          --          --          --
22        60          90          90          90          90
23        --          --          --          --          60
24        90          --          --          --          30
25        30          60          30          30          --
26        90          90          --          --          60
27        15          60          60          90          90
28        --          --          90          90          90
29        15          60          15          15          --
30        --          15          90          90          90
31        60          30          --          30          60
32        15          60          60          90          30
33        30          90          --          --          15
34        60          15          --          --          60
35        90          --          30          30          --
36        15          60          90          90          90
37        60          90          --          90          --
38        15          30          60          60          90
39        90          15          60          60          60
40        60          60          90          90          30

 

 

 

 

 

 

 

 

 

 


 

TABLE A.2.--Continued.

                            Item Analysis Method
Item      I           II          III         IV          V
Number    Change      Pretest     Saupe r     Triserial r Random
          Variance

41        --          15          90          60          15
42        90          90          --          --          60
43        --          --          --          --          --
44        90          --          60          60          90
45        15          60          30          30          90
46        15          60          60          60          60
47        60          90          90          90          60
48        90          --          --          --          30
49        90          90          90          90          15
50        --          30          --          --          60
51        90          --          30          60          15
52        15          90          60          90          90
53        90          30          60          60          90
54        30          90          60          60          --
55        30          30          60          60          60
56        60          60          60          60          30
57        --          --          90          60          60
58        --          --          60          30          90
59        15          90          60          90          90
60        60          30          --          --          15
61        --          15          60          60          --
62        90          30          60          90          90

 

 

 

 

 


 

 

 

TABLE A.2.--Continued.

                            Item Analysis Method
Item      I           II          III         IV          V
Number    Change      Pretest     Saupe r     Triserial r Random
          Variance

63        60          60          15          15          60
64        90          90          60          60          30
65        --          15          60          60          15
66        60          90          60          60          60
67        90          60          30          60          --
68        --          90          90          90          90
69        15          30          15          15          90
70        60          60          90          90          --
71        90          15          90          90          60
72        --          60          30          15          30
73        90          90          15          15          15
74        90          90          --          --          60
75        60          90          60          60          90
76        60          60          90          90          --
77        90          --          60          60          90
78        60          15          15          15          --
79        60          60          90          90          60
80        --          --          30          30          --
81        --          90          30          30          15
82        90          90          --          --          60
83        15          60          15          15          --
84        90          --          60          60          --

 

 

 

 

 


 

 

 

 

TABLE A.2.--Continued.

                            Item Analysis Method
Item      I           II          III         IV          V
Number    Change      Pretest     Saupe r     Triserial r Random
          Variance

85        60          60          60          90          60
86        60          90          --          --          --
87        60          90          30          30          60
88        60          30          15          15          --
89        --          --          --          --          15
90        60          60          60          60          60
91        30          60          15          15          --
92        30          30          15          15          60
93        --          --          30          30          --
94        90          60          --          --          60
95        90          90          60          60          --
96        90          90          90          --          30
97        60          30          --          --          15
98        --          60          30          30          60
99        --          --          90          60          --
100       --          --          15          15          --
101       --          --          90          90          60
102       60          90          15          15          90
103       60          60          --          --          60
104       90          60          60          30          30
105       30          90          15          30          15
106       --          --          90          90          90

 

 

 

 

 

 


 

TABLE A.2.--Continued.

                            Item Analysis Method
Item      I           II          III         IV          V
Number    Change      Pretest     Saupe r     Triserial r Random
          Variance

107       --          --          90          90          --
108       15          90          60          90          --
109       --          --          90          60          90
110       90          --          90          90          90
111       90          30          90          90          30
112       60          15          --          --          15
113       60          60          15          15          --
114       90          90          30          60          --
115       90          15          60          30          90
116       60          30          30          60          --
117       90          --          60          60          90
118       30          15          60          60          90
119       --          15          90          90          30
120       15          60          30          30          60
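
The first-appearance coding explained in the footnote to Tables A.1 and A.2 can be unpacked mechanically. The sketch below is only an illustration, not part of the original analysis: the function names are hypothetical, `None` stands in for the "--" entries (items never selected by a method), and the four sample values are the Change Variance entries for items 117 to 120 as they read in Table A.2.

```python
# Scale lengths used in the appendix tables.
SCALE_LENGTHS = (15, 30, 60, 90)


def scales_containing(first_length):
    """Return every scale length that includes an item, given the length
    at which the item first appeared (None means it was never selected)."""
    if first_length is None:
        return []
    return [n for n in SCALE_LENGTHS if n >= first_length]


def items_in_scale(first_appearance, length):
    """Return the sorted item numbers that belong to the scale of `length`."""
    return sorted(
        item
        for item, first in first_appearance.items()
        if first is not None and first <= length
    )


# Change Variance column, items 117-120 of Table A.2 ("--" coded as None).
change_variance = {117: 90, 118: 30, 119: None, 120: 15}

print(scales_containing(change_variance[118]))  # [30, 60, 90]
print(items_in_scale(change_variance, 30))      # [118, 120]
```

Because the scales are nested, a single number per item and method is enough to recover the membership of all four scale lengths.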

 

 

 

 

 

 

 


 

 

 

 

TABLE A.3.--Percentage of item overlap for scales chosen by
different item analysis methods--15 items,
one-to-four scoring.

                     Change      Pretest     Saupe r
                     Variance

Change Variance                  32          27
Pretest Frequency                            7

 

 

 

 

TABLE A.4.--Percentage of item overlap for scales chosen by
different item analysis methods--30 items,
one-to-four scoring.

                     Change      Pretest     Saupe r
                     Variance

Change Variance                  33          33
Pretest Frequency                            20

 

 

 

 

TABLE A.5.--Percentage of item overlap for scales chosen by
different item analysis methods--60 items,
one-to-four scoring.

                     Change      Pretest     Saupe r
                     Variance

Change Variance                  80          73
Pretest Frequency                            73

 

 

 

 

 


 

 

TABLE A.6.--Percentage of item overlap for scales chosen by
different item analysis methods--90 items,
one-to-four scoring.

                     Change      Pretest     Saupe r
                     Variance

Change Variance                  80          73
Pretest Frequency                            73

 

 

 

 

 

TABLE A.7.--Percentage of item overlap for scales chosen by
different item analysis methods--15 items,
zero-one scoring.

                     Change      Pretest     Saupe r     Triserial r
                     Variance

Change Variance                  0           20          20
Pretest Frequency                            7           7
Saupe r                                                  87
Triserial r

 

 

 

 

 

 

 

 

TABLE A.8.--Percentage of item overlap for scales chosen by
different item analysis methods--30 items,
zero-one scoring.

                     Change      Pretest     Saupe r     Triserial r
                     Variance

Change Variance                  27          33          33
Pretest Frequency                            20          23
Saupe r                                                  83
Triserial r

 

 

 

 

 

 

TABLE A.9.--Percentage of item overlap for scales chosen by
different item analysis methods--60 items,
zero-one scoring.

                     Change      Pretest     Saupe r     Triserial r
                     Variance

Change Variance                  68          57          50
Pretest Frequency                            57          53
Saupe r                                                  89
Triserial r

 

 

 

 

 


 

 

   

TABLE A.10.--Percentage of item overlap for scales chosen by
different item analysis methods--90 items,
zero-one scoring.

                     Change      Pretest     Saupe r     Triserial r
                     Variance

Change Variance                  87          72          73
Pretest Frequency                            76          76
Saupe r                                                  98
Triserial r
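
The overlap percentages in Tables A.3 through A.10 appear to be the number of items two equal-length scales share, divided by the scale length. That definition is an assumption here, since the appendix does not state the formula, and the function name below is hypothetical. As a check, the sketch uses the items coded 15 under Saupe r and under triserial r in Table A.2 (as read from an imperfect scan, so the item lists themselves should be treated as provisional); they share 13 of 15 items.

```python
def percent_overlap(scale_a, scale_b):
    """Percentage of items common to two equal-length scales, rounded to
    the nearest whole number (assumed definition of 'item overlap')."""
    a, b = set(scale_a), set(scale_b)
    if len(a) != len(b):
        raise ValueError("scales must contain the same number of items")
    return round(100 * len(a & b) / len(a))


# Items first appearing in the 15-item scales of Table A.2 (zero-one scoring).
saupe_15 = [4, 7, 29, 63, 69, 73, 78, 83, 88, 91, 92, 100, 102, 105, 113]
triserial_15 = [5, 7, 29, 63, 69, 72, 73, 78, 83, 88, 91, 92, 100, 102, 113]

print(percent_overlap(saupe_15, triserial_15))  # 87
```

The result agrees with the Saupe r versus triserial r entry of Table A.7, which lends some support to the assumed definition.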

 

 

 

 

 

 

 

 

 

 

 

 
