AN EMPIRICAL INVESTIGATION OF THE EFFECT OF ITEM SELECTION TECHNIQUES ON ACHIEVEMENT TEST CONSTRUCTION

Thesis for the Degree of Ph. D.
MICHIGAN STATE UNIVERSITY
Richard Clair Cox
1964

This is to certify that the thesis entitled AN EMPIRICAL INVESTIGATION OF THE EFFECT OF ITEM SELECTION TECHNIQUES ON ACHIEVEMENT TEST CONSTRUCTION presented by Richard Clair Cox has been accepted towards fulfillment of the requirements for the Ph.D. degree in Education.

Major professor

Date

ABSTRACT

AN EMPIRICAL INVESTIGATION OF THE EFFECT OF ITEM SELECTION TECHNIQUES ON ACHIEVEMENT TEST CONSTRUCTION

by Richard Clair Cox

Problem

The subject-matter content and instructional objectives to be evaluated are identified by the test constructor. Items which measure these instructional objectives in each content area are created by the item writer. Usually more items are written than will be used in the final form of the instrument. Items for the final form of the instrument are commonly selected on the basis of a statistical analysis of the item pool. In order that the final form of the evaluation instrument validly measure the objectives identified in the original item pool, the method of item selection should not have an appreciable effect on the structure of the final instrument as compared with the structure of the item pool from which the test items were selected. This study investigates the effect that statistical item selection has on the structure of the final form of a test as compared with the original item pool.

Procedure

An item pool of 379 multiple-choice natural science items was identified. This item pool was described with respect to the average difficulty and discrimination levels of the items, computed separately for male and female tryout groups, and with respect to the classification of each item according to the instructional objective being measured. From this item pool the 100 most discriminating items identified by the Davis index and the 100 most discriminating items identified by the Difference index were selected to form two 100-item tests. This was done separately for males and females. The entire procedure was repeated using data obtained from high and low achieving male and female tryout groups. The 100-item tests were compared to the total item pool with respect to the classification of items according to the instructional objective being measured. The comparisons were made separately for the tests constructed using the data from the male and female groups and from the high and low achieving groups.

Findings

1. Statistical selection of items from the total item pool has a biasing effect on the selected tests. The proportion of items in the selected tests which measure certain instructional objectives is unlike the proportion of items in the total item pool which measure the same objectives.

2. Statistical selection of items from the total item pool appears to operate differentially for male and female groups. The structure of the selected tests, as indicated by the taxonomical classification of the items, differs for male and female groups.

3. Statistical selection of items from the total item pool appears to operate differentially for high and low achieving tryout groups, both male and female. The structure of the selected tests as indicated by the taxonomical classification of items differs for high and low achieving groups.
4. Statistical selection of items from the total item pool operates similarly with respect to the taxonomical structure of the items selected no matter which of the two discrimination indices is used as the criterion for selection.

AN EMPIRICAL INVESTIGATION OF THE EFFECT OF ITEM SELECTION TECHNIQUES ON ACHIEVEMENT TEST CONSTRUCTION

By Richard Clair Cox

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

COLLEGE OF EDUCATION

Department of Foundations of Education

1964

ACKNOWLEDGEMENTS

The investigator would like to acknowledge his gratitude to Dr. Willard Warrington, the thesis director, and to Dr. Bernard Corman, the guidance committee chairman, for their assistance in the preparation of the thesis and for their helpful guidance through the doctoral program. Appreciation is also extended to Dr. Jean M. LePere and Dr. Paul Bakan for their assistance and helpful criticism.

The investigator is indebted to the Office of Evaluation Services, Michigan State University, which not only made possible the collection of data but also assisted the investigator in his efforts.

The investigator is particularly grateful to his wife, Cynthia, and son, Kevin, for their moral support and patience and for the occasional neglect they have endured.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES

CHAPTER

I. INTRODUCTION
   Educational Achievement Test Construction
   Planning the Test
   Instructional Objectives
   Writing the Test Item
   Experimental Tryout and Assembly of Final Test Form
   Suggested Item Discrimination Techniques
   Studies Comparing Item Discrimination Methods
   Assembly of Final Test Form
   Statement of the Problem

II. PROCEDURE
   Examinations and Subjects Used in the Study
   Classification of Test Items
   Selection of Item Discrimination Indices
   Davis Index
   The Difference Index
   Procedure

III. RESULTS AND DISCUSSION
   Description of Total Item Pool for Males and Females
   Results of Selection of Most Discriminating Items for Males and Females
   Results of Selection of Most Discriminating Items by Achievement Level
   Summary of Major Findings

IV. SUMMARY, CONCLUSIONS AND IMPLICATIONS
   Conclusions
   Implications

BIBLIOGRAPHY
APPENDIX A
APPENDIX B
APPENDIX C

LIST OF TABLES

TABLE
 1  Frequency of Test Items Classified in Taxonomical Categories
 2  Distribution of Total Test Scores for Males and Females
 3  Means and Standard Deviations of Total Test Scores for Males and Females
 4  Average Davis and Difference Indices by Sex
 5  Means and Standard Deviations of High and Low Achievers by Sex
 6  Average Difficulty and Discrimination Indices for High and Low Achievers by Sex
 7  Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - Males
 8  Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - Females
 9  Percentage of Most Discriminating Items Classified in Taxonomical Categories by Sex
10  Taxonomical Classification of Items Identified as Most Discriminating by Both Discrimination Indices by Sex
11  Taxonomical Classification of Items Identified as Most Discriminating for Males and Females by Each of the Discrimination Indices
12  Average Difficulty and Discrimination Indices by Taxonomical Category for 100 Item Tests - Males
13  Average Difficulty and Discrimination Indices by Taxonomical Category for 100 Item Tests - Females
14  Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - High Achieving Males
15  Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - Low Achieving Males
16  Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - High Achieving Females
17  Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - Low Achieving Females
18  Percentage of Most Discriminating Items Classified in Taxonomical Categories for High and Low Achieving Males
19  Percentage of Most Discriminating Items Classified in Taxonomical Categories for High and Low Achieving Females

CHAPTER I

INTRODUCTION

The validity of educational evaluation depends on the extent of correspondence between educational objectives and the instruments intended to evaluate those objectives. A close relationship between educational objectives and evaluation procedures is therefore desirable. It is the purpose of this investigation to examine some commonly used test construction procedures which could have serious consequences for this relationship.

Educational Achievement Test Construction

According to Lindquist, the construction of an educational achievement test consists of the following five major steps:

1. Planning the test
2. Writing the test exercises
3. Trying out the test in preliminary form and assembling the finished test after tryout
4. Determining the procedures and preparing the manuals for administering and scoring the test
5. Reproducing the test and accessory materials (33:119)

This investigation will be concerned with only the first three steps outlined above. These are the stages in test construction where the correspondence between instructional objectives and the objectives measured by the final form of the test could be diminished.

Planning the Test

Initial planning of an educational achievement test involves the identification of the subject-matter content and the instructional objectives to be tested. A two-way table of specifications is often utilized at this stage to insure that these two aspects of test content are represented. (48:161-162) (24:50) Nelson (40:117-119) presents some examples of such two-way tables in which the Taxonomy of Educational Objectives (5) is used as the classification system for the instructional objectives.
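As an illustration, a minimal sketch in Python of such a two-way table of specifications; the content areas and cell counts below are hypothetical, not those of any examination discussed in this study:

    # A two-way table of specifications: rows are content areas, columns
    # are major categories of the Taxonomy of Educational Objectives, and
    # each cell holds the number of items to be written.  All names and
    # counts here are hypothetical.
    specifications = {
        "Astronomy": {"Knowledge": 5, "Comprehension": 4, "Application": 3, "Analysis": 2},
        "Geology":   {"Knowledge": 6, "Comprehension": 5, "Application": 4, "Analysis": 2},
        "Biology":   {"Knowledge": 7, "Comprehension": 6, "Application": 4, "Analysis": 3},
    }

    # Total number of items the plan asks the item writer to produce.
    total_items = sum(sum(row.values()) for row in specifications.values())
    print(total_items)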
In each cell of the two-way table the number of items to be written is specified. This provides the item writer with a guide to insure adequate representativeness of the content and instructional objectives which are to be tested. This investigation will be primarily concerned with the instructional objectives specified in the planning stage of test construction.

Instructional Objectives

The Taxonomy of Educational Objectives (5) is a comprehensive attempt to analyze cognitive objectives in a meaningful classification system. According to the authors, "It is intended to provide for classification of the goals of our educational system. It is expected to be of general help to all teachers, administrators, professional specialists, and research workers who deal with curricular and evaluation problems." (5:1)

The Taxonomy of Educational Objectives arranges instructional objectives from the simple to the complex. The major categories of this classification system, arranged in order of increasing complexity, are designated as Knowledge, Comprehension, Application, Analysis, Synthesis and Evaluation.* The Taxonomy of Educational Objectives thus provides a framework for the classification of the instructional objectives which can be utilized in the planning stage of test construction.

*For a more detailed description of these categories see the Condensed Version of the Taxonomy of Educational Objectives.

Writing the Test Item

As discussed above, the two-way specification table is used in the test planning stage to insure that the composition of the test is representative of the instructional objectives and the content areas to be evaluated. The specification table provides the item writer with a guide to the number of items to be written for measuring the instructional objectives in each content area. Within this framework the item writer can create test items using a variety of test forms. Ebel (15:193-204) describes four commonly employed forms of "objective"** test items: the short-answer, the true-false, the multiple-choice, and the matching forms. The multiple-choice form,*** because of the demand for machine-scorable tests, is probably the most widely used type of test item in educational achievement testing today.

**An objective test item is one which will be scored in the same manner regardless of the individual scoring it. This type of item is in contrast to an essay or free-response item, which requires subjective judgment on the part of the scorer.

***The multiple-choice test item presents to the subject an introductory statement or question followed by several responses. From these responses the subject must select the most appropriate or best answer.

Since the multiple-choice format is frequently used, it has been the object of several recent attacks on testing. (51) (4) (26) Hoffmann (25) and Guilford (20) indicate that the multiple-choice format is not suitable for testing certain types of educational objectives. In short, the critics feel that multiple-choice questions can measure little more than memory or recall of specific facts; they are not suited to the measurement of more complex objectives. On the other hand, authorities in the test construction field have not only emphasized that the multiple-choice question can be more than a simple recall exercise but have also presented items which they consider examples of thought-provoking and insightful questions. (37) (14) (18)
In this study it will be assumed that it is possible to write multiple-choice test items which evaluate educational objectives of varying complexity.

The system of classification presented in the Taxonomy of Educational Objectives can be used to classify test items as well as educational objectives. (13) (43) (46) Such a classification requires a knowledge of the prior educational experiences of the individuals taking the test. In order to make an accurate classification of a test item it is necessary to know, or assume, the learning experiences which have preceded the administration of the test. (5:51) Stanley and Bolten (46) present evidence that the classification of test items into the major categories of the Taxonomy of Educational Objectives can be accomplished with a considerable degree of reliability.

The Taxonomy of Educational Objectives thus provides a framework for the classification of instructional objectives and the test items intended to measure these objectives. It is the task of the test item writer to attain a high degree of correspondence between the type of item he creates and the instructional objective to be measured. It will be assumed for this study that the item writer has conscientiously attempted to maintain this high degree of correspondence in the original item pool constructed.*

*This is a crucial assumption in the test construction procedure. The validity of the evaluation instrument depends to a great extent upon this correspondence. If the assumption is violated, the final instrument could not possibly be a valid measure of the desired instructional objectives. If the assumption is met, it is still possible that the measuring instrument is not valid. This latter condition is the concern of the present study.

Experimental Tryout and Assembly of Final Test Form

The usual procedure in objective test construction is to prepare a larger pool of test items than will be used in the final form of the instrument. This large pool of items is administered to a tryout group in order to obtain certain information that will aid in the selection of items for the final form of the test. Conrad (9:250-251) lists seven purposes which may be served by the experimental tryout, several of which stress the statistical aspects of the test items. These statistical descriptions of test items are commonly called item analysis techniques or item selection techniques.

Item Selection Techniques

Guilford (19) states that the first major objective of an item analysis is to obtain objective information about the test items which have been written for a test. This objective information indicates to the test constructor ways in which the quality of the final test form might be improved. Item analysis involves the computation of difficulty and discrimination indices for each item administered to the tryout group.

The difficulty index of an item is simply an indication of how difficult the particular item is for the group taking the test. The most obvious index is the per cent of the tryout group that passes the item. (12:267) If this percentage is high the item is not very difficult; if it is low the difficulty level of the item is high.

"A discrimination index is a measure of the extent to which students who are judged to be good in terms of some standard succeed on the item, and those who are judged to be poor on the same standard, fail it." (16) The "standard" in the above definition is often called the criterion. The criterion may be the total test score, in which case the discrimination index will reflect the extent to which an individual item measures the same characteristic as does the total test. (12:286) The total test score is commonly used as the criterion for discrimination indices since it is readily available and because valid external criteria are often difficult to identify.
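As an illustration of these two kinds of indices, a minimal sketch in Python; the response matrix is hypothetical, and the point-biserial correlation with the total score is used here only as one familiar example of a discrimination measure, not as either of the indices adopted later in this study:

    import numpy as np

    # Hypothetical scored responses: one row per examinee, one column
    # per item; 1 = correct, 0 = incorrect.
    rng = np.random.default_rng(0)
    responses = rng.integers(0, 2, size=(200, 50))

    # Difficulty index: per cent of the tryout group passing each item.
    # A high value means an easy item; a low value, a hard one.
    difficulty = 100.0 * responses.mean(axis=0)

    # A simple discrimination measure with the total test score as the
    # criterion: the point-biserial correlation of each item with the total.
    totals = responses.sum(axis=1)
    discrimination = np.array(
        [np.corrcoef(responses[:, j], totals)[0, 1] for j in range(responses.shape[1])]
    )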
Suggested Item Discrimination Techniques*

At least fifty different techniques of item discrimination have been suggested. Several reviews of some of these techniques appear in the literature. Long and Sandiford (34) list and describe twenty-three item-validity techniques. Vernon (49) reviews eighteen discrimination indices which he classifies according to the way the criterion variable is treated. Davis (12) describes ten methods of expressing item discrimination power and also provides an excellent bibliography concerning techniques of item analysis. Guilford (19) describes twenty-five indices of item discrimination.

*An extensive review of suggested item discrimination techniques and of studies concerning these techniques serves three purposes. First, it provides some indication of the multitude and variety of techniques available to the test constructor. Secondly, it serves to provide a current review of the literature pertinent to item discrimination techniques. Finally, the review provides the rationale for selecting the particular techniques used in this study.

Classical probability theory provides the basis for the item selection techniques described in Solomon, (45) where statistical multivariate analyses and non-parametric solutions are treated in depth. Other techniques have been suggested by Gulliksen, (22:350-382) Michael, (36) Webster, (50) Clemens, (7) Colver, (8) Levine and Lord, (32) Bryden, (6) and Scott. (44)

Studies Comparing Item Discrimination Methods

The use of a particular item discrimination technique often depends upon the assumptions the test constructor is willing to make. There are, however, many indices suited for identical situations. Trying to determine which is the best item discrimination index to use in any given situation has been the purpose of many investigations.

Barthelmess (3) evaluated ten item validity methods. The correlation ratio and biserial r had the highest average inter-correlations with all the other methods. When the methods were evaluated by correlation with two external criteria, the correlation ratio, Long, and McCall methods ranked highest. The Vincent and biserial r were found to be the most reliable of the methods evaluated. Barthelmess (3) recommends the use of biserial r when a dichotomous method is called for and the correlation ratio when a multiple method should be used.

For difficult items, biserial r was found to be the most reliable and stable of five discrimination indices compared by Cook. (10) Cook's own index was found to be the best overall index included in that study.

Lentz et al. (31) compared the reliabilities and inter-correlations of four item discrimination indices. They conclude that the upper vs. lower third method is superior to the other three methods.

A comprehensive study by Long and Sandiford (34) evaluated thirteen techniques of item discrimination. They found that methods which select items of fifty per cent difficulty appear to be superior to those methods which do not; that the upper vs. lower technique is preferable when effectiveness and ease of computation are the primary considerations; and that the better techniques differ little in effectiveness.
Swineford (47) recommends the Holzinger and difference-between-means techniques on the basis of reliability and ease of computation. Eight methods were compared in this study.

Adkins (2) found little difference between ten discrimination techniques used to select test items. Scholastic indices were used as the validation criteria.

Pinter and Forlano (41) report high inter-correlations between nine methods of item selection. They conclude that it makes little difference which method is used to select the most internally consistent items.

Guilford and Lacey (21) compared six indices of item discrimination and found that test items were in approximately the same rank order no matter which index was used. The main difference was that the point biserial r and the phi coefficient tend to select items of moderate difficulty while the other methods examined seemed uninfluenced by item difficulty.

Humphreys (27) has compared the phi coefficient, Flanagan's r, biserial r, point biserial r and the tetrachoric coefficient. Point biserial r was found to be the most representative index on the basis of intercorrelations with the other indices. With the exception of the tetrachoric coefficient the reliabilities of the methods were comparable.

Lawshe and Mayer (30) found that a high proportion of the same items were selected by the Davis index and D-values when these two indices were used as the criteria for selection.

The reliabilities of items selected by D-values and phi coefficients were compared by Mason. (35) No significant differences between the two techniques were found.

Vernon (49) computed rank order correlations between six methods of item discrimination. All correlations were fairly high; D-values and the Davis index were almost identical. All of the indices except the per cent method had comparable reliabilities.

Kuang (29) compared the reliabilities of biserial r, the Davis index and probit analysis. He found that the three methods select a high percentage of the same items. He concludes that any differences between the three methods are insignificant.

Adams (1) investigated the effect of item difficulty level upon the reliability of nine item analysis techniques. He concludes that the upper vs. lower twenty-seven per cent, phi, biserial r, point biserial r, t-ratio and Davis index all have comparable reliabilities at all difficulty levels. Intercorrelation of the nine methods revealed that point biserial r and the t-ratio were most representative of the techniques.

The stability of item discrimination indices for groups of different ability levels was investigated by Hall. (23) The Flanagan index, Davis index, Gulliksen index and the Difference index were all sensitive to changes in the ability level of the sample. Hall concluded that experimental items should be administered to subjects whose mean ability level is similar to that of the population for whom the test is intended.

In general these comparative studies indicate that there are few differences in the end results obtained by using any of the better item discrimination techniques. The main difference seems to be that some techniques favor items of moderate difficulty while others are not influenced by the difficulty level of the item. Ease of computation is often suggested as the criterion for determining which method to use.
Assembly of Final Test Form

After the test has been planned and the items written and statistically analyzed using a tryout group, the task for the test constructor becomes that of selecting items for the final form of the test. The difficulty and discrimination indices described above are often used for the selection of items for the final test form. There is not general agreement as to how these indices should be employed.

One important consideration is the nature and the purpose of the test under consideration. Many educational achievement tests are power tests, in which the examiner is interested in how many items the subject is able to answer correctly and not how rapidly he can work. These tests are designed to rank the subjects on some specified characteristic; thus, the difficulty and discrimination level of the items becomes a crucial consideration. For other types of tests difficulty and discrimination levels become less important.

After a review of relevant literature Adams (1) concludes that selection of items at the fifty per cent level of difficulty yields items of maximum validity and discrimination. Saupe (42) indicates that in a practical situation it would be difficult to find a set of items all of which cluster around the fifty per cent level of difficulty. Faced with this problem the test constructor will select those items which are the most discriminating for inclusion in the final form of the test.*

*Because of the relationship between item difficulty and certain discrimination indices, the selection of the most discriminating items will in general yield items in the middle difficulty ranges.

Statement of the Problem

The major steps of test construction up to and including the assembly of the final instrument have been discussed. The subject-matter content and the instructional objectives to be tested are identified by the test constructor. Items which measure these instructional objectives in each content area are created by the item writer. Usually more items are written than will be used in the final form of the instrument. Items for the final form of the instrument are commonly selected on the basis of the statistical analysis of the item pool after administration to a tryout group.

Many item discrimination indices which may be used in the selection of test items for the final instrument have been suggested. Considerable research has been directed to the comparison of these item discrimination techniques with respect to reliability, difficulty level and ease of computation. The effect of item selection techniques on the intended structure of a test, as indicated by the composition of the original item pool, has apparently received little attention.

The Taxonomy of Educational Objectives (5) provides a framework for the classification of test items with respect to the instructional objectives the items are intended to measure. It is possible, therefore, to classify the items in the original item pool according to the instructional objectives they are designed to measure. In order that the final form of the evaluation instrument validly measure the objectives identified in the original item pool, the method of item selection should not appreciably alter the structure of the original item pool. The structure of the final form of a test constructed by selection of items from the original item pool should be similar to the structure of the original item pool.
If the item selection procedure biases the final test form by disproportionate selection of items which measure certain instructional objectives, the final form of the test will not validly evaluate the objectives measured by the total item pool.

It is the purpose of this study to investigate the effect of statistical item selection on the structure of the final evaluation instrument as compared with the structure of the original item pool. This effect of statistical item selection will be examined with respect to the sex and achievement level of the tryout group.

While there has been considerable research comparing item discrimination techniques with respect to reliability, difficulty level and ease of computation, little evidence is available concerning the effect that the sex of the tryout group might have on these techniques. If, for the items in the original item pool, the values assigned by a discrimination index vary for male and female groups, the items selected for the final form of a test could differ depending on the composition of the tryout group. The possibility that males or females do better on certain types of items, and that consequently the items discriminate differently for males and females, has been overlooked in the comparisons of various item discrimination techniques.

Another variable to be examined is the achievement level of the tryout group. If the values assigned by a discrimination index vary for high and low achieving groups, the items selected for the final form of a test could vary if the tryout group is composed of a disproportionate number of high or low achieving students.

The possibility that differences in sex or achievement level have an effect on the values assigned by discrimination indices could have far-reaching implications for test construction. A differential effect on discrimination indices by either sex or achievement level would bias the final form of the test in favor of a particular group. For this reason the present investigation will examine the item selection procedure taking into account the sex and achievement level of the tryout group.

CHAPTER II

PROCEDURE

The basic problem of this investigation is to examine the effect of item selection techniques on the structure of the final test form. The present chapter describes the examinations and the subjects used in the study. It also describes the classification of test items according to the instructional objectives they are designed to measure. Finally, the chapter describes the methods of item selection employed and the investigation of the basic problem.

Examinations and Subjects Used in the Study

The examinations used in this study are the end-of-term examinations used in the introductory natural science sequence at Michigan State University.* The fall and winter term examinations have 125 multiple-choice questions; the spring term examination has 129 such items. These three examinations were combined to form the total item pool under consideration in this study. This was deemed necessary in order to have enough items so that there would be an adequate representation of items classified as measuring each general instructional objective.**

*Due to the security involved, the examinations per se will not be presented in this study. Some of the types of items involved are presented in Nelson (38) (39) (40:117-119). These examinations were selected for two major reasons. First, a large part of the responsibility for the assembly of the examinations rests with Clarence H. Nelson, a national figure in science test writing. Secondly, the examinations are constructed with specific instructional objectives in mind.

**See Table 1.
Items for the examinations were submitted by the instructors in the natural science courses. An examining committee comprised of a group of these instructors reviewed the items and constructed the final examinations.

The reliability of each exam was computed using the Kuder-Richardson formula 21. The reliability coefficients for the three terms were .91, .86 and .83, respectively. These reliability coefficients are probably underestimates of the actual reliabilities due to the assumption of equal item difficulty made in the calculation of the formula.

Although a time limit was imposed, the examinations were essentially power tests and the majority of subjects responded to every item. The examinations were scored using the IBM 805 scoring machine. The test papers were first checked for overmarking, that is, more than one answer for a question. Those few papers with overmarks were excluded from the study. The remaining papers were scored and double-checked on the scoring machine. The score on a test was the number of correct responses, i.e. there was no penalty for guessing.

The subjects used in the study are those students who in the 1962-63 school year were enrolled in the introductory natural science sequence at Michigan State University and took the final examination at the end of all three terms. The majority of these students were college freshmen and sophomores. The term-end examinations were taken by 5,028 students in the fall term, 4,249 students in the winter term and 3,369 students in the spring term. Of these, 3,150 students were identified as having taken all three term-end examinations.

Those groups of students who were eliminated from the study, either because they were not enrolled (and did not receive actual instruction but did take the final examinations for credit) or because they had not taken all three examinations, were examined to see if their exclusion would have any biasing effect on the study. One group of students eliminated from the present sample were those who had taken the examinations in order to obtain credit for the course without attending instructional periods. The effect of their exclusion was to eliminate several of the top scores of the total distribution since, for the most part, these students were of above average ability. A second group eliminated from the sample were those students who did not complete the entire sequence due to academic failure. The effect of their exclusion was to eliminate several of the bottom scores of the total distribution. The overall effect of eliminating these two groups was to restrict the range of possible scores slightly.

The 3,150 students were separated by sex. From these groups 1,000 males and 1,000 females were selected at random. These are the two major groups used in the study.
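A minimal sketch in Python of the Kuder-Richardson formula 21 computation used for the reliability estimates reported above; the scored response matrix here is hypothetical, standing in for the actual examinations:

    import numpy as np

    def kr21(responses: np.ndarray) -> float:
        """Kuder-Richardson formula 21 reliability estimate.

        The formula assumes equal item difficulties, which is why it
        tends to underestimate the actual reliability.
        """
        k = responses.shape[1]            # number of items
        totals = responses.sum(axis=1)    # total score per examinee
        mean = totals.mean()
        var = totals.var(ddof=1)
        return (k / (k - 1)) * (1.0 - mean * (k - mean) / (k * var))

    # Hypothetical data: 500 examinees, 125 items.
    rng = np.random.default_rng(1)
    responses = rng.integers(0, 2, size=(500, 125))
    print(kr21(responses))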
Classification of Test Items*

The 379 items in the total test were classified using the categories of the Taxonomy of Educational Objectives. (5) Three judges worked independently on this classification using the examples presented in the Taxonomy of Educational Objectives and in Nelson. (40:117-119) There was agreement as to the category in which a particular item belonged for approximately eighty per cent of the items.

Such a classification of test items requires knowledge of the learning situations which have preceded the test. (5:51) Two of the judges were not completely aware of the actual presentation of subject matter to the subjects; hence the discrepancies in the classification of test items. After consultation with the subject matter expert, agreement was reached as to the proper classification of all 379 test items.

Table 1 presents the number and percentage of test items classified in the major categories of the Taxonomy of Educational Objectives. None of the items were classified in the Synthesis and Evaluation categories; multiple-choice items which fall in these categories are extremely rare.

*Some examples of the classification of natural science items in each of the four categories of the Taxonomy of Educational Objectives are presented in Appendix A.

TABLE 1. Frequency of Test Items Classified in Taxonomical Categories

Category         Number of Items    Per cent of Total
Knowledge              102                 27
Comprehension          110                 29
Application             91                 24
Analysis                76                 20
Total                  379                100

Selection of Item Discrimination Indices

As discussed in Chapter I, a variety of item discrimination indices have been proposed. Some of these indices are no longer in use today, having been modified or discarded after critical examination. Studies which compare discrimination indices indicate that the major considerations when selecting discrimination indices should be the effect of item difficulty and the ease of computation.

The two indices to be used in this study are the Davis index and the Difference index. The Davis index is theoretically uninfluenced by item difficulty while the Difference index tends to assign maximum values to items of middle difficulty. Both are relatively easy to compute.

Davis Index*

The biserial r is a correlation coefficient between each test item and the total test score. Davis (11:9) states that biserial r is probably the most satisfactory measure of relationship available when the total test score is used as the criterion of discrimination.

Kelley (28) has demonstrated that the upper and lower twenty-seven per cent of the total test scores provide a good approximation to biserial r which involves considerably less computation. Flanagan (17) constructed a table from which the value of the correlation coefficient between a test item and the total test score can be read directly. The table is entered by identifying the percentage of subjects in the upper and lower twenty-seven per cent who have passed the item. The tabled values are essentially uninfluenced by the difficulty of the item.

Davis transformed the coefficients in the Flanagan table into discrimination indices. (11:11-15) The Davis indices range from 0 (no discrimination) to 100 (perfect discrimination). Like the Flanagan coefficients, they are theoretically uninfluenced by item difficulty and can be read from the table with the knowledge of the percentage of subjects in the upper and lower twenty-seven per cent of the total test scores who passed the item.

*For a complete description of the Davis index see Davis (11).

The Difference Index

The Difference index is defined by Hall (23) as the difference between the percentage of subjects in the upper and lower 27 per cent of the total test scores who have answered the item correctly: the percentage answering the item correctly in the lower group is subtracted from the percentage answering the item correctly in the upper group. The Difference indices range from 0 (no discrimination) to 100 (perfect discrimination). Unlike the Davis indices, the Difference indices are influenced by item difficulty. Items of median difficulty (50 per cent) will generally be assigned higher values than will either very easy or very difficult items.
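A minimal sketch in Python of the upper-and-lower-27-per-cent computation on which both indices rest, using a hypothetical response matrix; the Davis index itself is read from Davis's published table, so only the two entry percentages and the Difference index are computed here:

    import numpy as np

    # Hypothetical scored responses: 1,000 examinees, 379 items.
    rng = np.random.default_rng(2)
    responses = rng.integers(0, 2, size=(1000, 379))

    # Rank examinees by total score and take the upper and lower 27%.
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    n27 = int(round(0.27 * len(totals)))     # 270 of 1,000
    lower, upper = order[:n27], order[-n27:]

    # Percentage passing each item in the upper and in the lower group.
    p_upper = 100.0 * responses[upper].mean(axis=0)
    p_lower = 100.0 * responses[lower].mean(axis=0)

    # Difference index: upper percentage minus lower percentage.
    difference_index = p_upper - p_lower

    # Difficulty index as used in this study: per cent passing in the
    # combined upper and lower groups (equal group sizes, so a mean).
    difficulty = (p_upper + p_lower) / 2.0

    # The Davis index would be obtained by entering Davis's table with
    # p_upper and p_lower; the table itself is not reproduced here.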
Procedure

After the samples of 1,000 males and 1,000 females were selected, the distributions of total test scores were tabulated. Table 2 presents these distributions for males and females, indicating the number of scores that fall in each ten-point interval.

TABLE 2. Distribution of Total Test Scores for Males and Females

Total Score    Males    Females
330-339            1         1
320-329            1         1
310-319            4         2
300-309           10        14
290-299           24        29
280-289           42        39
270-279           67        72
260-269           84        80
250-259          102        72
240-249          130       107
230-239          118       112
220-229          102       126
210-219           96        95
200-209           95        94
190-199           51        64
180-189           37        51
170-179           23        26
160-169           10         9
150-159            2         3
140-149            1         3
Total          1,000     1,000

A further comparison was made by computing the means and standard deviations for the two distributions. These figures are presented in Table 3.

TABLE 3. Means and Standard Deviations of Total Test Scores for Males and Females

Group      Mean      Standard Deviation
Males     235.86           31.03
Females   233.70           32.42

On the average, the males achieved slightly higher scores on the combined tests. This is the usual pattern in the natural science examinations given at Michigan State University.

The upper and lower 270 subjects (27 per cent) in each distribution were identified in order to compute indices of item difficulty and discrimination. The index of difficulty for a particular item was determined by the percentage of subjects in the upper and lower 27 per cent of the total test scores who passed the item. Thus, a difficulty index of 100 indicates that all the students in the combined upper and lower groups passed the item; an index of 50 indicates that half of the students in the combined upper and lower groups passed the item, and so on.

The average difficulty level of items for males was 62.10 and for females was 61.79. The test items were on the average slightly easier for males. This was indicated previously by the slightly higher male achievement on the total test.

Davis indices and Difference indices were computed for all 379 items for both males and females. The average values of these indices for both groups appear in Table 4.

TABLE 4. Average Davis and Difference Indices by Sex

Item Discrimination Method    Males    Females
Davis Index                   15.66      17.04
Difference Index              20.28      21.34

The average Difference indices are higher than the average Davis indices for both males and females; the Difference index will in general assign higher values to items near the middle ranges of difficulty than will the Davis index. The higher average values for females are also a reflection of difficulty levels.

In order to simulate the assembly of a final test form, the 100 items with the highest Davis indices were selected from the total item pool. The procedure was repeated using the Difference indices as the criteria for item selection. These procedures were followed for both the male and female groups.

The 100 items selected by each of the two techniques were compared with respect to the number of items classified according to instructional objectives, the average difficulty level and the average discrimination level. The same comparisons were made between the male and female groups and between the 100 item tests and the total 379 items.
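A minimal sketch in Python of this simulated assembly step, with hypothetical responses and taxonomical labels standing in for the thesis data:

    import numpy as np

    rng = np.random.default_rng(2)
    responses = rng.integers(0, 2, size=(1000, 379))
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    lower, upper = order[:270], order[-270:]
    difference_index = 100.0 * (responses[upper].mean(axis=0)
                                - responses[lower].mean(axis=0))

    # Hypothetical taxonomical labels; the study uses the judges'
    # classifications of the actual items.
    categories = ["Knowledge", "Comprehension", "Application", "Analysis"]
    taxonomy = rng.choice(categories, size=379)

    # Select the 100 most discriminating items by the Difference index.
    selected = np.argsort(difference_index)[-100:]

    def category_percentages(items: np.ndarray) -> dict:
        """Per cent of the given items falling in each taxonomical category."""
        labels = taxonomy[items]
        return {c: round(100.0 * (labels == c).mean(), 1) for c in categories}

    # Compare the structure of the selected test with the total pool.
    print(category_percentages(np.arange(379)))   # total item pool
    print(category_percentages(selected))         # 100-item test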
As a supplement to the major analysis, the entire procedure was repeated for samples of high and low achieving males and females as defined by the total test scores. Those males and females who had the top 270 (27 per cent) scores on the total test were defined as high achievers; the bottom 270 were defined as low achievers. These groups are the upper and lower 27 per cent of the total test distribution which were used in the computation of difficulty and discrimination indices for the total group.

The means and standard deviations of the test scores in the high and low achieving groups for males and females are presented in Table 5. The high achieving females have, on the average, achieved higher scores than high achieving males. The reverse is true for the low achieving groups. The higher overall achievement by males (see Table 3) seems to result from higher scores by males in the middle and lower ranges of the distribution.*

*In the discussion of students eliminated from the study it was indicated that the upper end of the total distribution was slightly restricted due to the elimination of those students who had taken the final examination without taking the actual course work. This is a provision allowed by the waiver system at Michigan State University. In the natural science sequence male students take greater advantage of this provision than do females. The upper end of the distribution of test scores for males is therefore more restricted than is that for females, since these students often obtain the higher scores on the test.

TABLE 5. Means and Standard Deviations of High and Low Achievers by Sex

                            Mean     Standard Deviation
High Achievers   Males     274.12          14.26
                 Females   275.06          14.26
Low Achievers    Males     197.34          13.86
                 Females   193.83          13.84

Difficulty and discrimination indices were computed for the high and low achieving groups in the manner described previously. The upper and lower 27 per cent of these high and low groups included 73 subjects each. The average values for the difficulty and discrimination indices are presented in Table 6.

TABLE 6. Average Difficulty and Discrimination Indices for High and Low Achievers by Sex

                            Difficulty    Davis    Difference
                              Index       Index      Index
High Achievers   Males        72.73        8.83       9.03
                 Females      72.96        9.09       9.01
Low Achievers    Males        51.22        6.38       8.53
                 Females      50.59        6.47       8.91

Once again it is evident that the Davis and Difference indices differ with varying levels of average item difficulty. For the high achieving groups only slight differences between the two discrimination indices are apparent. For the low achieving groups, for which the average item difficulties are close to the 50 per cent level, the test items have on the average been assigned higher values by the Difference index.

As before, the 100 most discriminating items as indicated by the Davis and the Difference indices were selected from the total item pool. The same comparisons described above were made, and the results of the analysis for the high and low achieving groups were compared with the results obtained using the entire distribution.

CHAPTER III

RESULTS AND DISCUSSION

The purpose of this investigation is to examine the effect of item selection techniques on the structure of the final form of a test. From the original item pool, smaller tests were constructed by selecting the most discriminating items identified by two different item discrimination indices. These smaller tests represent the final forms of the test which are to be compared with the original item pool.

This chapter presents and discusses the results obtained from the comparisons made between the structure of the original item pool, as indicated by difficulty level, discrimination level and taxonomical classification of items, and the structure of the tests selected by the discrimination indices. The results are reported for males and females in general and for high and low achieving males and females.

Description of Total Item Pool for Males and Females*

The original item pool consists of 379 multiple-choice items. The number and percentage of items classified in each taxonomical category were presented in Table 1: Knowledge items accounted for 27 per cent of the total pool, Comprehension 29 per cent, Application 24 per cent and Analysis 20 per cent. These values should be closely approximated by the percentage of items classified in a similar manner for the tests selected by the discrimination indices if these selected tests are to be representative of the original item pool.

*In order to adequately discuss the results presented in this chapter it seems necessary at this point to summarize and elaborate upon the characteristics of the original item pool.

For males, the average score on the 379 items was 235.86, with the range of scores going from 146 to 330. Appendix B presents the difficulty and discrimination indices computed on each item for the male group. These values are presented separately for each taxonomical category. The average difficulty level of the items was 62.10. The average Davis index was 15.66 while the average Difference index was 20.28. Table 7 presents the average difficulty and discrimination indices by taxonomical category for the male group.

TABLE 7. Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - Males

Taxonomical      Difficulty    Davis    Difference
Category           Index       Index      Index
Knowledge          65.98       15.25      19.28
Comprehension      63.74       17.16      22.32
Application        59.18       15.24      19.95
Analysis           58.04       14.54      19.01

The values of the average difficulty and discrimination indices differ with taxonomical category. On the average, Knowledge and Comprehension type items have higher difficulty indices than does the total item pool, while the average difficulty indices for Application and Analysis type items are lower than the value for the total item pool. Since the difficulty index is the per cent passing, Knowledge and Comprehension items are easier on the average than the pool as a whole while Application and Analysis items are more difficult.

The values of the average discrimination indices for males also differ with taxonomical category. Both the Davis and Difference indices indicate that Comprehension type items discriminate better on the average than the remaining types of items. Analysis items are the least discriminating as indicated by the average values.
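A minimal sketch in Python of how per-category averages of this kind are tabulated; the per-item values are hypothetical:

    import numpy as np

    rng = np.random.default_rng(3)
    categories = ["Knowledge", "Comprehension", "Application", "Analysis"]
    taxonomy = rng.choice(categories, size=379)
    difficulty = rng.uniform(30, 90, size=379)         # per cent passing
    difference_index = rng.uniform(0, 50, size=379)

    # Average difficulty and discrimination by taxonomical category,
    # as in Tables 7 and 8.
    for c in categories:
        mask = taxonomy == c
        print(c, round(difficulty[mask].mean(), 2),
              round(difference_index[mask].mean(), 2))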
This chapter presents and discusses the results obtained from the comparisons made between the structure of the original item pool, as indicated by difficulty level, discrimination level and taxonomical classification of items, and the structure of the tests selected by the discrimination indices. The results are reported for males and females in general and for high and low achieving males and females. Description of Total Item Pool for Males and Females* The original item pool consists of 379 multiple~choice items. The number and percentage of items classified in each taxonomical category were presented in Table 1. Knowledge items accounted for *In order to adequately discuss the results presented in this Chapter it seems necessary at this point to summarize and elaborate upon the characteristics of the original item P001- 29 30 27 per cent of the total pool, Comprehension 29 per cent, Applica— tion 24 per cent and Analysis 20 per cent. These values should be closely approximated by the percentage of items classified in a similar manner for the tests selected by the discrimination indices if these selected tests are to be representative of the original item pool. For males, the average score on the 379 items was 235.86 with the range of scores going from 146 to 330. Appendix B presents the difficulty and discrimination indices computed on each item for the male group. These values are presented separately for each taxonomi— cal category. The average difficulty level of the items was 62.10. The average Davis index was 15.66 while the average Difference index was 20.28. Table 7 presents the average difficulty and discrimina— tion indices by taxonomical category for the male group. TABLE 7. Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool — Males E Taxonomical Difficulty Davis Difference Category Index Index Index Knowledge 65.98 15.25 19.28 Comprehension 63.74 17.16 22.32 Application 59.18 15.24 19.95 Analysis 58.04 14.54 19.01 ___—____‘\ 31 The values of the average difficulty and discrimination indices differ with taxonomical category. 0n the average, Knowledge and Comprehension type items have a higher difficulty level than does the total item pool; average difficulty levels for Application and Analysis type items are smaller than the value for the total item pool. Knowledge and Comprehension items are easier on the average than all the items while Application and Analysis are more difficult. The values of the average discrimination indices for males also differ with taxonomical category. Both the Davis and Difference indices indicate that Comprehension type items discriminate better on the average than the remaining types of items. Analysis items are the least discriminating as indicated by the average values. For the female group the average score on the 379 items was 233.70 with a range of scores from 142 to 337. Appendix C presents the difficulty and discrimination indices computed for each item using the female scores. The average difficulty level was 61.79; the average Davis index was 17.04 and the average Difference index was 21.34. The 379 items were more difficult for the female group and subsequently discriminated better for females than for males. Table 8 presents the average difficulty and discrimination indices by taxonomical category for the female group. 32 TABLE 8. 
TABLE 8. Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - Females

Taxonomical      Difficulty    Davis    Difference
Category           Index       Index      Index
Knowledge          66.39       16.83      20.18
Comprehension      63.45       18.08      23.22
Application        58.37       17.08      21.31
Analysis           57.29       15.79      20.21

The pattern of the values presented in Table 8 is similar to the pattern for males. The average difficulty indices decrease with increasing complexity of taxonomical category: Knowledge type items are easiest while Analysis type items are the most difficult. Both the Davis and Difference indices indicate that Comprehension type items discriminate better than the average; Analysis and Knowledge type items are least discriminating for females.

Results of Selection of Most Discriminating Items for Males and Females

From the total item pool a test was constructed by selecting the 100 items indicated as most discriminating by the Davis index. A second test was constructed by selecting those 100 items indicated as most discriminating by the Difference index. This procedure was followed for both the male and female groups.

For males, 80 of the 100 items identified as most discriminating by the Davis index were also selected by the Difference index. There were 79 common items selected by the two techniques for females.

The percentage of test items classified in each taxonomical category was computed for the 100 item tests. These values are presented in Table 9 along with the comparable values for the total item pool.

TABLE 9. Percentage of Most Discriminating Items Classified in Taxonomical Categories by Sex

                           Males                     Females
                    Items Selected By:        Items Selected By:       Items in
Taxonomical        Davis      Difference     Davis      Difference     Total
Category           Index        Index        Index        Index        Item Pool
Knowledge            24           22           24           20            27
Comprehension        38           40           29           36            29
Application          18           21           31           27            24
Analysis             20           17           16           17            20
Total               100          100          100          100           100

None of the structures of the 100 item tests, as indicated by the percentage of items in each taxonomical category, closely corresponds to the structure of the total item pool. In every case the percentage of Knowledge items in the 100 item tests is less than the percentage of Knowledge items in the total item pool. Comprehension items are in general selected more often for inclusion in the 100 item tests than seems warranted by the structure of the larger item pool. The selection of Application items appears to operate differentially for males and females: for males the 100 item tests reveal a smaller percentage of Application items than the total item pool, while the reverse is true for females. Analysis items in general appear to be under-selected in the 100 item tests.

The tests composed of most discriminating items are therefore not representative of the total item pool from which they were selected. These tests would not adequately measure the instructional objectives measured by the total item pool. In general, less emphasis is given to Knowledge and Analysis items and more emphasis is given to Comprehension items by the selected tests than is the case in the original item pool.

As cited above, 80 per cent of the items selected by the two discrimination indices for the male group were the same items; the two 100 item tests for males therefore have 80 items in common. The two selected tests for the female group have 79 common items. Table 10 presents the percentage of these common items classified in each taxonomical category.
TABLE 10. Taxonomical Classification of Items Identified as Most Discriminating by Both Discrimination Indices by Sex

Taxonomical      Common Items    Common Items
Category          for Males      for Females
Knowledge             21              21
Comprehension         42              33
Application           18              31
Analysis              19              15
Total                100             100

The values in Table 10 are reported in order to compare the percentage of common items classified in each taxonomical category with the values presented in Table 9 for the entire 100 selected items and the total item pool. The values in Table 10 are similar to those yielded by the entire 100 item tests. In reference to the total item pool, Knowledge and Analysis items are under-selected while Comprehension items are over-selected, and Application type items operate differentially by sex group. This pattern is identical to the pattern for the 100 item tests. The taxonomical pattern of common items selected by the two indices in both the male and female groups does not appear unlike the pattern for the entire 100 item tests.

Of the 100 items identified as most discriminating by the Davis index for males, 67 were also identified as most discriminating by the Davis index for females. Comparison of the 100 item tests selected by the Difference index indicates that 70 items selected for the male group were similarly identified for the female group. Table 11 presents the percentage of these common items classified in each taxonomical category.

TABLE 11. Taxonomical Classification of Items Identified as Most Discriminating for Males and Females by Each of the Discrimination Indices

Taxonomical      Common Items by    Common Items by
Category           Davis Index      Difference Index
Knowledge               25                23.5
Comprehension           37                40
Application             23                23.5
Analysis                15                15
Total                  100               100

The values in Table 11 indicate that, with respect to the taxonomical structure of the items, it makes little difference which of the two indices is used in the selection procedure. Once again the percentages of Knowledge and Analysis items are not as large as the comparable percentages in the total item pool, while the percentage of Comprehension items is larger. Those items which are not selected by each discrimination index for both males and females seem to have little effect on the results. The discrepancies between the structure of the 100 item tests and the structure of the total item pool do not seem to be a function of the discrimination index employed in the selection process.

The average difficulty and discrimination indices for each 100 item test are presented by taxonomical category in Tables 12 and 13. Similar values computed on the total item pool appear in Tables 7 and 8.

As in the total item pool, the average Difference index values are higher than the average Davis index values for both males and females. The values of each discrimination index do not vary with taxonomical category. This similarity across taxonomical categories is a function of the selection procedure: the most discriminating items, regardless of taxonomical category, were selected for inclusion in the 100 item tests. In order to be selected, an item in any category would have to discriminate as well as items in all the other categories. Thus the average discrimination values for a particular index will differ only slightly across taxonomical categories.
TABLE 12. Average Difficulty and Discrimination Indices by Taxonomical Category for 100 Item Tests - Males

                     Test Selected By              Test Selected By
                       Davis Index                 Difference Index
Taxonomical
Category         Difficulty  Discrimination    Difficulty  Discrimination
Knowledge          71.17         24.83           62.00         33.50
Comprehension      63.32         23.79           59.20         32.58
Application        69.67         23.89           60.48         32.05
Analysis           68.30         24.15           62.00         33.24

TABLE 13. Average Difficulty and Discrimination Indices by Taxonomical Category for 100 Item Tests - Females

                     Test Selected By              Test Selected By
                       Davis Index                 Difference Index
Taxonomical
Category         Difficulty  Discrimination    Difficulty  Discrimination
Knowledge          70.96         27.71           70.10         34.75
Comprehension      63.07         26.55           57.06         34.25
Application        64.84         26.74           58.59         35.81
Analysis           67.75         26.75           59.82         34.76

The average difficulty values in the total item pool (Tables 7 and 8) decreased with increasing complexity of taxonomical category, i.e. Knowledge items are easiest while Analysis items are most difficult. This pattern does not appear when the average difficulty indices for the 100 item tests are examined. For males, the average difficulty levels for both 100 item tests indicate that Comprehension type items are the most difficult. Knowledge items are easiest in the 100 item test selected using the Davis index, while for the 100 item test selected by the Difference index Knowledge and Analysis items are equally least difficult. For females, the average difficulty levels for both 100 item tests indicate that Comprehension type items are the most difficult and Knowledge type items are the easiest.

The items selected for the 100 item tests are not typical of the items in the total item pool with respect to taxonomical classification or difficulty levels. A good indication that this would occur is given by the average discrimination indices for the total item pool (Tables 7 and 8). Since certain types of items are more discriminating than others, it would be expected that these items would be selected for the 100 item tests more often than the types of items which are less discriminating.

For males the average Davis index of the original 379 items was 15.66; the average Difference index was 20.28. The average Davis and Difference indices for males on the 100 item tests were 24.13 and 32.78 respectively. For females the average Davis index of the original 379 items was 17.04; the average Difference index was 21.34. The average Davis and Difference indices for females on the 100 item tests were 26.92 and 34.86 respectively. These figures illustrate the increase in average discrimination values resulting from the selection procedure.

The average difficulty index of the original 379 items was 62.10 for males and 61.79 for females. The same values computed on the 100 item tests for males are 67.34 for the test selected using the Davis index and 60.56 for the test selected using the Difference index. For females the average difficulty level was 66.26 for the 100 item test selected using the Davis index and 60.55 for the 100 item test selected using the Difference index. The average difficulty levels of the tests selected using the Davis index increased as the most discriminating items were selected from the item pool; the reverse is true for the tests selected using the Difference index.

Within each taxonomical category a similar pattern appears for both the male and female groups. Those types of items which have the highest discrimination indices in the total item pool are selected more often than the other types of items for the 100 item tests and have the lowest average difficulty levels of the 100 item tests, i.e. they are the most difficult group of items selected. The Comprehension items follow this pattern. Those types of items in the total item pool which have the lowest average discrimination indices (Knowledge and Analysis items) are selected for the 100 item tests less often and are, in general, the easiest group of items selected.

Results of Selection of Most Discriminating Items by Achievement Level

Average difficulty and discrimination indices were computed on the total item pool using the high and low achieving male and female groups described in Chapter II. These values are presented in Tables 14 to 17.

The average difficulty indices for high and low achieving males decrease with increasing complexity of taxonomical category. This was also true for males in general (Table 7). The discrepancy in average difficulty levels for the high and low achieving groups is a result of defining these groups by their scores on the total test.

TABLE 14. Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - High Achieving Males

Taxonomical      Difficulty    Davis    Difference
Category           Index       Index      Index
Knowledge          75.56       10.08       9.82
Comprehension      75.53        9.40       9.48
Application        69.94        9.36      10.23
Analysis           68.22        5.67       5.87
Within each taxonomical category a similar pattern appears for both the male and female groups. Those types of items which have the highest discrimination indices in the total item pool are selected more often than the other types of items for the 100 item tests and have the lowest average difficulty index of the 100 item tests; i.e., they are the most difficult group of items selected. The Comprehension items follow this pattern. Those types of items in the total item pool which have the lowest average discrimination indices (Knowledge and Analysis items) are selected for the 100 item tests less often and are, in general, the easiest group of items selected.

Results of Selection of Most Discriminating Items by Achievement Level

Average difficulty and discrimination indices were computed on the total item pool using the high and low achieving male and female groups described in Chapter II. These values are presented in Tables 14 to 17. (A sketch of how such a split can be framed appears at the end of this discussion.)

The average difficulty indices for high and low achieving males decrease with increasing complexity of taxonomical category. This was also true for males in general (Table 7). The discrepancy in average difficulty levels for the high and low achieving groups is a result of defining these groups by their scores on the total test.

TABLE 14. Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - High Achieving Males

Taxonomical        Difficulty     Davis      Difference
Category           Index          Index      Index

Knowledge            75.56        10.08         9.82
Comprehension        75.53         9.40         9.48
Application          69.94         9.36        10.23
Analysis             68.22         5.67         5.87

TABLE 15. Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - Low Achieving Males

Taxonomical        Difficulty     Davis      Difference
Category           Index          Index      Index

Knowledge            55.29         6.35         8.55
Comprehension        51.92         6.04         8.41
Application          48.98         6.91         9.68
Analysis             47.43         6.28         8.82

TABLE 16. Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - High Achieving Females

Taxonomical        Difficulty     Davis      Difference
Category           Index          Index      Index

Knowledge            76.76         8.74         8.11
Comprehension        75.89        10.39        10.95
Application          69.20         8.99         9.19
Analysis             68.21         7.79         7.20

TABLE 17. Average Difficulty and Discrimination Indices by Taxonomical Category for Total Item Pool - Low Achieving Females

Taxonomical        Difficulty     Davis      Difference
Category           Index          Index      Index

Knowledge            55.92         6.48         8.97
Comprehension        51.81         7.33         9.93
Application          46.35         6.57         9.24
Analysis             46.76         5.12         6.96

In the total item pool for high achieving males, Analysis items are least discriminating. When the average Davis index is considered, Knowledge type items are most discriminating; Application items are indicated as most discriminating by the Difference index. For low achieving males, Comprehension items are least discriminating, and Application items are indicated as most discriminating by both the Davis and Difference indices.

For males in general (Table 7), Analysis items were least discriminating while Comprehension items were most discriminating. The values for high and low achieving males show considerable deviation from this pattern. Analysis items were least discriminating in the high achieving male group, but in no case were Comprehension items indicated as most discriminating for the high and low achieving groups.
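The sketch promised above illustrates how this achievement-level analysis can be framed. The even split and the upper-and-lower 27 per cent tails are assumptions made here for illustration; the study defined its high and low achieving groups by total scores on the 379 items as described in Chapter II, and the Davis index, which is read from published tables, is not reproduced.

```python
def split_by_total_score(examinees):
    """Divide a tryout group into low and high achievers by total
    score.  An even split is assumed here for illustration."""
    ranked = sorted(examinees, key=lambda e: e["total"])
    half = len(ranked) // 2
    return ranked[:half], ranked[half:]

def difference_index(group, item_id, tail=0.27):
    """Upper-minus-lower discrimination for one item within a group,
    assuming the conventional upper and lower 27 per cent tails.
    Responses are 1 (correct) or 0 (wrong); the result is expressed
    in percentage points."""
    ranked = sorted(group, key=lambda e: e["total"])
    n = max(1, int(round(tail * len(ranked))))
    lower, upper = ranked[:n], ranked[-n:]
    p_upper = sum(e["answers"][item_id] for e in upper) / n
    p_lower = sum(e["answers"][item_id] for e in lower) / n
    return 100 * (p_upper - p_lower)

# Synthetic demonstration: an item answered correctly only by the
# upper half discriminates strongly in the whole group but not at
# all within the restricted high achieving group.
examinees = [{"total": t, "answers": {7: 1 if t > 9 else 0}} for t in range(20)]
low, high = split_by_total_score(examinees)
print(difference_index(examinees, 7), difference_index(high, 7))
```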
The average difficulty indices for high and low achieving females are similar to those values for high and low achieving males. Knowledge items are the easiest for these groups while Analysis items are, in general, the most difficult. This was also true for the total female group (Table 8).

In the total item pool for both high and low achieving females, Comprehension items are, on the average, the most discriminating type of items, and Analysis items are the least discriminating. This pattern is consistent with that for the total female group (Table 8).

The average discrimination indices computed on the total item pool indicate in general that Comprehension items are most discriminating and Analysis items are least discriminating for the total female group and for the high and low achieving groups. High and low achieving female groups are more like the total female group than high and low achieving males are like the total male group.

From the total item pool two tests were constructed by selecting the 100 items indicated as most discriminating by the Davis and Difference indices. This procedure was followed for the high and low achieving male and female groups. Thus, for each of the four groups two tests were constructed, one using high Davis indices as the criteria for the selection of items and the other using high Difference indices.

The percentage of test items classified in each taxonomical category was computed for each of the 100 item tests. These values are presented in Tables 18 and 19 along with the comparable values for the total item pool.

TABLE 18. Percentage of Most Discriminating Items Classified in Taxonomical Categories for High and Low Achieving Males

                   High Achievers          Low Achievers
                   Items Selected By:      Items Selected By:      Items in
Taxonomical        Davis    Difference     Davis    Difference     Total
Category           Index    Index          Index    Index          Item Pool

Knowledge            32        32            23        26             27
Comprehension        31        29            28        29             29
Application          27        29            27        26             24
Analysis             10        10            22        19             20

Total               100       100           100       100            100

TABLE 19. Percentage of Most Discriminating Items Classified in Taxonomical Categories for High and Low Achieving Females

                   High Achievers          Low Achievers
                   Items Selected By:      Items Selected By:      Items in
Taxonomical        Davis    Difference     Davis    Difference     Total
Category           Index    Index          Index    Index          Item Pool

Knowledge            23        23            30        30             27
Comprehension        35        41            33        33             29
Application          26        23            21        21             24
Analysis             16        13            16        16             20

Total               100       100           100       100            100

Once again the structures of the 100 item tests, as indicated by the percentage of items in each taxonomical category, differ from the structure of the total item pool from which the 100 item tests were selected. The structure of the 100 item test for low achieving males is most like the structure of the total item pool: Knowledge items are not selected as often for inclusion in the 100 item tests, while Application items are selected more often than seems warranted by the structure of the total item pool. These differences are slight when compared with the differences between the taxonomical structures for high achieving males and the total item pool.

For high achieving males the 100 item tests reveal a larger percentage of Knowledge items and a smaller percentage of Analysis items than the total item pool. The differences between the high and low achieving males are even more striking for these two taxonomical categories.
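The size of these shifts can be read directly from Table 18. For high achieving males under the Davis index, for example:

```python
# Category percentages from Table 18: high achieving males, Davis
# index selection, against the total item pool.
pool     = {"Knowledge": 27, "Comprehension": 29, "Application": 24, "Analysis": 20}
selected = {"Knowledge": 32, "Comprehension": 31, "Application": 27, "Analysis": 10}

for category in pool:
    shift = selected[category] - pool[category]
    print(f"{category:13s}  pool {pool[category]:2d}%  "
          f"selected {selected[category]:2d}%  shift {shift:+3d} points")
```

The Knowledge percentage rises five points while the Analysis percentage falls by half, the two categories singled out above.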
The percentage of Comprehension type items for both high and low achieving females is higher than the percentage of Comprehension items in the total item pool. Analysis items are selected less often for inclusion in the 100 item tests for high and low achieving females than seems warranted by the percentage of Analysis items in the total item pool. The major difference between high and low achieving females appears in the percentage of Knowledge items selected for the 100 item tests. The percentage of items falling in the Knowledge category for low achieving females is higher than that for the total item pool, while the reverse is true for high achieving females.

The relationship between the average discrimination indices computed in each taxonomical category for the total item pool and the subsequent selection of items for the 100 item tests is once again apparent in these data. For high achieving males the average discrimination indices computed on the total item pool are lowest for Analysis type items, and the Analysis items are selected least often for inclusion in the 100 item tests. For low achieving males the average discrimination indices computed on the total item pool are all similar, with the Application items discriminating slightly better than the rest. The Application items are the only type of items selected more often for inclusion in the 100 item tests for low achieving males than seems warranted by the structure of the item pool.

The average discrimination indices for both high and low achieving females indicate that Comprehension items are most discriminating and Analysis items are least discriminating. The percentage of Comprehension items is higher in the 100 item tests than it is in the total item pool; the percentage of Analysis items is lower.

As for the total male and female groups, the tests composed of most discriminating items for low and high achieving males and females are not, in general, representative of the total item pool from which they were selected. These selected tests could not adequately evaluate the instructional objectives measured by the total item pool.

Summary of Major Findings

In the total item pool 27 per cent of the items were classified as Knowledge type items. In the 100 item tests selected from the total item pool using the Davis and Difference indices computed for the male group, the percentage of Knowledge items decreased. The same result was obtained in the 100 item tests selected for the female group.

In the 100 item tests for high achieving males and low achieving females the percentage of Knowledge items was larger than that value for the total item pool. The opposite was true for the tests constructed using the low achieving male and the high achieving female groups.

In the total item pool 29 per cent of the items were classified as Comprehension type items. In the 100 item tests selected from the total item pool using the two indices computed for the male group, the percentage of Comprehension items increased. The same result was obtained in the 100 item test selected by the Difference index for the female group.

In the 100 item tests for high and low achieving males the percentage of Comprehension items does not appreciably differ from the percentage in the total item pool. The 100 item tests for high and low achieving females have a higher percentage of Comprehension items than does the total item pool.

In the total item pool 24 per cent of the items were classified as Application type items.
The 100 item tests for the male group have a smaller percentage of Application items, while for the female group the 100 item tests have a larger percentage of Application items than does the total item pool.

In the 100 item tests for high and low achieving males the percentage of Application items is greater than the percentage in the total item pool. For the high and low achieving female groups the trend is toward fewer Application items than in the total item pool.

In the total item pool 20 per cent of the items were classified as Analysis type items. In the 100 item tests for both the male and female groups the percentage of Analysis items decreased, with one exception where the value remained the same. A similar result was obtained for the tests selected for the high and low achieving female groups and for the high achieving male group. The 100 item test selected from the total item pool using the Davis index computed for low achieving males was the only 100 item test in which the percentage of Analysis items exceeds that in the total item pool.

In summary, the 100 item tests selected from the total item pool using discrimination indices computed for males were structurally different from those computed for females. The same results were obtained for high and low achieving males and females. None of the 100 item tests closely correspond to the taxonomical structure of the total item pool.

CHAPTER IV

SUMMARY, CONCLUSIONS AND IMPLICATIONS

This study sought to investigate the effect of statistical item selection on the structure of the final form of a test. An item pool of 379 multiple-choice natural science items was identified. This item pool was described with consideration being given to the average difficulty and discrimination levels of the items computed for male and female tryout groups. The classification of the items according to the instructional objectives being measured was also examined for the total item pool.

From this large item pool the 100 most discriminating items identified by the Davis index and the 100 most discriminating items identified by the Difference index were selected to form two 100 item tests. This procedure was followed using the data from the male and female tryout groups separately. The entire procedure was repeated using data obtained from high and low achieving male and female tryout groups. These groups were selected from the total male and female groups by identification of subjects who had either high or low scores on the 379 items.

High discrimination values were used as the criteria for the selection of items for the smaller tests in order to approximate a commonly used test construction procedure. The final form of a test is often constructed by the selection of items from a larger item pool using high discrimination indices as a guide.

The 100 item tests were compared to the total item pool with respect to the classification of items according to the instructional objectives being measured by each item. In order for a 100 item test to validly measure the instructional objectives measured by the total item pool, the item selection procedure should not disproportionately select items from these categories of classified items.

Some comparisons were also made between the average difficulty and discrimination values for the 100 item tests and the total item pool in order to further examine the effect of statistical item selection.
Conclusions

The major conclusions of this study are as follows:

1. Statistical selection of items from the total item pool has a biasing effect on the selected tests. The proportion of items in the selected tests which measure certain instructional objectives is unlike the proportion of items in the total item pool which measure the same objectives. The selected tests are not representative of the total item pool in this respect.

2. Statistical selection of items from the total item pool appears to operate differentially for male and female groups. When the statistical data obtained from the female tryout group are used to select tests from the total item pool, the results differ from those obtained using the male tryout group. The structure of the selected tests as indicated by the taxonomical structure of the items differs for male and female groups. Application items are selected from the total item pool more often using the discrimination indices computed for the female tryout group than for the male tryout group. In general, Application type items discriminate better for females.

3. Statistical selection of items from the total item pool appears to operate differentially for high and low achieving tryout groups, both male and female. The structure of the selected tests as indicated by the taxonomical classification of items differs for high and low achieving groups. The major differences between the tests selected using indices computed for high and low achieving male tryout groups appear in the selection of Knowledge and Analysis items. For high and low achieving female groups the major difference in the selection of items appears for Knowledge items.

4. The statistical selection of items from the total item pool operates similarly with respect to the taxonomical structure of the items selected no matter which of the two discrimination indices is used as the criterion for selection. The taxonomical structure of the tests selected using the Davis index as the criterion for selection does not differ appreciably from the structure of the tests selected using the Difference index as the selection criterion.

The general results of pertinent research reviewed earlier in this study (p. 11) indicate that there are few differences in the end results obtained by using different discrimination indices. This finding holds true with respect to the taxonomical structure of the tests selected in this study. The major difference between the selection of items by the two indices is in the difficulty level of the items selected. The Davis index tends to select easier items than does the Difference index. This would be expected due to the difference between the two indices previously described (p. 20).

Implications

In the assembly of an item pool the test constructor includes items which measure the instructional objectives to be evaluated in the final form of the test. The items for the final form of the test are commonly selected from the item pool on the basis of the statistical analysis of the items using data obtained from a tryout group. The results of this study suggest that the structure of a test constructed in this manner, as indicated by the proportion of items measuring each instructional objective, may be very much unlike the structure of the total item pool. Consequently, the constructed test may not evaluate the instructional objectives in the same proportion as would the original item pool.
The practice of statistical selection of items for the final form of an evaluation instrument is seriously questioned by the results of this study. Statistical item selection alone is not sufficient; other variables should be considered. It has been shown that in the total item pool the average discrimination values differ for the four major categories of items classified according to the instructional objective being measured. This should indicate to the test constructor that selection of items from the item pool on the basis of these discrimination indices will be biased in favor of the group of items which have the highest average discrimination values.

This suggests the possibility of selecting the most discriminating items within a particular taxonomical category rather than selecting the most discriminating items from the total item pool, disregarding the taxonomical structure of the items. In the planning stages the test constructor would specify the number of items measuring each instructional objective to be included in the final form of the test. The items for the item pool would be written accordingly. Then, within each category, a specified number of items would be selected for inclusion in the final form of the test. In this manner the test constructor would be sure that the final form of the test will validly evaluate the instructional objectives as indicated in the planning stage. (A sketch of this within-category selection appears at the end of this section.)

This study has also indicated that the sex and achievement level of the tryout group have an effect on the statistical item selection. It is a well-known principle that the tryout group should be essentially similar to the group for which the test is to be used. The results of this study clearly indicate that the sex and achievement level of the tryout group make a difference in the selection of certain types of items for inclusion in the final test form. If, for example, Application items discriminate better for females than for males, then the test constructed using the discrimination indices computed for the female tryout group will include more Application items than will the test constructed using the data from the male tryout group. If the proportion of males and females in the group or groups for which the test is intended is not similar to the proportion of males and females in the tryout group, the test would not validly measure the instructional objectives specified in the planning stage. This could be a critical consideration in the construction or use of a test with an all male or all female group.

The test constructor might consider the possibility of computing item discrimination indices separately by sex as well as by taxonomical category and selecting items for the final form of the test accordingly.

The use of item selection based on discrimination indices computed for high and low achieving males and females indicates similar implications. A test constructed using data from the entire tryout group might be invalid when used with high or low achieving students. This could be a crucial consideration if the test were to be used to evaluate a group of high or low achieving students, e.g., in scholarship testing.
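A minimal sketch of the within-category procedure suggested above follows. The blueprint counts are illustrative only; they mirror the pool's own proportions (27, 29, 24, and 20 per cent of a 100 item test), and the field names are the same hypothetical ones used in the earlier sketches.

```python
def select_by_blueprint(pool, blueprint, index_name):
    """Select the most discriminating items separately within each
    taxonomical category, honoring a planned number of items per
    instructional objective."""
    test = []
    for category, wanted in blueprint.items():
        candidates = [item for item in pool if item["taxonomy"] == category]
        candidates.sort(key=lambda item: item[index_name], reverse=True)
        test.extend(candidates[:wanted])
    return test

# Illustrative blueprint mirroring the pool's own proportions.
blueprint = {"Knowledge": 27, "Comprehension": 29,
             "Application": 24, "Analysis": 20}
```

A test assembled this way matches the planned taxonomical structure by construction, at some cost in overall average discrimination relative to unrestricted selection.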
The results and implications of this study must be tempered in light of the following limitations inherent in the study:

1. The specific population in this or any study may have an effect on the obtained results. In order for the results to be meaningful for a larger population the study should be replicated with subjects of varying age and grade levels.

2. The test items used in this study may have a biasing effect on the results. Natural science items were used in this study; the procedure should be replicated using items from various subject-matter areas. As evidenced by the discrimination indices presented in Appendices B and C, the test items used in this study were in general high quality items. The study should be replicated using test items of varying quality, from the highly discriminating to the negatively discriminating. The total item pool identified in this study was comprised of items from three tests given at different times. It is not evident how this time interval between test administrations has affected the results obtained. It would be desirable to replicate the study using items all of which were administered to the tryout group at one time.

In the opinion of the investigator, the limitation imposed by the quality of the test items might be more of an asset to the study than a liability. It is highly possible that a replication of the study using a more typical item pool with varying quality items would yield even more discrepancies between the selected tests and the original item pool from which the tests are selected.

3. Only two discrimination indices were employed in the study. Although these were considered representative of the wide variety of indices available, there is no assurance that the use of other indices would yield similar results. This suggests replications using a variety of discrimination indices as the criteria for the selection of items.

In order for the results of this study to have far-reaching implications for educational achievement test construction, further study is required. The study has suggested, however, that the type of statistical data on test items commonly used in item selection might be only one of many considerations in the selection procedure. The test constructor should consider the taxonomical structure of the test items and the sex and achievement level of the group for which discrimination indices are to be computed. The test constructor should not overlook these aspects of the test items and the discrimination indices computed for a particular group if a valid instrument is desired.

BIBLIOGRAPHY

1. Adams, James F. "An Evaluation of the Effect of Level of Item Difficulty on Various Indices of Item-Discrimination." Unpublished Ph.D. thesis, State College of Washington, Pullman, Washington, 1960.

2. Adkins, Dorothy C. "A Comparative Study of Methods of Selecting Test Items." Unpublished Ph.D. thesis, The Ohio State University, Columbus, Ohio, 1937.

3. Barthelmess, Harriet M. "The Validity of Intelligence Test Elements," Contributions to Education, No. 505. New York: Bureau of Publications, Teachers College, Columbia University, 1931.

4. Barzun, Jacques. The House of Intellect. New York: Harper, 1959.

5. Bloom, Benjamin S. (ed.) Taxonomy of Educational Objectives. New York: David McKay Company, Inc., 1961.

6. Bryden, M. P. A Non-Parametric Method of Item and Test Scaling. Educational and Psychological Measurement, 1960, 20, No. 2, 311-315.

7. Clemans, William V. An Index of Item Criterion Relationship. Educational and Psychological Measurement, 1958, 18, 167-172.

8. Colver, Robert M. Estimating Item Indices by Nomographs. Psychometrika, 1959, 24, 179-185.

9. Conrad, Herbert S. The Experimental Tryout of Test Materials. In E. F. Lindquist (ed.), Educational Measurement. Washington: American Council on Education, 1961, 250-265.
10. Cook, W. W. The Measurement of General Spelling Ability Involving Controlled Comparisons Between Techniques. University of Iowa Studies in Education, 1932, 19, No. 6.

11. Davis, F. B. Item-Analysis Data: Their Computation, Interpretation, and Use in Test Construction. Harvard Education Papers, 1946, No. 2.

12. Davis, F. B. Item Selection Techniques. In E. F. Lindquist (ed.), Educational Measurement. Washington: American Council on Education, 1961, 266-328.

13. Dressel, Paul L., and Nelson, Clarence H. Questions and Problems in Science, Item Folio No. 1. Princeton: Educational Testing Service, 1956.

14. Dyer, Henry S. Measuring Creativity and Intelligence. In The Behavioral Sciences and Education, No. 10, College Admissions. Princeton: College Entrance Examination Board, 1963, 46-65.

15. Ebel, Robert L. Writing the Test Item. In E. F. Lindquist (ed.), Educational Measurement. Washington: American Council on Education, 1961, 185-249.

16. Ebel, Robert L. Measuring Educational Achievement in School and College Classrooms. East Lansing: Michigan State University Book Store, 1964. (Mimeographed)

17. Flanagan, J. C. General Considerations in the Selection of Test Items and a Short Method of Estimating the Product-Moment Coefficient from Data at the Tails of the Distribution. Journal of Educational Psychology, 1939, 30.

18. Gerberich, J. R. Specimen Objective Test Items: A Guide to Achievement Test Construction. New York: Longmans, Green and Company, 1956.

19. Guilford, J. P. Psychometric Methods. New York: McGraw-Hill, 1954.

20. Guilford, J. P. The Nature of Intellectual Activity. In The Behavioral Sciences and Education, No. 10, College Admissions. Princeton: College Entrance Examination Board, 1963, 65-73.

21. Guilford, J. P., and Lacey, J. I. Printed Classification Tests: Aviation Psychology Research Report No. 5. Washington: Government Printing Office, 1947.

22. Gulliksen, H. Theory of Mental Tests. New York: John Wiley and Sons, 1950.

23. Hall, Alfred E. "An Empirical Study of the Stability of Four Item-Discrimination Indices Over Groups of Different Average Ability." Unpublished Master's thesis, State University of Iowa, Iowa City, Iowa, 1960.

24. Hill, Walker H., and Dressel, Paul L. The Objectives of Instruction. In Paul L. Dressel, Evaluation in Higher Education. Boston: Houghton Mifflin Company, 1961.

25. Hoffmann, Banesh. The Tyranny of Multiple-Choice Tests. Harper's Magazine, 1961, 222, 37-44.

26. Hoffmann, Banesh. The Tyranny of Testing. New York: The Crowell-Collier Press, 1962.

27. Humphreys, L. G. Commonly Used Statistical Procedures. In J. P. Guilford (ed.), Printed Classification Tests, Aviation Psychology Research Report No. 5. Washington: Government Printing Office, 1947.

28. Kelley, T. L. The Selection of Upper and Lower Groups for the Validation of Test Items. Journal of Educational Psychology, 1939, 30, 17-24.

29. Kuang, H. P. A Critical Evaluation of the Relative Efficiency of Three Techniques in Item Analysis. Educational and Psychological Measurement, 1952, 12, 248-266.

30. Lawshe, C. H., Jr., and Mayer, J. S. Studies in Item Analysis: 1. The Effect of Two Methods of Item Validation on Test Reliability. Journal of Applied Psychology, 1947, 31, 271-277.

31. Lentz, T. F., Hirshstein, Bertha, and Finch, J. H. Evaluation of Methods of Evaluating Test Items. Journal of Educational Psychology, 1932, 23, 344-350.
32. Levine, Richard, and Lord, Frederic M. An Index of the Discrimination Power of a Test at Different Parts of the Score Range. Educational and Psychological Measurement, 1959, 19, No. 4, 497-503.

33. Lindquist, E. F. Preliminary Considerations in Objective Test Construction. In E. F. Lindquist (ed.), Educational Measurement. Washington: American Council on Education, 1961, 119-158.

34. Long, John A., and Sandiford, Peter. The Validation of Test Items. Toronto: Department of Educational Research, University of Toronto, 1935.

35. Mason, C. F. "A Comparison of Two Methods of Item Analysis on the Basis of Reliability of Selected Tests." Unpublished Master's thesis, Purdue University, Lafayette, Indiana, 1947.

36. Michael, W. B. Development of Statistical Methods Especially Useful in Test Construction and Evaluation. Review of Educational Research, 1956, 26, 82-109.

37. Multiple-Choice Questions: A Close Look. Princeton: Educational Testing Service, 1963.

38. Nelson, Clarence H. Examining in Biological Science. In Comprehensive Examinations in a Program of General Education. East Lansing, Michigan: Michigan State College Press, 1949, 44-64.

39. Nelson, Clarence H. Evaluation of Objectives of Science Teaching. Science Education, 1959, 43, No. 1, 20-27.

40. Nelson, Clarence H. Evaluation in the Natural Sciences. In P. L. Dressel, Evaluation in Higher Education. Boston: Houghton Mifflin Company, 1961.

41. Pinter, R., and Forlano, G. A Comparison of Methods of Item Selection for a Personality Test. Journal of Applied Psychology, 1937, 21, 643-652.

42. Saupe, Joe L. Some Useful Estimates of the K-R Formula No. 20 Reliability Coefficient. Educational and Psychological Measurement, 1961, 21, No. 1, 63-71.

43. Scannell, Dale P., and Stellwagen, Walter R. Teaching and Testing for Degrees of Understanding. California Journal for Instructional Improvement, 1960, 3, No. 1, 8-14.

44. Scott, William A. Measures of Test Homogeneity. Educational and Psychological Measurement, 1962, 22, No. 4, 751-757.

45. Solomon, H. (ed.) Studies in Item Analysis and Prediction. Stanford: Stanford University Press, 1961.

46. Stanley, J. C., and Bolten, D. Book Review in Book Review Section, William B. Michael (ed.), Educational and Psychological Measurement, 1957, 17, 631-634.

47. Swineford, F. Validity of Test Items. Journal of Educational Psychology, 1936, 27, 68-78.

48. Vaughn, K. W. Planning the Objective Test. In E. F. Lindquist (ed.), Educational Measurement. Washington: American Council on Education, 1961, 159-184.

49. Vernon, P. E. Indices of Item Consistency and Validity. British Journal of Psychology, Statistical Section, 1948, 1, 152-165.

50. Webster, Harold. Item Selection Methods for Increasing Test Homogeneity. Psychometrika, 1957, 22, 395-403.

51. Whyte, William H. The Organization Man. New York: Simon and Schuster, 1956.

APPENDIX A

EXAMPLES* OF NATURAL SCIENCE TEST ITEMS CLASSIFIED IN THE TAXONOMICAL CATEGORIES

*Examples from 40:124-149.

KNOWLEDGE

Use the following key to answer Items 1-3.

Key: 1. The statement is false under the condition stated.
     2. The statement is false regardless of the condition.
     3. The statement is true under the condition stated.
     4. The statement is true regardless of the condition stated.
     5. Impossible to determine without more data.

1. Statement: A negatively charged particle repels a positively charged particle.
   Condition: if the negative particle has the larger charge. (2)

2. Statement: At constant temperature the pressure and volume of a gas are inversely proportional.
   Condition: if the pressure is expressed in mm. of mercury and the volume is expressed in cubic centimeters. (4)

3. Statement: Two charged objects repel each other.
   Condition: if both attract similarly charged objects. (3)

COMPREHENSION

For Items 4-7 use the following key:

Key: 1. Statement A, which is empirical, is explained by Statement B, which is theoretical.
     2. Statement A, which is theoretical, explains Statement B, which is empirical.
     3. Statement A is empirical and Statement B is theoretical, but there is no explanatory relationship between them.
     4. Statement A is theoretical and Statement B is empirical, but there is no explanatory relationship between them.
     5. Either both statements are theoretical or both are empirical.

4. Statement A: Crossing in the human population gives a progeny sex ratio of 1 : 1.
   Statement B: Sex chromosomes segregate to different cells during the formation of gametes. (5)
(2) if the pressure is expressed in mm. of mercury and the vol- ume is expressed in cubic centi— meters. (4) if both attract similarly charged objects. (3) For Itesm 4—7 use the following key: Statement B, 2. Statement A, ment B, which is 3. Statement A is empirical e is pgiexplanatory relationship cal, but ther ’ between them. 4. Statement A is theore but there is pg_explana W: between them. 5. Either both sta empirical. Statement A 4. Crossing in the human popu— lation gives a progeny sex ratio of 1 : 1. ___- *Examples from 40:124—149 64 ' Key: 1. Statement A, which is empirical, is explained by whidh is theoretical. which is theoretissl, misal- explains State- and Statement B is theoreti— tical and Statement B is tory relationship tements are theoretical or both are S ta tement B Sex chromosomes segregate to different cells during the for— mation of gametes. (5) I‘H“ 65 5. A certain monohybrid cross Each parent carries contrasting gives a 3 : 1 ratio in a popu— alleles for the characteristic in lation. question. (1) 6. The hereditary determinants An individual usually resembles are carried on the chromo- his father about as much as he somes. resembles his mother. (4) 7. Genes occur ina constant The same kinds of crosses always linear order on the chromo- give about the same crossover somes. frequencies. (2) APPLICATION For Items 8-11 select the most appropriate answers from the following key: 10. 11. A fact based on empirical observation. An assumption basic to the solution of the problem. A conclusion that is contradicted by the evidence. A conclusion that is justified by the data. Insufficient evidence to make this judgment. Key: , UlJ-‘er—I In the P generation the brightest rat made fewer than ten errors while the dullest rat made more than 200 errors. (1) The extent to which each rat is able to learn to run a maze can be used as a measure of the rat's intelligence. (2) In the F1 generation the least number of errors made by any Bl individual was 12. (1) If the number of errors made in maze running is a criterion of intelligence, then the ablest member of the D1 group was more intelligent than the ablest member of the 81 group in the F1 generation. (4) ANALYSIS ~l_—— red, some gray; some profusely spotted, some without spots. "Items 12-15 are concerned with the following situation: Some are A certain species of fish comes in various colors. In an attempt to analyze the inheritance of color in these fish a geneticist has worked out the following: Red color is due to a recessive gene borne on the X—chromosome. Gray color is due to the dominant allele of this gene. The gene for plain is dominant in the female and is not sex- linked. 12. 13. 14. 15. 66 The gene for spotted is dominant in linked. (The gene for spotted is al Let: X represent a gray color. x represent a sex red color. P represent the gene p represent the gene sex chromosome which bears the male and is not sex— lelic to the gene for plain.) the gene for chromosome which bears the gene for the female). for plain (dominant in in the male). for spotted (dominant For items 12-15 use the following key: Key: 1. XYPP x XxPp. 2. XYPP x XxPp. __3. xYPp x xxPp. 4. XYPP x xxPp. 5. None of the above. The theoretical yield of this 25% red plain males, 25% red spotted males, 50% gray plain females. (4) The theoretical yield of this 12.5% gray plain males, 12.5% gray spotted males, 12.5% red plain males, 12.5% red spotted males, 50% gray plain females. (1) theoretical yield of this gray plain males, red plain males, gray plain females. 
The 25% 25% 50% (2) The theoretical yield of this 37.5% red spotted males, 37.5% red plain females, 12.5% red plain males, 12.5% red spotted females. (3) cross will be cross will be cross will be cross will be APPENDIX B 68 FALL TERM EXAMHNATION - MALES Know ledge Itsms Item Number Item Difficulty Davis Index Difference Index 1 67 2o 29 3 88 19 14 5 48 7 11 6 59 9 7 7 71 24 32 8 71 25 33 9 73 30 35 18 82 11 12 19 65 26 37 20 62 34 “7 21 67 10 1” 22 82 16 17 23 81 7 8 24 55 19 3° 25 75 12 15 26 3s 4 6 27 73 16 21 28 44 - 18 27 29 68 15 21 3o 33 14 21 31 44 8 12 32 78 ' 31 32 33 9 ‘2 '1 34 51 23 36 3s 42 2 3 36 49 19 30 37 64 21 3O 29 27 38 82 J‘A -\ 69 FALL TERM EXAMINATION - MALES Comprehension Items Item Number Item Difficulty Davis Index Difference Index 2 64 5 8 4 69 11 15 10 85 22 . 19 11 ' 57 14 22 12 68 26 35 .13 61 14 22 14 47 . 23 - 34 15 72 12 16 16 61 23 34 17 72 25 31 49- 65 26 37 50 6O 24 35 51 67 25 34 52 75 19 23 53 46 7 11 54 7o 21 28 55 70 22 29 ‘56 ' 22 25 28 57 64 21 31 58 56 20 32 115 46 16 25 116 76 12 15 117 57 24 37 118 30 10 1“ 119 79 26 28 - 8 12 120 64 70 FALL TERM EXAMINATION - MALES Appl ica tion Items Item Number Item Difficulty Davis Index Difference Index 39 49 15 24 .40 69 27 36 41 44 . 8 ' 12 42 83 21 19 43 77 9 11 44 57 17 “7 45 55 24 37 46 73 13 17 47 67 12 17 48 69 20 27 59 59 10 15 60 81 13 1“ . 61 73 16 21 84 ‘ 46 12 19 85 74 25 3° 86 71 20 27 87 70 17 2“ 88 29 16 22 89 78 25 28 90 63 20 29 91 79 1” 16 92 59 21 31 93 69 17 2“ 94 56 18 27 95 48 11 18 96 43 11 18 97 55 18 27 98 39 18 27 99 37 11 18 100 41 19 29 101 47 1“ 23 102 78 26 20 103 58 11 18 104 71 18 25 105 71 15 20 106 75 13 17 107 83 1“ 13 108 75 19 23 121 72 23 29 122 18 1 1 123 74 19 23 124 79 27 28 125 70 21 28 rim i__.._1_ 71 FALL TERM EXAMINATION - MALES finialysis Itggs Item Number Item Difficulty Davis Index Difference Index 62 63 6 9 63 90 10 7 64 , ‘52 15 14 65 95 5 2 66 68 13 18 67 68 11 15 68 67 11 15 69 75 15 19 70 68 11 16 71 62 14 22 72 28 7 9 73 68 11 15 74 84 22 19 75 70 20 27 76 58 8 12 77 30 11 15 78 90 13 9 79 6O 10 16 80 52 ‘ 15 2” 81 ~ 88 12 9 82 20 '2 ’2 83 33 10 1” 109 68 26 35 110 71 22 29 111 71 20 27 112 68 ii 3% 113 69 21 30 114 66. 72 WINTER TERM EXAMINATION - MALE Knowledge Items Item Number Item Difficulty Davis Index Difference Index 1 88 10 8 2 53 3 5 3 72 17 22 4 71 16 22 5 39 11 18 6 67 6 9 7 59 7 10 8 82 15 15 14 83 14 13 15 40 16 2“ 16 73 19 25 17 89 21 - 1“ 18 69 14 19 19 69 15 21 7 20 76 1‘1 17 21 75 12 15 22 90 10 7 23 80 15 17 24 73 19 18 25 76 16 19 25 85 22 19 27 77 23 27 28 73 19 25 29 70 16 22 69 64 ' 26 37 70 73 13 17 71 78 17 20 72 65 23 32 73 81 13 1“ 74 40 9 13 75 58 13 22 76 64 23 34 77 51 20 32 73 WINTER TERM EXAMINATION - MALES Compre hension Items Item Number Item Difficulty Davis Index Difference Index ._ 9 88 16 12 10 75 6 7 11 92 19 10 12 72 14 18 13 60 15 23 30 73 19 24 31 84 17 16 32 74 14 18 33 43 19 29 34 43 24 37 35 75 27 32 36 72 4 23 30 37 70 24 32 38 63 20 29 39 82 19 19 40 66 8 12 41 52 20 31 54 75 16 20 55 61 17 26 56 73 15 19 57 80 20 21 58 48 11 17 59 73 27 33 60 . 73 9 12 61 70 3 15 ' 21 62 61 14 21 63 . 
63   27   11   15
81   31    8   11
82   60   23   34
83   74   10   13
84   44   19   29
85   73    7   10
86   83   14   14
101  38   21   32
102  60   32   46
103  53   21   33
104  40   17   26
105  70   21   28
106  46   19   30
107  38   11   17
108  94    0    0
109  94   12    5
110  77   15   18
111  72   13   17
112  65   13   20
113  76    8   13
114  27    3   --

WINTER TERM EXAMINATION - MALES

Application Items

Item Number   Item Difficulty   Davis Index   Difference Index
42   56   18   27
43   54   20   31
44   60   20   30
45   26   18   23
46   51   13   22
47   43    4    6
48   63    5    7
49   43    7   13
50   62    2    9
51   66   11   16
52   62   10   16
53   65    9   13
64   69   17   24
65   89   26   16
66   92   19   10
67   74   18   22
68   71   21   23
94   83   15   15
95   29   15   21
96   19   20   20
97   58   23   35
98   55   19   22
99   27   10   18
100  30    0   --

WINTER TERM EXAMINATION - MALES

Analysis Items

Item Number   Item Difficulty   Davis Index   Difference Index
78   69   29   38
79   61   21   32
80   44   12   19
87   39    2    3
88   43   15   23
89   38   27   39
90   44   14   21
91   80   18   19
92   77   21   24
93   29    2    3
115  66   30   41
116  70    8   12
117  48   21   33
118  53   14   23
119  70   18   25
120  38    9   13
121  52   19   29
122  61   17   26
123  17    7    7
124  64   15   23
125  55   15   23

SPRING TERM EXAMINATION - MALES

Knowledge Items

Item Number   Item Difficulty   Davis Index   Difference Index
30   54   10   16
32   88   17   13
29   85   18   16
31   80   18   19
25   67   23   32
24   74   18   22
26   48    1    1
22   71   13   18
21   71   22   29
20   61   14   21
19   71   19   26
27   70   16   22
23   89    4    3
28   47    7   11
11   99    7    2
10   64   33   46
5    90   13    9
7    75   19   23
4    95   24    9
6    87   22   17
13   54   12   19
9    50   16   25
12   70   23   31
92   81   11   12
87   93    9    5
88   35   19   28
86   67   21   30
98   77   15   18
76   58   13   27
81   86   23   19
33   40   24   35
34   38    9   13
35   33    0    0
36   34    7   10
121  43   11   18
117  64   14   22
120  64    8   12
122  50    9   15
124  57   --   --
119  36   23   --
118  63   15   --

SPRING TERM EXAMINATION - MALES

Comprehension Items

Item Number   Item Difficulty   Davis Index   Difference Index
2    78   17   19
3    89   15   11
1    95   12    5
8    42   21   31
14   63   16   24
15   85   20   17
18   46    7   12
16   69   23   31
69   69   20   27
70   65   23   31
99   79   16   18
97   39   16   24
93   86   30   22
94   63   26   38
79   74   14   18
78   98   22   17
80   49   20   31
75   54   11   18
45   64   23   33
48   27   16   20
44   63   21   31
47   79   27   29
43   36    8   12
46   65   21   30
42   45    5    8
83   48   12   18
85   98   14    9
82   72   19   25
84   77   22   25
96   96   19    5
95   36   28   40
128  77   17   20
129  57   15   24
37   60   17   29
38   53   21   33
39   44   17   26
123  39   10   15

SPRING TERM EXAMINATION - MALES

Application Items

Item Number   Item Difficulty   Davis Index   Difference Index
17   36   10   16
74   54   20   31
73   73   15   19
71   31   -3   -5
89   13   15   12
90   17    0    0
91   43    8   12
55   80   20   21
65   74   11   14
61   74   21   26
57   16   19   17
67   27    5    6
49   31   -1   -1
59   66   12   17
53   79   20   22
63   30   12   17
51   78   12   14
100  99    7    1
101  93   16    8
102  76   22   26
103  64   26   37
104  85   --   --
105  73   31   46
106  42   --   --

SPRING TERM EXAMINATION - MALES

Analysis Items

Item Number   Item Difficulty   Davis Index   Difference Index
72   29   15   21
77   87   12   10
56   53   11   17
66   90   15   10
62   79   24   26
58   31    4    6
68   47    6    9
50   11    5    4
60   62   18   27
54   70   24   31
64   42   11   18
52   90   15   10
125  38   20   30
126  58   27   40
127  58   27   40
107  83   21   19
108  76   28   31
109  63   10   16
110  78   25   28
111   3   -3   -1
112  66   16   19
113  79   23   25
114  15    0    0
115  82   11   12
116  33    8   11
40   35   11   17
41   33    6    9

APPENDIX C

FALL TERM EXAMINATION - FEMALES

Knowledge Items

Item Number   Item Difficulty   Davis Index   Difference Index
1    68   21   29
3    88   17   13
5    51   13   21
6    90   15   10
7    78   31   32
8    75   35   38
9    79   29   30
18   87   12   10
19   67   29   39
20   64   36   49
21   73    6    7
22   86   19   16
23   88   -2   -2
24   59   29   43
25   73   12   16
26   31   12   17
27   72    9   12
28   42   15   23
29   73   18   23
30   35   16   24
31   43    6    9
32   81   38   33
33   11    9    7
34   49   22   34
35   51    8   13
36   50   23   35
37   68   18   25
38   84   33   26
FALL TERM EXAMINATION - FEMALES

Comprehension Items

Item Number   Item Difficulty   Davis Index   Difference Index
2    60   17   25
4    67   12   18
10   88   13   11
11   60   23   34
12   73   30   35
13   60   11   18
14   53   31   47
15   68   11   15
16   66   31   42
17   77   13   16
49   70   35   43
50   53   26   41
51   71   27   35
52   82   19   19
53   53   16   25
54   69   29   38
55   73   24   30
56   28   26   33
57   64   28   40
58   52   22   34
115  47   18   27
116  82   17   17
118  32   21   29
119  78   28   30
120  62    6    9

FALL TERM EXAMINATION - FEMALES

Application Items

Item Number   Item Difficulty   Davis Index   Difference Index
39   57   21   32
40   69   29   38
41   51   18   27
42   78   25   28
43   77   11   14
44   62   37   52
45   64   24   35
46   76   19   22
47   69   18   25
48   67   26   36
59   60   11   18
60   83   21   19
61   74   18   22
84   52   20   32
85   83   22   20
86   73   18   23
87   75   15   19
88   40   24   35
89   87   34   23
90   69   29   37
91   84   15   14
92   --   22   32
93   74   22   27
94   54   14   22
95   50   13   21
96   45   15   24
97   49   16   25
98   43   23   39
99   41   21   32
100  45   26   39
101  52   19   24
102  77   35   36
103  65   12   --
104  80   25   --
105  74   20   20
106  78   17   21
107  86   29   27
108  79   25   33
121  77   --   --
122  18   --   --
123  77   32   33
124  79   29   33
125  67   24   --

FALL TERM EXAMINATION - FEMALES

Analysis Items

Item Number   Item Difficulty   Davis Index   Difference Index
62   68    6    9
63   92   20   11
64   55    9   14
65   95   13    5
66   76   14   17
67   62   20   29
68   65   12   18
69   74   13   17
70   70   19   26
71   61   21   32
72   34   13   19
73   70   10   14
74   91   25   14
75   81   29   27
76   52   11   17
77   37   15   23
78   90   29   17
79   63   14   21
80   47   22   34
81   91   15    9
82   21    5    5
83   38   13   20
109  75   29   34
110  75   23   28
111  77   32   34
112  73   33   39
113  73   20   39
114  73   22   28

WINTER TERM EXAMINATION - FEMALES

Knowledge Items

Item Number   Item Difficulty   Davis Index   Difference Index
1    86    4    4
2    58    5    8
3    70   20   27
4    70   16   22
5    36    8   11
6    68    8   12
7    64    8   11
8    84    6    6
14   86   23   18
15   45   19   24
16   80   21   22
17   92   24   13
18   80   17   18
19   71   18   24
20   78   25   27
21   81   22   22
22   92   19   10
23   82   21   21
24   77   23   26
25   80   14   15
26   84   22   19
27   72   29   36
28   65   25   35
29   69   17   23
69   62   20   30
70   70   16   22
71   85   15   13
72   71   23   30
73   85    4    4
74   44    9   11
75   58   10   15
76   60   23   34
77   52   15   24

WINTER TERM EXAMINATION - FEMALES

Comprehension Items

Item Number   Item Difficulty   Davis Index   Difference Index
9    90   13    9
10   75    8    9
11   92   20   11
12   73    9   12
13   57    9   14
30   68   16   22
31   85   18   16
32   75   14   17
33   44   20   31
34   32   18   25
35   63   26   35
36   69   22   30
37   75   29   34
38   66   18   26
39   86   12   10
40   76   11   13
41   46   18   27
54   75   17   21
55   68   20   28
56   75   14   17
57   85   20   17
58   56   15   24
59   78   19   22
60   78   12   15
61   68   14   19
62   65   13   19
63   29   11   15
81   38   10   16
82   60   18   27
83   74   14   18
84   48   19   30
85   75   13   19
86   86   14   12
101  43   24   37
102  66   25   35
103  54   25   38
104  39   20   30
105  73   19   29
106  50   18   27
107  27   19   24
108  93    4    2
109  95   17    7
110  75   16   20
111  78   12   14
112  66   16   24
113  78    9   11
114  30   11   16

WINTER TERM EXAMINATION - FEMALES

Application Items

Item Number   Item Difficulty   Davis Index   Difference Index
42   54   11   18
43   37   18   27
44   47   23   36
45   18    7    8
46   34   10   14
47   39    4    6
48   55    3    5
49   42    8   12
50   60    8   11
51   66   10   14
52   63    9   13
53   67    7   10
64   70   16   22
65   91   20   11
66   94   27   10
67   81   20   20
68   70   19   26
94   82   18   18
95   24   12   15
96    8    9    5
97   47   30   40
98   45   27   41
99   24   -3   -3
100  36    2    9

WINTER TERM EXAMINATION - FEMALES

Analysis Items

Item Number   Item Difficulty   Davis Index   Difference Index
78   68   20   27
79   71   13   17
80   44   10   16
87   39    9   14
88   46   18   28
89   39   22   33
90   49    7   12
91   82   21   21
92   77   19   22
93   28    4    5
115  68   33   43
116  72    4    5
117  57   24   37
118  42   14   22
119  70   16   22
120  32    4    6
121  45   14   21
122  56   11   18
123  16    0    0
124  61   11   17
125  51   12   20

SPRING TERM EXAMINATION - FEMALES

Knowledge Items

Item Number   Item Difficulty   Davis Index   Difference Index
30   47   11   18
32   86   17   15
29   85   16   15
31   68   11   15
25   42    8   12
24   66   19   27
26   45   10   15
22   69   17   23
21   69   20   27
20   64   17   25
19   67   20   28
27   70   14   19
23   88   10    8
28   40   15   23
11   98   11    2
10   63   26   38
5    91   21   12
7    77   18   21
4    94   18    8
6    81   30   28
13   46   15   23
9    52    6   10
12   65   18   26
92   79   18   19
87   91   21   12
88   36    8   12
86   68   29   38
98   77   12   15
76   54   17   26
81   83   35   29
33   39   23   34
34   34    6    9
35   33   -4   -6
36   37    5    8
121  44   13   20
117  67   12   17
120  67   12   17
122  51   13   21
124  65   20   29
119  36   11   17
118  65   20   29

SPRING TERM EXAMINATION - FEMALES

Comprehension Items

Item Number   Item Difficulty   Davis Index   Difference Index
2    79   15   17
3    89   18   12
1    94   18    8
8    41   25   38
14   57   19   29
15   84   23   20
18   51    7   12
16   62   25   36
69   68   21   29
70   56   19   29
99   74   14   18
97   17    5    5
93   79   20   21
94   57   27   39
79   69   17   23
78   80   28   27
80   43   19   29
75   46   14   22
45   58   16   25
48   31    9   13
44   51   18   27
47   73   28   34
43   32    1    2
46   64   24   34
42   52    2    4
83   44   15   24
85   96   22    8
82   71   22   29
84   79   17   18
96   91   33   16
95   28   24   31
128  68   22   31
129  52   20   32
37   53   19   29
38   48   19   29
39   45   18   27
123  43   12   19

SPRING TERM EXAMINATION - FEMALES

Application Items

Item Number   Item Difficulty   Davis Index   Difference Index
17   34    8   12
74   41   22   34
73   58   22   34
71   31   -4   -6
89   10    5    4
90   15    5    5
91   43   -3   -5
55   76   20   23
65   71   15   20
61   67   15   22
57    9   22   13
67   30    3    5
49   21   -2   -2
59   68   11   15
53   73   24   29
63   17   -1   -1
51   78   19   22
100  97    7    2
101  90   13    9
102  65   22   31
103  62   25   36
104  84    9    9
105  59   25   38
106  35   30   41

SPRING TERM EXAMINATION - FEMALES

Analysis Items

Item Number   Item Difficulty   Davis Index   Difference Index
72   21    5    6
77   79   21   23
56   47   19   30
66   86   20   17
62   71   17   23
58   27    9   12
68   44   10   15
50    8    0    0
60   58   15   24
54   61   20   30
64   31    4    6
52   88   21   15
125  36   10   16
126  51   23   36
127  42   24   37
107  80   21   22
108  70   26   35
109  56   22   34
110  71   25   32
111   3   -4   -1
112  60   29   43
113  78   19   22
114  16    3    3
115  80   12   13
116  29   16   22
40   39    8   12
41   30   12   17