THE DIFFERENCES AMONG THREE-, FOUR-, AND FIVE-OPTION-ITEM FORMATS ON
A HIGH-STAKES ENGLISH LISTENING TEST
By
HyeSun Lee

A THESIS
Submitted to
Michigan State University
in partial fulfillment of the requirements
for the degree of
MASTER OF ARTS
Teaching English to Speakers of Other Languages
2011

ABSTRACT
THE DIFFERENCES AMONG THREE-, FOUR-, AND FIVE-OPTION-ITEM FORMATS ON
A HIGH-STAKES ENGLISH LISTENING TEST
By
HyeSun Lee
The aim of this research is to investigate the differential effects of multiple-choice items with
three, four, and five options on a high-stakes, English language listening test. Three-option
multiple-choice items on an English listening test were compared with items with four and five
options in terms of average total score assignments, average item facility, average item
discrimination, overall test reliability, and processing time. Three prep English listening tests for
the CSAT (College Scholastic Aptitude Test), each with five-option items, were adapted into
parallel forms with four- and three-option items by eliminating the least plausible options, as
selected by 73 Koreans. A total of 264 Korean EFL learners, divided into three groups,
participated in the study. Each group took tests with five-, four-, and three-option items. The test
administrations were based on Latin squares to control for order and practice effects. Results
indicated that the average scores between tests with three- and five-option items differed
significantly. However, there was no significant difference in the average item facility, average
item discrimination, and overall reliability among the tests with the different number of options.
Regarding time on exam, the three-option-item test took 11% less time than the five-option-item
test. Survey data from the 265 test takers revealed that 54% preferred the three-option-item
format. Also, 68% agreed administering the CSAT with three-option items would be preferable.
Results are discussed from the perspective of statistical, cognitive, emotional and contextual
factors in determining the optimal number of options.

To my parents, HK, and HJ whom I always love and miss so much.

iii

Acknowledgements

I would like to thank the principal and teachers for allowing me to collect data in their school;
students for their voluntary participations; my classmates for their generous assistance in revising
tests. Finally, I wish to express my deep gratitude to Dr. Paula Winke and Dr. Charlene Polio
who provided insightful comments throughout my thesis writing.

iv

Table of Contents

List of Tables .................................................................................................................................. vi
List of Figures .............................................................................................................................. viii
Introduction ..................................................................................................................................... 1
Background ..................................................................................................................................... 2
Research Questions ........................................................................................................................11
Methods......................................................................................................................................... 12
Data Analysis ................................................................................................................................ 20
Results ........................................................................................................................................... 38
Discussion ..................................................................................................................................... 41
Conclusion .................................................................................................................................... 45
Limitations and Future Research .................................................................................................. 47
Appendix A-1 ................................................................................................................................ 49
Appendix A-2 ................................................................................................................................ 55
Appendix A-3 ................................................................................................................................ 61
Appendix B-1 ................................................................................................................................ 67
Appendix B-2 ................................................................................................................................ 76
Appendix B-3 ................................................................................................................................ 86
Appendix C ................................................................................................................................... 94
References ..................................................................................................................................... 96

v

List of Tables

Table 1

Revision of the Three Prep-Tests to the Four-option-item Tests ....................... 17

Table 2

Revision of the Four-option-item Tests to the Three-option-item Tests............. 17

Table 3

Test Administration Schedule ............................................................................ 18

Table 4

Descriptive Statistics of Three Original Tests with Five Options (I5i, II5i, and
III5i)................................................................................................................... 21

Table 5

Significance in terms of the Average Scores in Five Original Tests (I5i, II5i, and
III5i)................................................................................................................... 21

Table 6

Descriptive Statistics of Three-, Four-, and Five-option-item tests: All Nine
Tests ................................................................................................................... 23

Table 7

Significance in terms of the Average Total Scores in All Nine Tests .................. 23

Table 8

Paired Comparisons: Differences in terms of the Average Total Score
Assignment in All Nine Test ............................................................................... 23

Table 9

Correlations of Test Scores among Three Different Item formats ..................... 25

Table 10

Descriptive Statistics of Three-, Four-, and Five-option-item Tests: Harder
Version Tests ...................................................................................................... 26

Table 11

Paired Comparisons: Differences in terms of the Average Total Score
Assignment in Harder Version Tests .................................................................. 26

Table 12

Average Item Facilities: All Nine Tests, Harder Version (I), and Easier Versions
(II & III) Tests .................................................................................................... 28

Table 13

Average Item Discriminations: All Nine Tests, Harder Version (I), and Easier
Versions (II) ....................................................................................................... 29

Table 14

Overall Reliability ............................................................................................. 30

Table 15

Agreement vs. Disagreement based on the Preferred Number of Options ........ 34

Table 16

Item two: Why do you prefer a three-, four-, or five-option-item test? ............. 35

Table 17

Item Four: Advantages and Disadvantages of the Three-option-item Tests ..... 37

vi

Table 18

Item Seven: Why do you agree or disagree with the administration of the threeoption-item tests? .............................................................................................. 38

vii

List of Figures

Figure 1

Preferred item format among three-, four-, and five-item tests ....................... 31

Figure 2

Administration of the three-option-item format on the CSAT ......................... 32

Figure 3

Agreement with the administration of the three-option-item format based on
the preferred number of options ...................................................................... 33

viii

Introduction
Haladyna (2004) stated that multiple-choice is preferred as a test item format due to its
scoring objectivity, higher reliability, and efficient administration even though many test writers
have difficulty in writing multiple-choice items. It is challenging for test developers to word the
stem, to make a single, clear correct answer, and to write plausible distracters. On the other hand,
test takers report that multiple-choice items are much less demanding and make them less
anxious, so this format is preferred compared with other test item formats (Haladyna, 2004). In
addition to the advantages of the automaticity of scoring and less demanding workload for the
test takers, the multiple-choice format produces relatively higher reliability than other essay
formats, as Haladyna claimed. Reliability is an important factor to be considered in testing.
Brown (2005) defined reliability as ―the extent to which the result can be considered consistent
or stable‖ (p. 175). Also, Haladyna argued that multiple-choice items make it possible to cover
content knowledge and a wide range of skills, as other types of items such as short-answer or
essays do. Thus, the multiple-choice item format has been widely used in many language tests.
Also, a lot of studies have been conducted regarding the best type of multiple-choice format. One
area of contention is the optimal number of options in a multiple-choice item, and this argument
has lasted more than 80 years (Rodriguez, 2005).
Before proceeding with the exact details of this study, I first discuss prior empirical
research that has been conducted regarding the number of options that multiple-choice items
should have. This is tied in with researchers‘ discussions on guessing that results from the
multiple-choice item format, thus I review this area of research as well. I also discuss the College
Scholastic Aptitude Test (CSAT) in Korea, a high-stakes, five-option multiple-choice test, which
provides the general context for this study on the optimal number of options for multiple-choice

1

items. Then, I introduce the research questions and research method of this study. In the result
section, analysis of quantitative and survey data will be presented with an outline of the major
findings. Finally, I conclude with a discussion, research implications, and limitations and future
directions.

Background
Optimal Number of Options
As mentioned above, the construction of multiple-choice items, especially the
construction of options, takes item writers a lot of time. The optimal number of options for items
has been debated considerably (Haladyna & Downing, 1993, 2002; Lord, 1977; Trevisan, Sax, &
Michael, 1991). In the review of multiple-choice item-writing guidelines, Haladyna, Downing,
and Rodriguez (2002) reported 70 percent of 31 guidelines of multiple-choice item-writing cited
the rules of ―write as many plausible distracters as you can‖ and only four percent of the
guidelines were against this. Plausible distracters should be chosen by low performers and
eliminated by high performers. In other words, a plausible distracter should be a good, functional
distracter that looks like a correct answer to test takers who do not have enough knowledge or
skill concerning what an item intends to measure (Haladyna, 2004). However, it is hard to write
many plausible options. Thus, the ultimate goal is often to construct test items with an optimal
number of plausible distracters. Since Ruch and Stoddard (1925) claimed three-option items do
not have detrimental effects on test outcomes, compared with four- or five-option items, many
empirical studies have been conducted that investigate the optimal number of functional options
a multiple-choice item should have (e.g., Haladyna, Downing, & Rodriguez, 2002; Rodriguez,
2005).

2

In the early of 1940s, Lord conducted a study on the number of options in multiplechoice items, claiming that three options were optimal. In a mathematical demonstration,
Tversky (1964) argued that items with three options maximize the power of a test, test
discrimination, and information in a test, provided that the total number of options would be
fixed. Costin (1970) revised four-option-item tests in introductory psychology for the Air Force
by random elimination of one option from each item. Costin reported that mean item
discrimination and reliability of three-option items were better than those of four-option items.
Straton and Catts (1980) compared two-, three-, and four-option-item tests in a fixed number of
options in Tversky‘s condition— the total exam time is in proportion to the number of options in
a test (1964). Straton and Catts developed four different versions of a trial test for a college
entrance exam in economics: one version of 60 items with two options, two versions of 40 items
with three options. The only difference between these two versions in the sets of 40-items was in
the way the options were discarded—random discarding of options in one 40 items set and
eliminating the worst option in the other 40 items set. Finally, one version of 30 items with four
options was adopted for the study of Stranton and Catts. Administering four tests to 260 students
at Sydney technical college, the results showed an increase in item difficulty and item
discrimination when there were fewer numbers of options. Also, reliability and the Standard
Error of Measurement of items with three options were equal to or better than the items with two
or four options.
Owen and Froman (1987) conducted an experiment with two parallel psychology final
comprehensive exams, consisting of a pretest and posttest with three- and five-option items. Both
of the tests were composed of 100 items, half of which were three-option items and the other 50
items with five options. In this research, each of 114 undergraduate students enrolled in an

3

educational psychology class took both three-option items and five-option items as a pretest and
a posttest over a ten-day period—if a student took a pretest which consisted of first 50 items as
five-option questions and the other 50 items as three-option questions, then he or she would take
a posttest in which 50 three-option items came up first, then 50 five-option items were followed.
Correlated t-tests were used to estimate item difficulty and item discrimination. The results
indicated that there were no significant differences between the three-option test and the fiveoption test in mean item difficulty and mean discrimination. Also, total test scores between the
two tests were not so different. After finishing the posttest, the students voted for the preferred
form. The result showed that 111 students preferred three-option items to five-option items. Also,
Owen and Froman reported that response time per an item were significantly reduced with fewer
options. Owen and Froman claimed that more options were likely to give more unintended clues
that may help test-wise students. According to Owen and Froman, if three-option items provide
equal statistical outcomes to four-or five-option items, testwiseness may not be an important
issue in three-option items. Also, they claimed that content validity and reliability could be
improved by adding more items into a three-option-item test due to the saved testing time.
Trevisan, Sax, and Michael (1991) investigated the interaction between student ability
and the number of options in items, and reliability and validity affected by this interaction. From
a verbal test in the Washington Pre-College Admission Test Battery (University of Washington,
1983) which originally consisted of items with five options, Trevisan et al. created three- and
four- option items by using the point biserial coefficients from the standardization data and by
discarding the least discriminating options. The students were divided into high, medium, and
low ability groups based on their GPAs. A total of 435 junior class parochial high school students
took only one of three tests—the five-option test, four-option test, or three-option test. The result

4

showed that there were significant differences in reliability on the three tests for low-ability
group. Validity was not significantly different across groups. Trevisan et al. claimed that the
optimal number of options was three when the groups were combined.
Also, Haladyna and Downing (1993) reported the summary of empirical and theoretical
studies about three-option items. Also, they investigated the frequency of effective and
ineffective distracters by analyzing three standardized multiple-choice tests: a test of a graduate
medical education program for physicians which had 200 five-option items, ACT Assessment
which consisted of total 127 four-option items, and a state certification examination in the health
sciences in which there were 150 four-option items. In analysis of the 477 items including 2,108
options from the frequency of distribution of options, Haladyna and Downing claimed that items
with three options may be a ―natural limit for multiple-choice item writers in most circumstances‖
(p.1008), reporting that only one or two distracter(s) were effective among two thirds of the
items, and that items with three effective distracters were between 1.1 percent and 8.4 percent of
the 477 items, and all 200 items with five options did not have four effective distracters.
Haladyna and Downing also found that there was no relation between the number of effective
distracters and item difficulty, whereas items with more plausible distracters could affect items‘
discrimination. Furthermore, Haladyna and Downing stated that more options per item could
contribute to better reliability if the distracters performed adequately, which would be rare.
Different from Haladyna and Downing‘s result, Crehan et al. (1993) reported that items with
three options had a little bit higher item difficulty than four-option items and no difference in
item discrimination among the tests—220 university students from a psychology class took one
of the manipulated tests. Crehan et al. concluded that three-option items had three benefits: they
were easier to write, had more effective distracters, and took less time to administer.

5

Cizek and O‘Day (1994) examined item difficulty and item discrimination, comparing
32 five-option items with four-option items in a test for certification in a medical specialty with
700 participants. The result showed that deleting nonfunctioning options made a slight increase
(but not a significant a difference) in item difficulty. Based on their report, item discrimination
indexes of both tests were not so different and the reliability of the four-option test was slightly
better than the five-option test. Bruno and Dirkzwager (1995) investigated the optimal number of
options, comparing the results with those from statistical perspectives. According to Bruno and
Dirkzwager, information from the item could increase with more options. However, too many
options per item would produce less information because the amount of information per option
has a maximum. Bruno and Dirkzwager stated that when each option was equally probable,
items with three options would give test takers an opportunity to obtain the maximum
information per option. Bruno and Dirkzwager stated that the result from the information theory
approach was in line with the previous findings from statistical and empirical research (e.g.
Costin, 1970; Cox, 1980; Ebel, 1969; Grier, 1976; Lord, 1944; Rush & Stoddard, 1927; TorabiParizi & Campbell, 1982; Tversky, 1964).
Rogers and Harley (1999) investigated the susceptibility to testwiseness in a high-stake
school-leaving mathematics examination with 158 senior high school students, comparing fouroption items with three-option items. According to Millman, Bishop, and Ebel (1965),
testwiseness means ―a subject‘s capacity to utilize the characteristics and formats of the test and
/or test-taking situation to receive a high score. Testwiseness is logically independent of the
subject matter for which the items are supposedly measure‖ (p. 707). They investigated
differences in item difficulty, item discrimination, and reliability. A total of 31 multiple-choice
items with four options in the 1992 version of the mathematics 30 test was used. Thirteen items

6

were testwiseness susceptible and 18 items were not testwiseness susceptible. Based on the
original four-option test, a three-option test was developed by eliminating a distracter in five
different ways. Both of three-and four-option tests were administered in the same class or section.
Odd numbered students took the original four-option test while even numbered students took the
three-option test. Comparing item difficulty, item discrimination, and reliability between fouroption items and three-option items, Rogers and Harley reported that testwiseness was less
affected when three-option items were administered. Also, the result showed that item difficulty
in fewer-option items was increased and reliabilities of both four-and three-option items were
almost equal. Rogers and Harley also reported that when asked if they would develop a threeoption test or four-option test, the teachers preferred a three-option test because of the difficulty
in writing three functional distracters.
Recently, in their review of multiple-choice item-writing guidelines for classroom
assessment, Haladyna et al. (2002) summarized the results from empirical research regarding
item difficulty, item discrimination and reliability. Haladyna et al. stated that different number of
options did not affect item discrimination while it changed item facility. Haladyna et al. reported
that according to five studies (Landrum, Cashin, & Theis, 1993; Rogers & Harley, 1999; Sidick,
Barrett, & Doverspike, 1994; Trevisan, Sax, & Michael, 1991, 1994) tests with fewer numbers of
options could lead to a decrease in item difficulty. However, two studies (Cizek & Rachor, 1995;
Crehan, Haladyna, & Brewer, 1993) indicated that fewer-option tests resulted in higher item
difficulty. Haladyna et al. stated that, citing Cizek and Rachor‘s result, the fewer options on a test,
the more the item discrimination. However, research from Crehan et al. showed no significant
difference in item discrimination when the number of options varied. When it comes to
reliability, a study from Trevisan et al. (1994) and Sidick et al. indicated that fewer options

7

increased reliability. However, Trevisan et al. found no change in reliability. Haladyna et al. also
mentioned that it did not seem that the item writers could write three plausible distracters with
consistency, claiming that four or five options were not worthy of the extra effort. Instead, three
options could be enough in most testing situations because the most important thing is how
functional the item distracters are, not how many numbers of options the item has.
Rodriguez (2005) reported on a meta-analysis of 27 studies that investigated the number
of options multiple-choice items should have. Rodriguez reported significant changes in item
facility were found when the number of options was reduced, especially research in which the
number of options was reduced to two options. Even though item discrimination was reported to
show significant changes, there was one study in which no change was reported by reducing
from five options to three options and a slight increase of item discrimination was found in
research when the number of options was changed from four to three. Furthermore, from the
perspective of test reliability, it was stated that in most cases the reduced options resulted in a
decrease of reliability. However, there was a case that reliability was not affected significantly by
reducing options from five to three in the meta-analysis or that reliability increased by changing
a four-option test to a three-option test.
Shizuka, Takeuchi, Yashima, and Yoshizawa (2006) investigated the effects of different
numbers of options in an English reading test for a university entrance exam. They changed an
original four-option reading test to a three-option test by discarding the least chosen option. A
total of 38 five-option items were revised into ten five-option items and 28 three-option items.
The original five-option test was administered to all applicants to the university and 1000
Japanese applicants were randomly selected to be analyzed. In this study, 192 Japanese
participants who did not take the original test took the three-option test in about nine weeks.

8

Before analyzing data, the comparability of two groups was run on the ten common items.
However, a t-test result showed that four-option group was significantly higher in their reading
ability. Thus, it was mentioned that common item equating in the Rasch measurement was
operated by using FACET v.3.0 software. Researchers stated that for a distracter analysis, the
―actual equivalent number of options (AENOs) defined by Sato and Morimoto (1976)‖ (p. 46)
was computed. The results indicated that the average item facility and average item
discrimination between the four-option test and the three-option test were not significantly
different. Also, test reliability in the test with three-option items was not significantly lower than
in the test with four-option items. Furthermore, in their analysis of distracters, they found that the
average number of actual functional distracters was less than two. Thus, researchers claimed that
three-option multiple-choice items would be optimal, considering item facility, item
discrimination, test reliability and efficiency of information availability in tests.

Guessing
The issue of guessing can be raised if multiple-choice items with fewer options are
adopted. According to Haladyna (2004), the effects of guessing are quite overestimated.
Haladyna stated that the chance of guessing correctly on ten items is about .00000001% when
the items have four options. Thus, the probability of getting a high score on a test of 50 items,
due to guessing, is quite remote. Also, Costin and Kolstad et al. (Costin, 1976; Kolstad, Briggs,
& Kolstad, 1985) claimed that test takers were not likely to randomly guess. Instead, test takers
seemed to delete the least attractive distracters, resulting in only two or three remaining options.
According to Costin and Kolstad et al. plausible distracters would prevent test takers from

9

getting correct answers through blind guessing. In other words, they argued that the number of
options would not create an effective alternative for an undeserved lucky chance.

CSAT
As one of the high-stakes and norm-referenced tests, the College Scholastic Aptitude
Test (CSAT) is used in Korea to screen out applicants for entrance into universities. The CSAT
consists of multiple-choice items with five options because it is believed that such items are
better at discriminating among higher and lower ability students than four-option multiple-choice
items and result in fewer passing test candidates. The Korea Institute of Curriculum and
Examination (KICE) is responsible for developing test items, administering the CSAT, reporting
the CSAT results, and analyzing the CSAT data. KICE has implemented five-option items in the
CSAT since 1993—before that, all items on the test had four options. Also, each office of
education, including Seoul metropolitan office of education, has been developing the CSAT
prep-tests and administering them five times a year. All prep-CSAT materials have five-option
multiple-choice items, as in the CSAT.
According to Choi (2008), it is not an exaggeration to say that the CSAT results
determine the future social status of each student in Korea. Thus, the national obsession over the
CSAT in Korea is unimaginable as Asia Times described. On the test day, under the government
policy, employees in government and companies arrive to work one hour later than usual to
avoid traffic jams. The Korean stock market also opens at ten in the morning, which is one hour
later than usual. Especially during the listening test, domestic and international flights are not
allowed to take off and land, as the noise may influence the test results. All cars are not allowed
to honk horns near the testing site on that day (Asia times, 2005).

10

As noted in the language testing literature, the development of five-option items takes a
lot of effort, time, and money (Budescu & Nevo, 1985; Delgado & Prieto, 1998: Haladyna, 2004;
Haladyna & Downing 1993; Owen & Froman, 1987; Rogers & Harley, 1999; Straton & Catts
1980). This is especially true in the case of the CSAT, which is administered once a year. In
developing items of the CSAT, more than 650 personnel including professors and teachers are
isolated in a secure place for 33 days because of the high-level of test security that is needed (The
Hankyoreh, 2005).

Research Questions
Considering that many empirical studies have been supporting that items with three
options are optimal, it is highly worthwhile investigating the claims that fewer options do not
affect test statistics and outcomes, especially in the high-stakes CSAT in Korea. Also, given that
studies have rarely been conducted regarding an English listening test in a high-stakes test, this
study will contribute to the future development of a high-stakes listening test item.
An additional claim in the language testing literature is that fewer options in multiplechoice items result in less processing time and thus can lead to shorter exam times that can be
more effective in terms of less anxiety for test takers and less money to administer the test
(Budescu & Nevo, 1985; Delgado & Prieto, 1998; Haladyna, 2004; Haladyna & Downing 1993;
Owen & Froman, 1987; Rogers & Harley, 1999; Straton & Catts 1980). On the other hand,
Owen and Froman (1987) claimed that test developers could, consequently, add more items to a
test in place of the saved time, which would increase content validity and reliability of a test. In
the case of the CSAT English language test, the listening section takes almost 20 minutes in total
as part of the 70-minute test. Therefore, it will be worth investigating the claim that three-option

11

items in a multiple-choice format can shorten the total testing time due to the reduced processing
time per item.
Using a mixed-method research design, I adopted a survey questionnaire to triangulate
quantitative data (Mackey & Gass, 2005). The questionnaire data provide the perspectives of
examinees on the different item formats. The survey questionnaire explored the examinees‘
opinions about the three-option-item format and the administration of this format on the CSAT.
The survey data supplemented the quantitative data, contributing to a better understanding of the
results (Brown, 2001).
The context for this study is in South Korea with EFL learners taking the prep-CSAT
English language listening tests. The research questions are as follows:

1. In terms of the average total score assignment, average item facility, average item
discrimination, and overall reliability, are there significant differences among three-,
four-, and five-option multiple-choice item formats on a high-stakes English listening
test?
2. Can a listening test with three-option items function as selectively as those with fiveoption items in screening applicants for university entrance?
3. Can a three-option multiple-choice item format significantly reduce the exam time?
4. Do test takers prefer a three-option-item format on a high-stakes test?

Methods
Participants
A total of 264 Korean high school students (40 males and 224 females) from six intact,

12

tenth- grade English classes preparing to take the CSAT to enter a university participated in this
study. Each class size was composed of 45 to 50 students. Initially, I started with 300 participants.
However, some of the participants could not take all three tests due to absence from school or
other reasons. As a result, 36 participants were removed from the initial data set since they did
not complete all three tests. Located in Seoul, this high school, in which I have taught for six
years, has students who are approximately eighty percent female and twenty percent male. The
principal of the school generously approved of this research experiment. The students in this
school participated in this experiment voluntarily. At this school, students are taught English four
hours per week in Korean by Korean teachers. The participants had been learning English for
seven years through classroom-oriented instructions. The instruction is intensively focused on
reading comprehension and grammar in preparation for the CSAT. They are usually exposed to
written materials in class. Like those in other EFL learning environments, they rarely have
opportunities to use English outside the classroom. In the next section, I will review the overall
structure of the CSAT English test and introduce the listening tests adopted in this study.

Materials
The CSAT English Test
The English test in the CSAT is composed of 50 multiple-choice items in total: 17
listening items, two or three grammar and vocabulary items, and 30 or 31 reading comprehension
items. Even though the portion of listening items in the CSAT is quite large, listening skills are
not often dealt with in class. Instead, reading and grammar are the main focus of instruction.
Thus, the listening section in the CSAT English language test is prepared for by self-learning or
private tutoring.

13

In the CSAT English listening test, all 17 multiple-choice listening items with five
options are printed in the test booklet with 33 reading items which also have five options. Test
takers can look over the listening items before listening to the audio files. Each item has one
audio file. That is, test takers listen to 17 different audio files, one for each item. First, test takers
listen to the direction for an item. Following the direction, an audio file is played through audio
speakers at a testing site. Then, test takers mark the correct answer based on the given five
options which are printed in the test booklet. For example, the direction such as listen to the
conversation and choose what the man will do next is given. Then the audio file of a
conversation between a man and a woman is played and test takers choose their answer from five
written options. After ten seconds, automatically another direction is played for the next listening
item.

The Prep-CSAT Listening Tests Adopted in Research: Nine Listening Tests
In this research, I adopted three English listening prep-tests for the CSAT. The prep-tests
for the CSAT are administered to high school students seven times a year (In March, May, June,
July, September, October, and November): The offices of educational districts develop and
administer the prep-tests. The three tests that this study used (versions I, II, and III) were actually
administered in November, 2007 (version I), November, 2008 (version II), and November, 2009
(version III). All versions had the same format (five-option items) but differed only in content
(see Appendix A). To investigate the main research effect, as a pre-research task, these three
original tests (I5i, II5i, and III5i) were adapted into two parallel forms with four-option and
three-option items by deleting the least plausible option, as selected by 73 native speakers of
Korean. These new forms are Test I4i, II4i, and III4i (four-option items), and I3i, II3i, and III3i

14

(three-option items). As a result, nine different tests were used in investigating the main research
questions.
Survey Questionnaire
To triangulate the quantitative data, a survey questionnaire was used. This questionnaire asked all
participants their preferred number of options (three, four, or five options) and their opinions
about changing the option format in the CSAT. The survey questionnaire was written in the
native language of participants, Korean, to avoid the concern that English proficiency may affect
the quality of response as Mackey and Gass (2005) mentioned. The questionnaire consisted of
seven items: Two closed-end questions, one Likert-scale question, and four open-ended questions
(see Appendix B). The Likert-scale question (item three) was added to triangulate participant‘s
preference of item format (item one). This questionnaire was distributed to participants if they
took at least one of the three test sessions (see Appendix B).

1. Closed-ended item: The preference among three-option, four-option, and five-optionitem tests
2. Open-ended item: The reasons why a participant preferred three-, four-, or fiveoption-item tests
3. Likert-scale item about the three-option-item test: from 1(most dislike) to 7 (most like)
4. Open-ended item: Advantage of three-option-item tests
5. Open-ended item: Disadvantage of three-option-item tests
6. Closed-ended item: Agreement or disagreement with the administration of the threeoption item format on the CSAT

15

7. Open-ended item: The reason why a participant agreed or disagreed with the
administration of the three-option-item format on the CSAT

Procedures
Pre-Research: Revising the Three English Listening Prep-Test for the CSAT
The pre-research task to revise a test format was conducted in March, 2010. It was a part
of the final project in a testing course that I took during the spring semester of 2010. A total of 73
Korean native speakers in Korea and U.S. participated in revising the three original tests with
five-option items into four-option items tests (phase one) and three-option-item tests (phase two).
In phase one, three groups of the 43 participants took two original listening tests with five-option
items among the three original tests—I5i, II5i, and III5i (see Table 1). While taking tests, they
were required to choose the least plausible options. Based on the wrong answers and the least
plausible options selected by them, I deleted the least plausible options and revised the five
option tests to four-option items (I4i, II4i, and III4i). In phase two, another 30 participants, who
did not participate in phase one, took two of the revised tests with four-option items among three
test (see Table 2). Like in phase one, they selected the least plausible option while taking the tests
with four-option items. Finally, I revised the four-option-item tests to three-option-item tests (I3i,
II3i, and III3i) by deleting the least plausible option. In revising them, Tversky‘s (1964)
condition—the total exam time is in proportion to the number of options in a test— was not
considered as in the study of Owen and Froman (1987). As a result, nine different tests were
generated to investigate the research questions.

16

Table 1
Revision of the Three Original Prep-Tests to the Four-option-item Tests
Group
Tests taken
Tests created
1 (n=15)
I5i
II5i
I4i/II4i
2 (n=15)

II5i

3 (n=13)
III5i
Note: n = number of test takers.

III5i

II4i/III4i

I5i

III4i/I4i

Table 2
Revision of the Four-option-item Tests to the Three-option-item Tests
Group
Tests taken
Tests created
I3i / II3i
1 (n=10)
I4i
II4i
2 (n=10)

II4i

3 (n=10)
III4i
Note. n = number of test takers.

II3i/III3i

III4i
I4i

III3i/I3i

Main Research
The main data collection sessions were conducted from the second week to the fourth
week of May, 2010 with the consent of participants from the high school in which I have taught
in Korea. As discussed earlier, six intact English classes were divided into three groups to take
nine different tests, administered to the three groups based on Latin squares, which controlled for
order effects and practice effects. The three different groups took the three tests with a one-week
interval between each administration (see Table 3). As a result, all participants took three-, four-,
and five-option tests in different versions (version I, II, and III).

17

Table 3
Test Administration Schedule
Group

Session
3

1

2

1 (n=86)

Male/
Female
11/75

4

II4i
(four-option)

I3i
(three-option)

III5i
(five-option)

Survey
session

2 (n=89)

15/74

I5i
(five-option)

III4i
(four-option)

II3i
(three-option)

Survey
session

3 (n=89)

14/75

III3i
(three-option)

II5i
(five-option)

I4i
(four-option)

Survey
session

Note. n = number of test takers.

This study additionally investigated the claim that fewer-option items in multiple-choice
questions result in less processing time and thus can lead to shorter exam times (Crehan,
Haladyna, & Brewer, 1993; Owen & Froman, 1987). In the CSAT English language test, the
listening section takes 20 minutes out of the 70-minute test. If a test with fewer-option items in
the CSAT English listening test can significantly reduce the exam time—as mentioned, during
this 20-minute CSAT English listening test administration, no plane is officially allowed to take
off or land—it may contribute to the test efficiency, content validity, and reliability. Thus, this
claim was examined by recording response time per item. In each test taking session, three
participants (a total of 37 participants) recorded their response time per item by using a stop
watch.
Finally, to explore test takers‘ opinions about the three-option-item format, I conducted a
survey session with the participants after completing the third session. They took 10 to 15
minutes to answer the seven questions and the survey was done anonymously to protect the
individual test takers‘ confidentiality (Dörnyei, 2003).

18

Scoring
Listening test.
A scoring system where some items are weighted with more or less points based on a
pre-expected level of difficulty has been adopted on the CSAT to increase item discrimination. In
a prep-English listening test for the CSAT, the most difficult item out of 17 items is given three
points and the easiest item has one point. The other 15 items are given two points. However, in
this study, each item was equally scored as one point for a correct answer and zero for an
incorrect one, unlike the original test, with the rationale that the pre-expected level of difficulty
can be different from the actual level of difficulty that will be identified only after the
administration of a test.

Survey questionnaire.
Regarding the two closed-ended items, item one was coded three, four, or five based on
the preferred number of options. Item six was coded dichotomously: one for agreement and two
for disagreement. Item three, a Likert-scale item, was coded from one (most dislike) to seven
(most like). Three open-ended items was processed through open coding. I decided initial
categories through the first reading. Based on the initial categories, I coded item two, four, five,
and seven. Then, I finalized these categories into detailed subcategories by investigating
connected patterns and removing overlapping themes. Once final categories were listed, I coded
10 percent of the data. (Ten percent was based on the previous literature (Brown, 2001; Chandler,
2003). Then, to confirm the consistency of coding procedure, another Korean rater coded these
10 percent randomly selected data based on the final coding list. By comparing the coding results

19

from another coder and me, the intercoder agreement was checked. Differences in coding were
resolved through discussion.

Data Analysis
Quantitative Data
Preliminary Data Analysis
Before analyzing main effects, the comparability of three groups was checked by
investigating achievement test scores. The achievement test was administered one month before
the 1st session (the second week of May) of research. The achievement test consisted of listening
items and reading comprehension items: Forty percent of the items were measuring listening
skills and the rest of the items were for reading comprehension. As listening skills had significant
correlation with reading skills (Hedrick & Cunningham, 1995, 2002), the achievement test scores
were used to confirm the compatibility of the three groups in this study. The mean scores of three
groups are as followings: The mean score of group 1 was 70.74, that of group 2 was 69. 98, and
for group 3, was 70. 12 (Maximum score of the achievement test is 100).
Second, to ensure whether the three original tests with five-option items (I5i, II5i, and
III5i) were of equal difficulty, total scores of each five-option-item test were analyzed by using
Kruskal-Wallis test. Because Kolmogorov-Smirnov statistics showed that the distribution was
not normal (p < .05, Table 4) nonparametric testing was used to check the equal difficulty of the
tests. Kruskal-Wallis test indicated a significant difference across the three versions,
H(2) = 29.89, p = .000. To explore significant differences, Mann-Whitney test was used with the
adjustment of Bonferroni significance level as .017. The result revealed that test version II and
version III were not significantly different. However, version I was significantly different from

20

version II and version III (Table 5). The mean of three original tests with five-option items
indicated that version I was the hardest version and version II and III were easier than version I.
Also, skewness showed that the hardest test, version I, was closer to having a normal distribution
than other two versions were (Table 4). Thus, I will first analyze data without considering the
level of difficulty. Then, I will separately analyze the hardest test, version I, and version II and III,
the easier versions, to explore the differences among the tests with the different numbers of
options.

Table 4
Descriptive Statistics of Three Original Tests with Five Options (I5i, II5i, and III5i)
Test Version
n
Mean
SD
Kolmogorov- Skewness
(Std. Error)
Smirnov
(Sig.)
I5i
89
12.65 (.34)
3.21
.000
-0.74
II5i

86

14.31 (.32)

3.04

.000

-1.91

III5i
86
14.71 (.33)
3.07
Note. n = number of test takers. Maximum score =17.

.000

-2.01

Table 5
Significance in terms of the Average Scores in Five Original Tests (I5i, II5i, and III5i)
Version pair
U
z
p
r
*
I X II
2563.50
-4.10
.000
-.31
II X III

3294.00

-1.62

.11
.000*

III X I
2158.50
-5.03
Note. Bonferroni adjustment (p < .017).

-.38

Nine English Listening Prep-Tests for the CSAT
This research was designed to be robust across the different level of difficulty in the tests,
because all participants took the difficult test once based on Latin-square combination that could

21

contribute to the counterbalancing of different level of difficulty. Thus, without considering the
level of difficulty, the differences in the average total score assignment among three-, four-, and
five-option-item tests were investigated.

Average total score assignment.
As Kolmogorov-Smirnov statistics showed, the distribution was not normal (p < .05).
Therefore, Friedman test, the nonparametric counterpart of one-way ANOVA for repeated
measures, was used. As expected, the mean for the three-option-item tests was the highest and
the mean for five-option-item tests was the lowest. The mean of four-option-item tests ranked
between those of three- and five-option-item tests (Table 6). The result revealed a significant
2

difference among three-, four-, and five-option-item tests, x (2) = 11.40, p = .003 (Table 7). To
detect the differences, Wilcox tests were adopted with the Bonferroni adjustment as .017
significance level. The tests reported that the three-option-item tests were significantly different
from the five-option-item tests T = 7464.00, p = .000, r = -.16 (or z = -3.58, p = .000, r = -.16).
There was no significant difference between the other pairs of item formats (Table 8).

22

Table 6
Descriptive Statistics of Three-, Four-, and Five-option-item tests: All Nine Tests
Test
n
Mean
SD
Kolmogorov- Skewness
(Std. Error)
Smirnov
(Sig.)
3-option
264
14.48 (.17)
2.72
.000
-1.80
(I3i/II3i/III3i)
4-option
(I4i/II4i/III4i)

264

14.03 (.18)

2.89

.000

.000

5-option
264
13.88 (.20)
3.22
(I5i/II5i/III5i)
Note. n = number of test takers. Maximum score = 17.

-1.15

-1.37

Table 7
Significance in terms of the Average Total Scores in All Nine Tests
Test
Mean rank
Chi-square( x2)
df
3-option
2.15
(I3i/II3i/III3i)
4-option
(I4i/II4i/III4i)

1.97

5-option
(I5i/II5i/III5i)

p

1.89

11.40

2

.003*

p < .05.

Table 8
Paired Comparisons: Differences in terms of the Average Total Score Assignment in All Nine Test
Test pair
T
z
p
r
3opt. X 4opt.
9114.50
-2.35
.019
4opt. X 5opt.

9664.00

- .71

.48

5opt. X 3opt.
7464.00
-3.58
Note. Bonferroni adjustment (p < .017). Opt. = option.

23

.000*

-.16

Item facility and item discrimination.
In line with the expectation, the tests with five-option items were slightly more difficult
than those with the three- and four-option items. The homogeneity of variances was verified.
Therefore, the parametric test, one-way ANOVA was used to explore any significant difference.
The results revealed that the average item facility was not significantly different across the three
different formats (Table 12).
Aligned with results from the previous literature in testing, the average item
discrimination in tests with five-option items was slightly higher than tests with the other two
formats. The result of one-way ANOVA indicated the average item discrimination was not
significantly different across the three different item formats (Table 13).

Correlation.
As the CSAT is a norm-referenced test, it is worthwhile investigating about the changes
in test takers‘ ranks across the three different formats of tests. Spearman‘s correlation revealed
positive correlations among three different item formats (Table 9). However, the correlation
coefficients were rather low (.444 to .541), indicating the three different item formats construed
exams that perhaps tapped into different underlying test constructs. This will be discussed further
in the discussion section.

24

Table 9
Correlations of Test Scores among Three Different Item formats
3-option
3-option
Spearman‘s rho
1
Sig. (2-tailed)

4-option
.444**
.000

5-option
.541**
.000

4-option

Spearman‘s rho
Sig. (2-tailed)

1

.501**
.000

5-option

Spearman‘s rho
Sig. (2-tailed)

1

n
264
264
264
Note. n = number of test takers. Correlation is significant at the 0.01 level (2-tailed).

Harder Version (version I) vs. Easier Versions (versions II &III)
Based on the preliminary analysis regarding the level of difficulty of three test versions,
I separately analyzed version I (the hardest) from version II and III to investigate the differences
among three-, four-, and five-option item formats in terms of the average total score assignment.

Average total score assignment: harder version (version I).
Kruskal –Wallis test was used due to the skewed data distribution. As shown in table 10,
the three-option-item test (I3i) was the easiest while five-option-item test (I5i) was the hardest.
The result revealed a significant difference among the three-, four-, and five-option-item test
groups, H(2) = 8.29, p = .016. To explore significant differences, Mann-Whitney test was used
with the adjustment of Bonferroni significance level as .017. Same as when nine tests were
considered, the results indicated that only the three-option-item test was significantly different
from the five-option-item test (U = 2850.00, z = -2.94, p = .003, r = -.23).

25

Table 10
Descriptive Statistics of Three-, Four-, and Five-option-item Tests: Harder Version Tests
Test
n
Mean
SD
Kolmogorov- Skewness
(Std. Error)
Smirnov
(Sig.)
3-option (I3i)
86
14.03 (.26)
2.43
.000
-1.01
4-option (I4i)

89

13.25 (.34)

3.24

.000

-. 95

5-option (I5i)
89
12.65 (.34)
Note. n = number of test takers.

3.20

.000

-. 74

Table 11
Paired Comparisons: Differences in terms of the Average Total Score Assignment in Harder
Version Tests
Test pair
U
z
p
r
I3i X I4i
3408.50
-1.26
.21
I4i X I5i

3447.00

-1.50

.13
.003*

I5i X I3i
2851.00
-2.94
Note. Bonferroni adjustment (p < .017).

-.23

Item facility and item discrimination: harder version (version I).
The average item facility was as follows: the three-option item format (M = .83,
SD = .12), the four-option item format (M = .78, SD = .12), and the five-option item format
(M = .74, SD = .14). As expected, the tests with five-option items were slightly difficult than
those with the three- and four-option items. The one-way ANOVA result indicated that the
average item facility was not significantly different across the three different formats,
F(2,48) = 1.78, p = .18 (see Table 12).
The average item discrimination in the three-option item format was M = .30 (SD = .20),
in the four-option item format, M = .40 (SD = .17), and in the five-option-item test, M = .40
(SD = .17). The average item discrimination in tests with five-option items was slightly higher

26

than tests with three-option items. However, the average item discrimination was not
significantly different across the three different item formats, F(2,48) = 1.81, p = .18 (see Table
13).

Average total score assignment: easier versions (version II & III).
Aligned with the expectation, the mean for the thee-option-item format was M = 14.66
(SD = 2.85), that for the four-option-item format was M = 14.33 (SD = 2.62), and for five-option
item format was M = 14.51 (SD = 3.05). However, different from the result with all nine tests
and that with harder tests (version I), Kruskal–Wallis test indicated no significant difference
across the three different item formats in terms of the average total score assignment.

Item facility and item discrimination: easier versions (version II & III).
The average item facility was as follows: the three-option item format (M = .86,
SD = .08), the four-option item format (M = .85, SD = .10), and the five-option item formats
(M = .85, SD = .10). The one-way ANOVA revealed that the average item facility was not
2

significantly different across the three different item formats, F(2,48) = .09, p = .92, ω = -.04
(see Table 12).
The average item discrimination in the three-option item format was M = .25 (SD = .12),
in the four-option item format, M = .32 (SD = .17), and in the five-option item format, M = .33
(SD = .18). There was no significant different across the three different item formats,
2

F(2,48) = 1.32, p = .28, ω = .01 (see Table 13).

27

Table 12
Average Item Facilities: All Nine Tests, Harder Version (I), and Easier Versions (II & III) Tests
Test
N
Mean
SD
df
F
p
ω2
(Std. Error)
All 9 tests
3-option
17
.85 (.02)
.07
4-option

17

.83 (.03)

.08

5-option

17

.82 (.02)

.09
2

.83 (.03)

17

.78 (.03)

17

.74 (.03)

1.78

.18

.26

.09

.92

-.04

.12

5-option

.26

.12

4-option

.47

2

17

.78

2

Harder version (I)
3-option

.14

Easier versions (II & III)
3-option

17

.86 (.02)

.08

4-option

17

.85 (.02)

.10

5-option

17

.85 (.03)

.10

p < .05.

28

Table 13
Average Item Discriminations: All Nine Tests, Harder Version (I), and Easier Versions (II & III)
Tests
Test
N
Mean
SD
df
F
p
ω2
(Std. Error)
All 9 tests
3-option
17
.31 (.03)
.12
4-option

17

.35 (.04)

.14

5-option

17

.38 (.04)

.15
2

.30 (.05)

17

.40 (.04)

17

.40 (.04)

.18

.03

1.32

.28

.01

.17

5-option

1.81

.20

4-option

.06

2

17

.32

2

Harder version (I)
3-option

1.17

.17

Easier versions (II & III)
3-option

17

.25 (.29)

.12

4-option

17

.32 (.40)

.17

5-option

17

.33 (.43)

.18

p < .05

Overall Test Reliability
The reliabilities of each listening test were examined through the Cronbach alpha
coefficient as seen the table 14. The average reliability of the three-option-item tests was M = .71,
in the four-option-item tests M = .77, and in the five-option-item tests M = .82, and the reliability
increased as the number of options were added. However, the results from a one-way ANOVA
revealed no significant difference among the reliabilities of the three different item formats.

29

Table 14
Overall Reliability
Version
I

3-option
.66

Item format
4-option
.79

5-option
.76

II

.85

.74

.85

II

.63

.78

.86

Overall

.71

.77

.82

Processing Time
A total of 37 participants recorded their processing time while taking a test: For threeoption-item tests, 12 participants recorded their time, 13 for four-option-item-formats, and 12 for
five-option-item tests. The average exam time including audio file playing time was 667.92
seconds for three-option-item tests, 696.00 seconds for four-option-item tests, and 736.08
seconds for five-option-item tests. The average processing time per item was 41 seconds. This
indicated that the administration of the three-option-item tests could save 68.16 seconds, in
which one or two more items (11.8% of the total items) might be added.

Survey Data
If participants took at least one of the three tests, they were asked to fill out a survey at
the end of the study. In analyzing the survey data, missing values were deleted pairwise. Due to
the absence on the third session day or the rejection to respond to survey questions, 35 were
missing from 300 initial participants. As a result, 265 participants‘ data from the questionnaire
were analyzed. Another Korean rater coded 10% of the data (randomly selected) to ensure the

30

reliability of coding. The agreement between the raters on that 10% of the data was 87%. The
disagreements were resolved through an in-depth discussion.

Closed-Ended and Likert-scale Items
Item one, three, and six.
Among 265 participants, 143 participants preferred the three-option-item test, 75
participants showed preference of the four-option-item test, and 45 respondents chose the fiveoption-item test (Figure 1).

No opinion, 1%
(n=2)

Preferred Item Format
3-option, 54%
(n=143)

5-option, 17%
(n=45)

3-option
4-option
5-option
No opinion

4-option, 28%
(n=75)
Figure 1. Preferred item format among three-, four-, and five-item tests

The Likert-scale question on the test takers‘ preference for a three-option-item test (item
three on the questionnaire; one was ―most dislike‖ and seven was ―most like‖), triangulated the
result from the first item. Spearman‘s correlation revealed a significant negative correlation
2

between the preferred number of options and the Likert-scale points, r = -.61, p < .001, R = .37:
The group preferring the three-option-item test showed the highest average points (M = 5.94, SD

31

= 1.28), the average points of those preferring the four-option item format was in-between (M =
4.36, SD = 1.11), and those preferring the five-option-item test had the lowest average points in
the Likert-scale ( M = 3.60 , SD = 1.51).
Regarding question six—whether test takers agree or disagree that an administration of
the CSAT with a three-option item format would be fair—164 respondent agreed and 86
respondents disagreed (Figure 2). A two-variable, Pearson‘s chi-square test was used to explore
the relationship between the preferred number of options and the test takers‘ agreements or
disagreements with the three-option-item administration on the CSAT. The test indicated that
there was a significant association between the preferred number of options and the agreement
2

with the administration of the three-option item format, X (2) = 52.71, p < .001, Φ = .46. Among
the respondents preferring three options, 85% agreed with the administration of the three-option
items on the CSAT. Among the respondents preferring the four-option item format, 47% agreed
with it. Only 34% agreed among the respondents preferring five options (Figure 3, Table 15).

Administration of the Three-option-item Format on the
CSAT
Agree
68%
(n=164)

Disagree
32% (n=78)

Agree
Disagree

Figure 2. Administration of the three-option-item format on the CSAT

32

Group with the three-option-item format preference
Disagree, 15%
(n=20)

Agree
85%
(n=117)

Agree
Disagree

Group with the four-option-item format preference
Agree
47% (n=33)
Disagree
53% (n=37)

Agree
Disagree

Group with the five-option-item format preference
Agree
34%
( n=14)

Disagree,
66% (n=27)

Agree
Disagree

Figure 3. Agreement with the administration of the three-option-item format based on the
preferred number of options

33

Table 15
Agreement vs. Disagreement based on the Preferred Number of Options
Preferred options (n)
Agree (n)
Disagree (n)
3 options
(137)
117
20
4 options

(70)

33

37

5 options

(41)

14

27

164

84

Total
(248)
Note. n = number of respondents.

Open-ended Items
Item two: why do you prefer the three-, four-, or five-option-item test?
Among the 143 respondents who preferred a three-option-item test, 48.3% liked the item format
because it was easy to choose a correct answer. A second group of 12.6% responded that they
liked three options due to time saving and the user-friendly format it presents. Among 75
respondents preferring the four-option-item test, 61.8% chose this format because they thought
that four would be the proper number of options and 10.7% liked it because they were used to
the four-option-item test. Among 45 respondents, the five-option preferring group, 62.2% said
that they were used the five-option-item test and 15.5% chose it because the five-option-item test
would, they said (in their own words), increase item discrimination (Table 16).

34

Table 16
Item Two: Why do you prefer a three-, four-, or five-option-item test?
3 options
4 options
Reason
n
Reason
- I can easily choose an answer.
70
- I think that the number of
options is proper.

5 options
n
46

Reason
- It is a familiar format because I
have been taking the five-option
items.

n
28

- I can save time.

18

- It is a familiar format because I
have been taking the four-option
items.

8

- I think that five-option items will
have better item discrimination.

7

- It is a user-friendly format

18

- I can easily choose an answer.

4

- I can easily choose an answer.

1

- I think that the number of
options is proper.

9

- I feel less anxiety with threeoption items.

4

- I think that the number of
options is proper.

1

- I feel less anxiety with threeoption items.

7

- I think that four-option items
will have better item
discrimination.

4

I can concentrate more on the
test.

1

- I do not think there is a difference among three-, four-, and five-option items.
Note. n = number of respondents.

35

11

Item four: the advantage and disadvantage of the three-option-item tests.
The first advantages of the three-option item format was the higher possibility of correct
answers (N=119) and the second was time saving (N=35) among 263 respondents. The
disadvantage of the three-option item format was the lower item discrimination (N=27). Also, the
respondents (N=22) were concerned that the level of difficulty would be increased due to the
reduced number of options. Interestingly, among those preferring the three-option item format
(N=143), 57.3% (N=82) of the participants did not respond with anything as a disadvantage of
the three-option item format (Table 17).

36

Table 17
Item Four: Advantages and Disadvantages of the Three-option-item Tests
Advantage
Disadvantage
Reason
n
Reason
- It has the higher possibility to
35
- Item discrimination will be
choose correct answer than four- or
decreased.
five-option items
(e.g., I can easily choose a correct
answer. A correct answer can be
easily guessed due to reduced
number of options)
- I can save time.

10

- Item facility will be decreased.
(Items will be more difficult.)

n
27

22

- It is a user-friendly format.

7

- There is the higher possibility to
have a correct answer by guessing
because of reduced number of
options.

7

- I can concentrate more on the test.

2

- Test validity will be decreased.
(e.g. Face validity will be decreased.
The test with three-option items does
not seem to screen out applicants for
universities.)

3

- I feel less anxiety.

2

- The test format (three-option items
format) is so unfamiliar.

2

- Item discrimination and test
validity will be increased.
Note. n = number of respondents.

1

Item seven: why do you agree or disagree with the administration of the three-option
item tests?
Among those who agreed with the administration of the three-option item format on the
CSAT (N=164), 34.8% (N=57) stated the higher possibility of correct answers as the reason why
they agreed and the time-saving (N=22) was the next reason. Respondents who disagreed (N=86)
pointed the lower item discrimination as the main reason (N=36). Also, 16 respondents disagreed

37

because they thought the items on the CSAT would be more difficult if the number of options
were to be reduced (Table 18).

Table 18
Item Seven: Why do you agree or disagree with the administration of the three-option-item tests?
Agree
Disagree
Reason
n
Reason
n
- It has the higher possibility to
57
- Item discrimination will be
36
choose correct answer than four- or
decreased.
five-option items
(e.g. I can easily choose a correct
answer. A correct answer can be
easily guessed due to reduced
number of options)
- I can save time.

22

- Item facility will be decreased.
(Items will be more difficult.)

16

- I do not think there is a difference
among three-, four-, and five-option
items.

17

- The test format (three-option items
format) is so unfamiliar.

14

- It is a user-friendly format.

13

- There is a higher possibility to have
a correct answer by guessing because
of reduced number of options.

3

- I feel less anxiety.

7

- Test validity will be decreased.
(e.g., Face validity will be decreased.
The test with three-option items does
not seem to screen out applicants for
universities.)

3

- Item discrimination and test
validity will be increased.

4

- I do not think there is a difference
among three-, four-, and five-option
items.

3

- I can concentrate more on the test.
Note. n = number of respondents.

3

Results
In the following section, I sum up the results based on the research questions.

38

(1) In terms of the average total score assignment, average item facility, average item
discrimination and overall reliability, are there significant differences among three-, four-, and
five-option multiple-choice item formats on a high-stakes English listening test?
With respect to the average total score assignments in all nine tests, the significance was
revealed between three-option-item tests and five-option-item tests (T = 7464.00, p = .000,
r = -.16 or z = -3.58, p = .000, r = -.16 with Bonferroni adjustment p < .017). When the hardest
tests (version I) were separately investigated from versions II and III, Mann-Whitney test
revealed the significant difference between the three-option-item tests and the five-option-item
tests (U = 2850.00, z = -2.94, p = .003, r = -.23 with Bonferroni adjustment p < .017), like the
result when all nine tests were included. When the easier tests (version II and version III), which
were equal of difficulty, were analyzed, no significant difference was indicated across the three-,
four-, and five-option-item tests.
In terms of the average item facility, average item discrimination, and overall test
reliability, a one-way ANOVA reported no significant difference, when considering all nine tests
and the level of difficulty. Additionally, Spearman‘s correlation revealed positive correlations
among three different item formats. However, the correlation coefficients were rather low (.444
to .541), which could mean that the ranks of test takers were a bit changed across three different
formats of tests.

(2) Can a listening test with three-option items function as selectively as those with five-option
items in screening applicants for university entrance?
Considering the normality of the data distribution, the five-option-item tests were closer

39

to the normal distribution than other two test formats: The mean for the three-option-item tests
was M = 14.48 (SD = 2.72), for the four-option-item tests was M = 14.03 (SD = 2.89), and for the
five-option-item tests was M = 13.88 (SD = 3.22). From the perspective of the level of difficulty
in a test, the hardest version was slightly closer to the normal distribution: The mean for version I
was M = 12.65 (SD = 3.20), for version II was M = 14.31(SD = 3.04), and for version III was
M = 14.71, (SD = 3.06). However, as indicated by the average item discrimination, there was no
significant difference among the item formats in terms of how they screen examinees. That is,
the listening tests with three-option items could function selectively as four- and five-option-item
tests in screening out applicants.

(3) Can a three-option multiple-choice item format significantly reduce the exam time?
The three-option-item format could reduce the total exam time by 10.8%, compared with
the time for five-option items (From 736.08 second to 667.92 seconds). Considering the average
processing time per item was 41 seconds, the reduction in time by 10.8% could allow for the
possibility of adding one or two more items (11.8% of total 17 items in the CSAT listening test),
which may increase the validity and reliability of the test.

(4) Do test takers prefer a three-option-item format on a high-stakes test?
As the survey data indicated, 54% of the respondents preferred the three-option-item test.
Also, 68% of the respondents agreed that the administration of the three-option-item test on the
CSAT would be fair. In exploring the relationship between the preferred number of options and
the agreement or disagreement with the administration of the three-option items on the CSAT,
results from a two-variable, Pearson‘s chi-square test indicated that there was a significant

40

association between the preferred number of options and the agreement with the administration
2

of the three-option-item format (X (2) = 52.71, p = .000, Φ = .46): The group that prefer the
three-option-item format more tends to agree with the administration of the three-option-item
format on the CSAT than those preferring the four- or five-option-item format. However,
interestingly, 6.5% (N=22) of the participants among those who preferred the three-option-item
format disagreed with the administration of three-option items on the CSAT. The main reason for
their disagreement was a perceived increased possibility of guessing, which they indicated would
unfairly benefit those who did not study as hard as they did.

Discussion
This study found that in terms of the average total score assignment, the five-option-item
tests could spread out examinees‘ scores more closely along a normal distribution than the threeoption-item tests. However, the average item facility, average item discrimination, and overall
reliability did not indicate any significant differences among the three different item formats.
Thus, results from this study support previous research (e.g., Costin, 1972; Delgado & Prieto,
1998; Green, Sax, & Michael, 1982; Owen & Froman, 1987; Sidick, Barrett, & Doverspike,
1994; Trevisan, Sax, & Michael, 1991; Trevisan, Sax, & Michael, 1994), stating that tests with
different numbers of options do not indicate any statistically significant differences.
However, in this study I also ran correlations among the three different item formats of
tests, which previous studies on the optimal number of options did not investigate before. While
Spearman‘s correlations were positive and statistically significant across the three different item
formats of the tests, the strength of the correlations were slightly lower than what I expected. The
result from the correlation suggests that factors external to the listening construct might be

41

differentially tapped into depending on the number of options presented.
Regarding the correlation result, I speculate that testwiseness strategies were an
additional construct that affected test takers‘ scores on the five-option-item format tests while
testwiseness strategies probably affected the scores least when three-option items were presented.
Rogers and Harley (1999) also reported that items with fewer options were less affected by
testwiseness. According to them, 13 items were susceptible to testwiseness in their four-optionitem test while only four items were testwise susceptible in their three-option-item test. My
speculation can be explained by the survey data in the present study: Some of participants
reported that they felt the three-option-item test was more difficult than the five-option-item test
because they could not delete any non-plausible options before listening to audio files as they
usually did on listening tests with more options. Also, test anxiety might be another factor
affecting the scores. According to the survey data, some test takers (n=7) had less anxiety with a
three-option-item format whereas there were test takers (n=8) who felt less anxiety with the fouroption-item format, over the three-option item format. Additionally, test takers‘ familiarity with
the five-option-item format possibly affected the scores: Most of test takers were trained with
five-option-item formats since all official middle/high school tests in Korea are administered
with the five-option-item format. (Test takers were quite surprised when they were given the
three-option-item test. They told me that the test seemed to be so awkward, like it was missing
something.) Finally, the cognitive reading load might be considered as one of the factors: the
five-option-item tests required reading options faster than the three-option-item tests did. I will
discuss in more depth the possible effects of testwiseness strategies and the cognitive reading
load on outcomes and perceptions later in this section.
In a high-stakes test, item discrimination is critical for screening and segregating test

42

takers. In the CSAT, test developers increased the number of options (from four to five options)
and adopted a weighted point system to increase item discrimination in 1993. However, with
respect to efficiency in developing items, this study suggests that the three-option-item format is
worth considering even for high-stakes tests like the CSAT. This also has been supported by the
claims that most multiple-choice items have only one or two functional distracters (Haladyna,
Downing, & Rodriguez, 2002) and the increased number of options are not related to better
testing statistics, rather the extra options provide unintended clues to other items (Rodriguez,
2005). A three-option-item format can contribute not only to saving time but also to covering
more content, if additional items are inserted (Rodriguez, 2005). This results in accurate testing
outcomes with increased test reliability (Rodriguez, 2005; Shizuka et al., 2006).
From the correlation analysis and the survey data, this study also suggests that the threeoption-item format can function well to prevent the need for testwiseness strategies, as Rogers
and Harley (1999) claimed. When asked about disadvantages of the three-option-item format,
students (n=22) stated that the three-option-item format would make a test more difficult. When
all distracters are equally plausible, the item itself is more difficult—or so it appears to be—
because there are no obvious distracters that can be immediately eliminated. Therefore, some
stated they felt like the five-option items were better, in that they provided test takers with an
initial satisfaction of being able to immediately recognize non-plausible distracters and eliminate
them: This nice, psychologically pleasing effect (being able to easily eliminate distracters) was
missing from the three-option-item format. This relates to the concept of ―testwiseness‖ (Rogers
& Harley, 1999)—these students have been highly trained in test taking, and one of their
strategies, most likely, is to read quickly through the options and eliminate non-plausible
distracters, even before listening to an audio file of a listening item. Some student who disagreed

43

with the three-option-item format on the CSAT said that the test items with three options made
them so confused: All options seemed to be correct answers. They stated that they could not use
the satisfying strategy to increase the possibility of a correct answer by deleting two or more
options before audio files play.
When the five-option-item format in a listening test is used to make tests more
difficult—based on the expectation that test takers will get fewer correct answers and, therefore,
a much wider spread of scores is obtained—test validity needs to be considered. Buck (2001)
claimed that the factors affecting the level of difficulty in a listening test are ―linguistic
characteristics, organization, familiarity and explicitness‖ of audio content (p. 150). Thus, more
options do not equate with a better measure of listening skills. Rather, adding more options to
items presents test takers with an additional reading cognitive burden: Test takers are forced to
read options as fast as possible within the limited time and choose a correct answer. Therefore,
instead of pouring efforts into writing more options per item, spending the same amount of time
to develop more three-option items that are carefully constructed for measuring listening ability
may contribute to the enhancement of content and construct validity.
In alignment with the study by Owen and Froman (1987), this research found that the
three-option-item format could save 10.8% of the total exam time, which enables to increase 11.8%
of total number of items on the CSAT English listening test. Therefore, better test reliability can
be attained through the added items, because longer tests (with more items) have better reliability
than shorter tests: With five options, more time is needed to answer the questions, so the test
must necessarily have fewer items, which limits reliability.
Even though prior literature has stated that blind guessing should not be an issue (Costin,
1976; Haladyna, 2004; Kolstad, Briggs, & Kolstad, 1985), in the context of this study, it may be

44

a critical part to be considered: According to Ebel (1965), blind guessing is ―selecting the answer
at random without considering the content of the options‖ (p. 715). In a time-limited test, as
Rodriguez (2005) stated, blind guessing can threaten test validity. This claim is applicable to a
lower-ability group of examinees on the CSAT English listening test. By blind guessing, lowerability test takers can get 33.3% of the answers correct on a three-option-item test, whereas those
taking a five-option-item test have only a 20% chance of correct answers. Considering the
unique situation in Korea—the CSAT is a very high-stakes test that can determine the future
social status of each student in Korea (Choi, 2008). For instance, during the listening test,
domestic and international flights are not allowed to take off and land, as the noise may affect the
test results (Asia times, 2005)—the 13.3% score difference is most likely to be viewed as
extremely detrimental to test reliability. Thus, the effect of blind guessing and the number of
items in a high-stakes test needs to be examined robustly.
In sum, the results of the present study, supporting those from the testing literature,
suggest that the three-option-item format contributes to developing test items more efficiently,
enhancing test validity and reliability, and preventing testwiseness strategies.

Conclusion
This study suggests that the three-option-format would be worthy of consideration in a
high-stakes test. However, context dependent factors need to be considered to answer the
argument of how many options are optimal that has been debated for decades.
Statistically, the three-option-item format can be recommended in developing test items
more efficiently while screening out test takers selectively, when the results in this study are
considered—no significant differences across the three different item formats in terms of the

45

average item facility, average item discrimination, and overall reliability. Also, the three-optionitem format can increase the test reliability by adding more items due to the saved exam time.
Additionally, from the cognitive perspective, the three-option-item format in the CSAT listening
test can enhance test validity (content and construct validity) by reducing reading loads in a
limited time—when the listening test is supposed to measure mostly listening skills, not
combined with cognitive skills.
Emotionally, the three-option-item format is strongly preferred among test takers
because fewer options make it easier for them to choose correct the answers. Interestingly,
among the test takers with testwiseness strategies, the five-option-item format is preferred to the
three-option-item format because testwiseness strategies to remove non-plausible options can be
easily applied. Thus, the three-option-item format could be recommended to prevent the need for
testwiseness strategies.
In addition, based on the correlation analysis that the tests with different item formats
did not correlate highly, it is possible that the inclusion of more options could potentially
increase variability in the underlying construct being measured. That is, more options could force
the need for testwiseness strategies and other cognitive capacities irrelevant to listening skills.
Thus, it is also essential to consider what needs to be measured by the listening test: mostly
listening skills alone or listening skills combined with other academic and cognitive capacities
that are perhaps irrelevant to the underlying listening skill construct.
However, contextually, test developers of the CSAT may be reluctant to adopt the threeoption-item format for two reasons: (1) The CSAT is such a high-stakes test that the need for
even non-statistical differences—very minute, non-statistically significant differences in
scores—may be still needed for very finite discrimination among test takers; and (2) The effect

46

of blind guessing in a time-limited test is extremely detrimental to reliability. Considering
Koreans‘ unique obsession with the CSAT scores, it might be hard to suggest to the
administration of the CSAT to adopt and employ a three-option-item format. Therefore, there
may be no right answer to the optimal number of options. Test developers and researchers will
have to consider the statistical, cognitive, emotional, and contextual factors when determining
the optimal number of options for their test.

Limitations and Future Research
When it comes to the limitations in the present study, I adopted the previouslyadministered CSAT-prep tests that students can download via the Internet. This may have the
possibility that participants were exposed to the original three versions of the five-option-item
tests. Thus future research with new tests would be worthwhile. In addition, the effect of the
CSAT‘s weighted point system on item discrimination was not investigated here. Therefore it is
worthy to consider the extent to which a weighted points (assigned based on the pre-expected
item facility) could affect item discrimination and overall test reliability. Finally, the effect of
blind guessing and the number of items, which was not investigated in the present study, would
be an interesting topic for future research.

47

Appendices

Appendix A-1
Test Version: I5i

1. 대화를 듣고, 남자가 구입할 물건을 고르시오.
①

②

④

③

⑤

2. 대화를 듣고, 남자의 심경으로 가장 적절한 것을 고르시오.
① bored

② relieved

④ frustrated

③ ashamed

⑤ impressed

3. 대화를 듣고, 두 사람이 대화하고 있는 장소로 가장 적절한 곳을 고르시오.
① park

② hotel

④ museum

③ library

⑤ cafeteria

4. 대화를 듣고, Jenny 가 남자에게 부탁한 일을 고르시오.
① 새로운 소식 알려주기

② 친구 대신 전화 걸어주기
49

③ Jenny 의 남자친구 만나보기

④ 전화를 끊지 않고 기다리기

⑤ David 의 남동생 소개시켜 주기

5. 대화를 듣고, 두 사람의 관계를 가장 잘 나타낸 것을 고르시오.
① coach – athlete

② Father – daughter

④librarian-student

③ teacher – student

⑤ salesclerk - customer

6. 대화를 듣고, 남자가 할 일로 가장 적절한 것을 고르시오.
① to see a doctor

② to buy some tea

③ to drop by a drugstore

④ to go to Carla‘s home

⑤ to visit a herbal garden

7. 다음을 듣고, 남자가 하는 말의 목적으로 가장 적절한 것을 고르시오.
① 적절한 칫솔 선택법을 알리려고

② 치의학의 발전을 소개하려고

③ 양치질의 중요성을 알리려고

④ 칫솔질하는 방법을 소개하려고

⑤ 전동칫솔의 위험성을 경고하려고

8. 대화를 듣고, 남자가 지불한 금액을 고르시오.
① ￦6,000

② ￦6,500

④ ￦7,500

③ ￦7,000

⑤ ￦8,000

50

9. 다음을 듣고, 여자가 설명하고 있는 실험도구를 고르시오.
①

②

③

④

⑤

10. 대화를 듣고, 남자가 여자를 위해 할 일로 가장 적절한 것을 고르시오.
① 학교 안내하기

② 일자리 구해주기

③ 아파트 청소하기

④ 컴퓨터 구입하기

⑤ 부동산 중개업소 들르기

11. 다음 자료를 보면서 대화를 듣고, 여자가 선택할 회사를 고르시오.

12. 다음을 듣고, 방송에서 언급한 내용을 고르시오. [3 점]
① 오존층 파괴의 새로운 원인 발견

② 콩고의 고릴라 출산율 증가

③ 영국 남부 지역의 식량 생산량 감소

④ 영국의 홍수 피해 복구 완료

⑤ 인도 내 휴대전화 이용자 증가

51

13. 다음 그림의 상황에 가장 적절한 대화를 고르시오. [1 점]

①

②

③

④

⑤

14. 대화를 듣고, 남자의 마지막 말에 대한 여자의 응답으로 가장 적절한 것을
고르시오.

Woman:

① Don‘t worry. No news is good news.
② You know, hunger is the best sauce.
③ Cheer up! Every dog has his day.
④ Well, strike while the iron is hot.
⑤ Come on! Haste makes waste.

52

15. 대화를 듣고, 여자의 마지막 말에 대한 남자의 응답으로 가장 적절한 것을
고르시오.

Man:

① Thanks. How nice you are!
② Turn right at the second corner.
③ What a fancy restaurant this is!
④ I am going to reserve two seats.
⑤ I‘m sorry that you made a mistake again.

16. 대화를 듣고, 남자의 마지막 말에 대한 여자의 응답으로 가장 적절한 것을
고르시오.

Woman:

① Keeping promise is important.
② I haven‘t visited my elementary school.
③ What do you say to meeting her tonight?
④ We email each other several times a week.
⑤ We‘ve seen each other somewhere before, right?

53

17. 다음 상황을 듣고, Jim 이 할 말로 가장 적절한 것을 고르시오.
Jim: Excuse me,

① are you in line?
② show me your ticket.
③ where is the entrance?
④ two tickets for 2:30, please.
⑤ could you tell me which is better?

54

Appendix A-2
Test Version: II5i

1. 대화를 듣고, 남자가 구입할 물건을 고르시오. [1 점]

①

②

④

③

⑤

2. 대화를 듣고, 여자의 심정으로 가장 적절한 것을 고르시오.
① bored

② angry

③ excited

④ scared

⑤ jealous

3. 다음을 듣고, 무엇에 관한 설명인지 고르시오.
① ruler

② scale

③ camera

④ whistle

⑤ stopwatch

4. 대화를 듣고, 남자가 할 일로 가장 적절한 것을 고르시오.
① 과제 제출하기

② 만화책 반납하기

④ 선생님께 사과 드리기

⑤ 친구의 생일 선물 사기

55

③ 부모님께 전화 드리기

5. 대화를 듣고, 남자가 지불할 금액을 고르시오.
① $8

② $10

③ $16

④ $20

⑤ $30

6. 다음을 듣고, 여자가 하는 말의 목적으로 가장 적절한 것을 고르시오.
① 상담 교사를 소개하려고

② 온라인 상담을 권장하려고

③ 식당 이용 시간을 알리려고

④ 상담 신청 방법을 안내하려고

➄ 수강 신청 기간을 공지하려고

7. 대화를 듣고, 여자가 남자에게 부탁한 일로 가장 적절한 것을 고르시오.
① to buy coffee for her

② to treat her to dinner

③ to clean the refrigerator

④ to wake her up in the morning

⑤ to study together for an exam

8. 대화를 듣고, 두 사람의 관계를 가장 잘 나타낸 것을 고르시오.
① 은행 경비원 - 고객

② 교통 경찰관 – 운전자

④ 기숙사 관리인 - 학생

⑤ 부동산 중개인 - 세입자

③ 우편배달부 - 거주자

9. 대화를 듣고, 두 사람이 대화하고 있는 장소로 가장 적절한 곳을 고르시오.
① in a car

② in a subway

④ in an elevator

⑤ in a parking lot

56

③ in a theater

10. 대화를 듣고, 여자가 남자를 위해 할 일로 가장 적절한 것을 고르시오.
① CD 구입해 주기

② 보고서 작성 도와주기

③ 노트북 컴퓨터 빌려주기

④ 컴퓨터 프로그램 설치해 주기

⑤ CD 에 자료 저장 방법 가르쳐주기

11. 다음 TV 편성표를 보면서 대화를 듣고, 남자가 시청하게 될 프로그램을
고르시오. [3 점]

12. Gloria Hotel 에 관한 다음 내용을 듣고, 일치하지 않는 것을 고르시오.

① 시내 중심에 위치해 있다.
② 전망 좋은 방이 구비되어 있다.
③ 디럭스 룸에는 새 가구가 비치되어 있다.
④스낵바에서는 음료수를 무료로 제공한다.
⑤ 실내 수영장은 밤 10 시까지 개방한다.

57

13. 다음 그림의 상황에 가장 적절한 대화를 고르시오.

①

②

③

④

⑤

14. 대화를 듣고, 여자의 마지막 말에 대한 남자의 응답으로 가장 적절한 것을
고르시오.

Man:

① I hope no one was hurt.
② Really? I‘ll try it right now.
③ I don‘t like to eat cucumber.
④ Then, apply this cream to your face.
⑤ You shouldn‘t scrub your face with a sponge.

58

15. 대화를 듣고, 남자의 마지막 말에 대한 여자의 응답으로 가장 적절한 것을
고르시오.

Woman:
① Yes. If it doesn‘t rain, it will be serious.
② Right. I hope that the rain will stop soon.
③ I agree. These plants grow well without rain.
④ Don‘t worry. We have enough rainfall this year.
⑤ Sounds good. I enjoy the sunny days these days.

16. 대화를 듣고, 여자의 마지막 말에 대한 남자의 응답으로 가장 적절한 것을
고르시오.

Man:

① Right, I had better see a doctor now.
② Thanks, but you‘d better take it easy.
③ Wow, that‘s wonderful news for me.
④ Well, don‘t overdo it from the beginning.
⑤ Really? It‘s been a week since I started dieting.

59

17. 다음 상황 설명을 듣고, Judy 가 아버지에게 할 말로 가장 적절한 것을 고르시오.

Judy:

① What scholarship am I going to win?
② Why don‘t you get advice from mother?
③ How can I get admitted to the university?
④ You look a bit down today. What‘s the matter?
⑤ Which university do you think is better for me?

60

Appendix A-3
Test Version: III5i

1. 대화를 듣고, 남자가 구입할 가습기를 고르시오. [1점]

①

②

④

③

⑤

2. 대화를 듣고, 여자의 심경 변화로 가장 적절한 것을 고르시오.
① upset → relieved

② nervous → disappointed

④ indifferent → interested

③ amused → frightened

⑤ thankful → dissatisfied

3. 다음을 듣고, 무엇에 관한 설명인지 고르시오.
① 전광판

② 신호등

③ 가로등

④ 횡단보도

⑤ 도로 표지판

4. 대화를 듣고, 여자가 남자를 위해 할 일로 가장 적절한 것을 고르시오.
① 옷 수선하기

② 셔츠 찾아오기

61

③ 친구 마중 나가기

④ 세탁물 맡기기

⑤ 출장 일정 조정하기

5. 대화를 듣고, 남자가 여자에게 지불할 금액을 고르시오.
① $20

② $60

③ $80

④ $100

⑤ $120

6. 다음을 듣고, 여자가 하는 말의 목적으로 가장 적절한 것을 고르시오.
① 자연보호 활동 참여를 권장하려고
② 동굴 탐사 장비 사용방법을 설명하려고
③ 비상상황 시 응급조치 방법을 소개하려고
④ 동굴 탐사를 위한 준비 사항을 안내하려고
⑤ 비상용품 구입 시 고려할 사항을 알려주려고

7. 대화를 듣고, 여자가 남자에게 부탁한 일을 고르시오.
① to print out pamphlets

② to schedule the events

③ to make the guest list

④to set the festival stage

⑤ to design the invitation card

8. 대화를 듣고, 두 사람의 관계를 가장 잘 나타낸 것을 고르시오.
① 승무원 - 승객

② 수리공 – 고객

④ 안내원 – 관람객

⑤ 모델 - 사진 작가

62

③ 관리인 - 입주민

9. 대화를 듣고, 두 사람이 대화하고 있는 장소를 고르시오.
① airport

② art gallery

④ restaurant

③ flower shop

⑤ amusement park

10. 대화를 듣고, 남자가 할 일로 가장 적절한 것을 고르시오.
① 방 청소하기

② 숙소 예약하기

④ 휴가 계획 세우기

③ 방학 숙제하기

⑤ MP3 플레이어 구입하기

11. 차트를 보면서 대화를 듣고, 대화의 내용과 일치하지 않는 것을 고르시오. [3점]

12. Pets & People에 관한 다음 내용을 듣고, 일치하지 않는 것을 고르시오.
① 위험에 처한 애완동물을 구호하는 단체이다.
② 집 없는 개와 고양이에게 새 주인을 찾아준다.

63

③ 회원이 되려는 사람으로부터 가입비를 받는다.
④ 개와 고양이의 입양비를 다르게 받는다.
⑤ 위험에 처한 애완동물을 돕는데 입양비를 사용한다.

13. 그림의 상황에 가장 적절한 대화를 고르시오.

①

②

③

④

⑤

14. 대화를 듣고, 남자의 마지막 말에 대한 여자의 응답으로 가장 적절한 것을
고르시오.

Woman:
① Would you follow the yellow line?
② You can write better by practicing.
③ Take the three o‘clock bus at the station.
④ Why don‘t you send your painting to a contest?
⑤ Close your eyes and imagine anything about the topic.

64

15. 대화를 듣고, 여자의 마지막 말에 대한 남자의 응답으로 가장 적절한 것을
고르시오.

Man:
① Calm down. You can get a refund.
② Keep an eye on my bag for a minute.
③ You can use a locker over there while shopping.
④ Would you start to work in customer‘s service?
⑤ Where can I buy a bag in this department store?

16. 대화를 듣고, 남자의 마지막 말에 대한 여자의 응답으로 가장 적절한 것을
고르시오.

Woman:
① Try not to watch TV for 24 hours.
② There‘s something strange in my food.
③ That‘s why we have some pizza and chicken.
④ Think about those poor children you can help.
⑤ I‘ve finished a report about the starving children.

65

17. 다음 상황 설명을 듣고, Matthew가 Tiffany에게 할 말로 가장 적절한 것을
고르시오.

Matthew:
① Tell your parents the truth and they‘ll help you.
② Don‘t use your parents‘ phones any more.
③ Sorry. You have the wrong number.
④ You‘d better go to the repair shop.
⑤ That number is not in service now.

66

Appendix B-1
Script: Test (I5i)

W: female voice M: male voice
1. W:How may I help you, sir?
M:I‘m looking for a mobile for my 3-month-old baby.
W:Do you have anything in mind?
M:Not really.
W: Then what do you think of ‗Starry night‘? Your baby will enjoy stars and moons.
M:It‘s cute. But, the mobile with a baseball over there looks better.
W: Which one? You mean the one with four cute bears?
M:Not that one. That looks too complicated for a young baby.
W:Then you like this simple baseball mobile with bats, gloves, and a ball.
M:You got it.
W:Great! It's machine washable. It‘s only $30.
M:Sounds great. I‘ll buy it.
(10 seconds)

2. [telephone rings]
W:Asia-Pacific Airways. Can I help you?
M:Yes. I need a flight from Seoul to Sydney on Tuesday.
W:Let me see. Yes. We have an 8:30 flight in the evening.
M:Eight thirty! What‘s the check-in time?

67

W:One hour earlier than your flight time. Will you take that?
M:No. I won‘t get to the airport in time. When will the next flight leave?
W: There won‘t be another direct flight on Tuesday. There will be one on Wednesday at the
same time.
M: Then can I make it to the business meeting in Sydney? It is at 9 o‘clock on Thursday.
W: [pause] I‘m sorry, but I don‘t think you can make it on time.
M: Really? Then, what am I going to do?
W:I‘m afraid you have no choice right now.
(10 seconds)

3. W: Bill, what would you like to have for dinner?
M: I don‘t have much time, so I just want to eat a hamburger at the cafeteria. What about you?
W: I‘ve heard there is a nice Italian restaurant near here. Can‘t we go there?
M: I‘d love to, but I should finish this report here before they close. I need this book.
W: Why don‘t you just borrow it and finish your report after a nice dinner?
M: I can‘t. I‘ve already borrowed all that I can.
W: Then let me check it out for you.
M: That‘s a great idea. Thank you.
(10 seconds)

4. W: David! How have you been lately?
M: Not bad. And you?
W: Much the same, except I do have some big news.

68

M: Big news? Come on! I‘m dying to hear it.
W: Hold on now... [pause] I had a blind date a couple of weeks ago.
M: Jenny! You said that was not your style. I can‘t believe it.
W: That was then and this is now.
M: This is all news to me. What is he like?
W: He seems nice, but I‘m not sure.
M: You aren‘t?
W: Not yet. Meet him for me and tell me about him, will you?
(10 seconds)

5. M: May I help you, ma‘am?
W: Yes, I am looking for exercise equipment that I can use easily.
M: We have dumbbells. Women easily use them anywhere. Besides, the price is reasonable.
W: But I don‘t know how to use them. Is it simple?
M: Sure. Let me show you. Hold your upper body straight. And swing the dumbbells like this
with your elbows close to your sides.
W: Not that difficult. I think I can do that.
M: There are a wide range of weights but as a beginner you should use one kilogram dumbbells.
W: OK. I will take them.
(10 seconds)

6. W: So, how are things going, Steve?
M: Well, to be honest, Carla, I‘ve got a cold.

69

W: Are you OK now?
M: Not really. I‘m worried because I‘m scheduled to hand in a report by Friday.
W: Don‘t worry. You are going to get better in no time.
M: Well, I took some medicine, but it didn‘t seem to help.
W: Listen, forget about that medicine! My mom‘s herbal medicine will get rid of your cold.
M: Oh, Carla. You are so kind but...[pause] no thanks.
W: Come on! You‘ll be up and dancing around soon.
M: OK. I‘ll give it a try.
W: Great. My mom is at home right now. Let‘s go get some herbal tea.
M: OK. I will.
(10 seconds)

7. M: Any toothbrush that you choose should have a soft brush and should be comfortable in
your hand. You can choose between a manual and an electric toothbrush depending on your
lifestyle and situation. If you prefer a manual toothbrush, make sure that the tip is small enough
to reach all areas of your mouth easily. People with arm and shoulder problems might prefer an
electric toothbrush for convenience as well as comfort. If purchasing an electric toothbrush, be
sure that the head is soft and the brushes move in a back and forth motion.
(10 seconds)

8. W: Mr. Brown, here are your pictures.
M: Wow, these pictures are great. As I ordered, you enlarged the picture of me alone. I like this.
W: I‘m happy you are satisfied.

70

M: How much is a small-sized one?
W: It‘s 200 won.
M: I received 30 small-sized pictures and one large picture. What about the large one?
W: Originally it is 1,000 won, but I will give you a 50% discount.
M: Thank you. Here is 10,000 won.
W: Here‘s your change, 3,500 won.
(10 seconds)

9. W: Attention, please. Today is the first day at the lab. So I‘m going to explain how to use the
tools on the table. The first thing on the left can be used to pick up and release small amounts of
liquid like water or alcohol. I‘ll show you how to use this. Squeeze the round bulb at one end of
the stick and put the other end into the water like this. Release the bulb and the water will move
up into it. If you want the water to go out, you just have to squeeze the bulb again. Understand?
(10 seconds)

10. M: How‘s your new apartment working out, Ann?
W: Well, I like the apartment, but it‘s too far from campus. I want to look for a new place.
M: Then did you go to a real estate agency?
W: No, you know, I have a final exam this Friday. Would you help me?
M: Sure. What kind of place are you looking for?
W: Above all, I want an apartment within walking distance to school.
M: Okay, anything else?
W: Uh, some place under $200 a month, including utilities, if possible.

71

M: I‘ll drop by a real estate agency for you on my way to class today.
W: Do you think there are vacancies around the campus?
M: I saw the sign they have many available apartments.
(10 seconds)

11. W: November 20th is my daughter‘s first birthday.
M: Congratulations! Are you going to hold a party?
W: Yes, I will. I want to give her a memorable party.
M: Are you going to prepare it by yourself?
W: No, I‘m considering these party planning companies.
M: How much money are you planning to spend?
W: Um... Less than $1,000.
M: How long do you want your party to last?
W: I need 4 hours. And I want a family picture to remember her birthday.
M: Well, this company looks good for your party.
(10 seconds)

12.

[News Signal]

M: This is Robert Brown with the Opening World. Here are today‘s headlines:
Ozone has a stronger climate effect than we have thought until now.
Gorillas are in danger of extinction because of hunting in Congo.
Serious floods hit southern England suddenly.
Mobile phone users are increasing sharply in India.

72

That‘s all for now. I‘ll be back in a few minutes.
[News Music]
(10 seconds)

13. ① W: Where is everybody?
M: They are all in the living room watching TV.
② W: Is this seat taken?
M: No, go ahead.
③ W: What can I do for you?
M: I‘d like to open a savings account.
④ W: How many people in your party?
M: There are three of us.
⑤ W: Excuse me, how can I get to City Hall?
M: Get on this bus. It is three stops away.
(10 seconds)

14. W: Time to wake up, James. Wake up!
M: What time is it?
W: It‘s 7:30.
M: 7:30! Why didn‘t you wake me up at 7? I‘m late.
W: I‘m sorry I didn‘t know you had to wake up early today.
M: Today I have to give a presentation in class.

73

W: Why didn‘t you set the alarm, if you had to wake up early?
M: I forgot to. I‘m in a hurry. Mom, where is my USB?
W: On your desk. James, get dressed first and have your breakfast.
M: I don‘t have time to eat. Oh, I don‘t know what to wear. I‘m really late, mom.
W:
(15 seconds)

15. [telephone rings]
W: Hello, John?
M: Oh, Jane. How are you?
W: Why are you still at home?
M: What do you mean?
W: We promised to meet at six today. Don‘t you remember?
M: Oh, no! I thought we would meet each other on Friday.
W: No, we planned to meet on Thursday. I am waiting for you at Tim‘s restaurant.
M: I‘m really sorry. Please wait until I get there. I will go as soon as possible.
W: I see. I‘ll wait for you. Drive safely.
M:
(15 seconds)

16. M:The flowers on your desk are very nice. Who sent them to you?
W:Patricia, my best friend. She is in Japan now.
M:She must be the friend that you‘ve told me about before.

74

W:Right. We actually grew up together.
M:So, how long have you known each other?
W:Let me see. I guess we‘ve known each other since elementary school. We‘ve been friends for
almost 15 years!
M:That‘s a long time.
W:Yeah, in some ways I feel like she‘s my sister.
M:How often do you keep in touch with her?
W:________________________________________________
(15 seconds)

17. W: Jim and his friend want to see the latest blockbuster. They promise to meet in front of the
box office after school. Jim gets to the theater and looks around. But his friend doesn't come yet.
Jim tries to buy tickets before his friend comes. It is such a nice movie that the theater is crowded.
He finds a lady who seems to be waiting to buy a ticket. Jim wants to stand behind her after
checking if she is standing there to buy one. In this situation, what would Jim most likely to say
to the lady?
Jim: Excuse me,
(15 seconds)

75

Appendix B-2
Script: Test (II5i)

W: female voice M: male voice
1. W: Tom, why were you late this morning?
M: My alarm clock broke again.
W: Again? Why don't you buy a new one? Here's a shop that sells clocks.
M: Okay. Let's look around to see if we can find the right one.
W: Oh, the pyramid-shaped one over there looks exotic.
M: Well, it is not my style. I like round clocks.
W: Then, how about the one with a dog picture on it? You can hear the barking sound of a dog.
M: But I think it is for children. The apple-shaped one next to it is childish as well.
W: Hmm.... What about the one that looks like a computer mouse?
M: It's not bad. But I like sports, so I'd like to buy the sporty one, instead.
W: Great! Get it. Just don't kick it around in the playground.

2. M: What are you doing, honey?
W: I'm looking up a telephone number. Can you help me out?
M: Sure. What number are you looking for?
W: The Consumer Protection Union.

76

M: The Consumer Protection Union? Why?
W: You know, I bought a new computer game for Mike on the Internet.
M: Yeah. You told me he was very happy with it. What's wrong?
W: Every time he tries to play it, there is a new problem.
M: Then, why don't you exchange it for another one?
W: I tried to several times, but the company won't take it back. How could such a big company
do that?

3. M: This is used to measure time in sports like swimming and track-and-field events. This can
also be used in laboratory experiments. This has buttons that you press at the beginning and end
of an event, so you can measure exactly how long it takes. Nowadays in schools, we see some
students using this while studying to check if they can solve the questions within the given time.

4. W: Kevin, you don't look good. What's the matter?
M: Ms. Park caught me reading a comic book in her class.
W: Oh, boy. Did she scold you?
M: No, she just told me not to do it again.
W: Then, why are you so blue?
M: Because I'm worried she might be disappointed with me. I was so embarrassed that I
couldn't say anything.

77

W: Why don't you apologize to her now?
M: Now? Don't you think it's too late?
W: Better late than never.
M: OK. I'll take your advice.

5. M: Wow, there are a lot of neckties.
W: Yeah. Take your time and call me if you need any help.
M: Um.... This striped one looks good.
W: Yes. This is very popular for young men.
M: I like it. How much is it?
W: It's 10 dollars, but you can get 20% off.
M: Really? That's great.
W: Today's our special bargain day. If you buy two, you get one free.
M: Sounds good, but I don't need that many. I only need one. I'll take it.
W: All right. One moment, please.

6. W: Hello, students. I'd like to inform you of the way to get counseling. First of all, you have to
make an appointment with our staff member, Ms. Rey. She will give you the next available
appointment. We encourage you to make an appointment before or after school, or during lunch

78

time. During lunch time, you don't have to wait long, so it is best to utilize that time. Please don't
hesitate to ask for counseling service. We'll try hard to be available and accessible to you!

7.M: Is there any juice in the refrigerator?
W: No. You drank it all last night.
M: Then I'll go to the supermarket and buy some.
W: Wait. Could you do me a favor?
M: Sure, what is it?
W: I have an exam tomorrow, and I need some coffee to stay up all night. Will you buy me
coffee?
M: Is that a good idea? Caffeine in coffee isn't healthy for you.
W: I know. But without it, I can't concentrate on studying.
M: You'd better not drink too much coffee, though.
W: Yeah, but I really need a cup of coffee tonight.
M: OK. But just this time. I'll be right back.

8. M: Hi, what can I do for you?
W: Hi, I'm a student of this dormitory. I have a problem with my mailbox.
M: Tell me the details.
W: I lost my mailbox key, so I haven't been able to get my mail. Do you have a spare key?

79

M: Of course I do, but I'm not supposed to give it out. These days we've had a lot of problems
with mailbox theft in the dormitory.
W: I know what you mean. But I need to get an important piece of mail.
M: Um.... I see. Do you have your ID card?
W: Yes. Here is my student ID.
M: Okay. This time I'll help you.
W: Thank you!

9. W: Mark, we were supposed to turn left.
M: Oh, no! I'll turn left at the next corner.
W: But, I'm afraid we'll get lost.
M: Don't worry. I have a good sense of direction.
W: Don't you remember when we spent one hour finding the cinema last weekend?
M: Please forget that ever happened.
W: Mark! Why didn't you stop? We could get a ticket.
M: Oops! I didn't even see the stop sign.
W: You'd better slow down a little bit.

10. M: Say, Monica, could you do me a favor?
W: Sure, what would you like?

80

M: I'm typing my paper on the computer right now.
W: And....?
M: I'm afraid I will lose my data.
W: Oh, do you want to know how to back up the data on a CD?
M: Yes, you're right. I just want to do it in case I should mess up my data.
W: Better to be safe than sorry. I can let you know how to do it right now. Do you have a new
CD?
M: Of course, I have one on my desk.
W: Bring your CD and I'll show you how to do it. It's really a piece of cake.
M: Thanks. I'll be back in a minute.

11. M: Wow, finally the exam is over. And it's Sunday. I'll relax and watch TV.
W: Frank, even though the exam is over, you have to keep studying.
M: I know, mom. But I want to watch TV today.
W: Then why don't you watch English Grammar on channel 5?
M: Please.... Don't say that. I've been looking forward to watching sports.
W: I see. What sports program do you want to watch?
M: Football on channel 7 at 7:00 and baseball on channel 6 at 9:30.
W: But I want to watch SBC Weekend News on channel 6.

81

M: Do you? Hmm.... Then, I'll study while you're watching the news and I'll watch what I want
at 9:30.
W: Now that's a good plan!

12. M: Are you looking for an ideal place for your vacation? Gloria Hotel is the answer. We
invite you to experience our wonderful hotel. Our hotel is located in the city center, within a
short walk of major attractions. We have a variety of rooms with nice views. We especially
recommend deluxe rooms which have been recently renovated. They have new furniture and
wallpaper. For all our guests we offer free drinks at the snack bar. You can also enjoy 24-hour
access to the indoor swimming pool. We look forward to seeing you soon. Thank you.

13. ① W: Can you tell me where the ladies' room is?
M: Yes, it's right down the hall. Just look for the sign.
② W: There's a park across the street.
M: Let's cross the street. The light is green.
③ W: You look busy. What are you doing here?
M: I'm gardening. It is my favorite hobby.
④ W: Look at the signs. We have to take the road on the right.
M: Right. I guess it'll take only half an hour to get to the falls.
⑤ W: It's getting colder and colder.

82

M: Yes, and all the leaves have fallen off the trees.

14. W: David, what's the matter with your face? You are so sunburned!
M: I know! I went hiking yesterday, and I walked for five hours in the sun without a hat.
W: Oh, dear! Does it hurt?
M: Yeah, it really does. I feel like I'm on fire.
W: Have you tried cucumber on your skin?
M: Cucumber? Does it really work?
W: Of course, it does. It helps to cool down your skin.
M: Does it? What should I do with the cucumber?
W: You just cut up a cucumber, and you put the slices on your skin. That's it.
M: _________________________________________

15. M: It's too dry these days.
W: You can say that again.
M: I think a drought has set in. It hasn't rained for months.
W: But didn't it rain last month?
M: It did rain last month, but a news reporter said the rainfall was only half the monthly
average.
W: Oh, that bad?

83

M: Yeah, it's really bad. I'm very worried.
W: Come to think of it, the grass has almost dried out.
M: It's getting worse every day.
W: __________________________________________

16. M: Hi, Jane!
W: Hi, Michael. I heard you were sick last week. Are you okay?
M: I'm getting better.
W: I'm glad you're okay.
M: Well, sometimes I still don't feel very well.
W: Really? Why don't you see a doctor?
M: I went to the hospital this morning. My doctor advised me to start exercising.
W: That's a good idea. You need to do some exercise.
M: I agree, but I don't know what kind of exercise I should do.
W: How about working out at a fitness center?
M: A fitness center? Can you recommend one?
W: I heard the fitness center next to my office is giving a 30 percent discount this month.
M: __________________________________________

84

17. W: Judy was very happy to know that she was accepted to a university she wanted to enter.
But this morning she got another letter of admission from a university guaranteeing her a
scholarship. Now she has to select one university. If she chooses the first university, she can
major in politics, which she really wants to study. If she chooses the second one, she won't be
able to major in politics, but she can receive the scholarship. She has to make a decision by this
weekend. So she has decided to get advice from her father. In this situation, what would Judy
most likely say to her father?

85

Appendix B-3
Script: Test (III5i)

W: female voice M: male voice
1. W: Hey, they‘re having a sale today.
M: That‘s nice. I‘m thinking of buying a humidifier. It‘s very dry these days.
W: That‘s right. The pig shaped one and the house shaped one are both cute.
M: Yes, but I want a simpler one.
W: Then, what do you think about this round one?
M: Well, that‘s too ordinary. What about this square one?
W: Which one? The one with flowers on it?
M: No, I mean the one with a mouse on it.
W: Yeah, that is cute and simple.
M: Yes. I‘ll get that one.

2. M: Welcome to Grand Cinema. How can I help you?
W: I reserved four seats for the two o‘clock movie by phone.
M: Your membership number, please.
W: It‘s 3826.
M: Please, wait a moment. [keyboard sound] I‘m sorry, but we have no reservation under that
number.
W: No way! I reserved it just yesterday. Can you check one more time?
M: Okay. [keyboard sound] Sorry. Are you sure you used that membership number?

86

W: Definitely. [pause] Oh, wait. Will you check my husband‘s membership number? It‘s 3825.
M: Sure, ma‘am. [keyboard sound] Yes, we have four seats reserved under that number.
W: Oh, that‘s great. I‘m glad to hear that.

3. M: These are signaling devices used to control the flow of traffic. You can find them at the
places where roads meet. You can find them in most cities around the world. They usually
consist of a set of three lights. Red indicates ―stop,‖ yellow indicates ―caution‖ and green
indicates ―go.‖ They help drivers to avoid accidents and pedestrians to cross the street safely.
You should always pay attention to these when using the roads.

4. W: Honey, are you going out now?
M: Yes. Are you staying home this evening? My laundry will be delivered around 8:00 P.M.
W: Oh, no. I‘m going out to see my friend, Jina.
M: Oh, Jina? You haven‘t seen her since she got back from England, right?
W: Right. She came back to Korea last weekend.
M: Sounds great! [pause] But what am I going to do with my shirts?
W: When do you need them?
M: Tomorrow morning. You know, I‘m leaving for my business trip.
W: Okay. Then I‘ll drop by the dry cleaner‘s and pick them up in the afternoon.
M: Thank you, honey.

5. M: Hi, Amy. I see you‘re having a garage sale.
W: Hi, Robert. Look around for something you need.

87

M: I see. Hmm... This bike looks almost new. How much is it?
W: I can give it to you for $80.
M: $80? That‘s a bit expensive.
W: But I bought it just two months ago and haven‘t ridden it much at all. It was $120, when it
was new.
M: Well, I‘m not sure...
W: If you buy this bike, I‘ll give you this basket for free. I bought it for $20.
M: Well, that‘s not bad. I‘ll take it.
W: Thank you.

6. W: May I have your attention, please? Tomorrow morning, we‘re going to explore a cave.
Please remember the following: first, don‘t forget to have a good breakfast. It will give you
enough energy to explore the cave. Second, wear a long-sleeved T-shirt, long pants and strong
boots to protect your body. Third, you will need a helmet and a good flashlight with spare
batteries. Exploring caves can be a wonderful experience. But, if you‘re not well prepared, it can
be dangerous.

7. M: Hi. How are things going?
W: I‘ve been so busy recently.
M: Yeah, I know. I heard that you‘re preparing for a school festival.
W: Yes. I have to make pamphlets, invite guests and schedule the events.
M: Wow, lots of things to do! Oh, I can help with the pamphlets.
W: Thank you, but I‘ve already asked someone else to help with that.

88

M: That‘s great.
W: But I haven‘t designed the invitation card. Could you do that for me?
M: Sure. When do you need it by?
W: As soon as possible. Thank you.

8. W: Welcome to Energy Factory. I‘d be happy to show you our exhibitions today.
M: How many exhibits do you have in Energy Factory?
W: We have five exhibits. And this exhibit is the most popular in this exhibition.
M: Wow, it‘s full of wonders.
W: Yes. This shows how electricity is used in our daily life.
M: Interesting! Oh, what‘s this?
W: This model shows how water produces electricity. Would you like to try this?
M: Sure, I‘d love to. How can I do it?
W: Pump the water into the tank until it is full.
M: Amazing! I powered the TV by pumping the water.
W: Good job. Now let‘s move on to the next hall.

9. M: Good afternoon, ma‘am. What can I do for you?
W: Good afternoon. I brought an empty flowerpot.
M: So, you want to plant something in it?
W: Yes. But I don‘t know what to plant.
M: We have various kinds of flowers and trees. Would you like to look around?
W: Wow, they‘re beautiful. Well... Could you recommend one?

89

M: How about sansevieria or English ivy? They‘re good for keeping the air clean.
W: Sansevieria sounds good to me. How much is it?
M: Ten dollars. It‘ll take about half an hour to plant it in your pot.
W: Okay. I‘ll be back after lunch.

10. M: Mom, you promised to buy me an MP3 player.
W: What? Why should I buy it for you?
M: How could you forget? You promised to buy it this vacation.
W: Yes. But you promised to clean your room three times a week. You didn‘t keep your promise.
M: Yeah. You‘re right. But could I have one more chance?
W: Well... [pause] Okay. But it‘s the last chance for you to earn an MP3 player.
M: I‘ll keep my room tidy and clean this time.
W: I‘ll watch you for a month and then decide.
M: Thanks for giving me another chance. I love you, Mom.
W: Why don‘t you go upstairs and start right now?
M: Okay. I‘ll sweep and wipe my room until it‘s shiny.

11. W: Good morning, Mr. Brown. Take a seat. What seems to be the problem?
M: Good morning. I have a pain in my stomach.
W: Okay. Do you have a fever?
M: A little. I haven‘t eaten anything since yesterday.
W: How long have you had the symptoms?
M: For two days.

90

W: Are you taking any medicine now?
M: Yes. I‘m taking aspirin.
W: Okay. Are you allergic to any medicine?
M: No, I‘m not.
W: I see. I think you have the flu. Take the medicine I prescribe and take a rest.
M: Okay. Thank you.

12. M: You should become a member of Pets & People today. Pets & People is an organization
to assist pets in danger. We help homeless cats and dogs and keep them until we find someone to
take care of them. Anyone who loves cats or dogs can become a member for free. As a member
of Pets & People, you can help take care of the dogs and cats or even adopt them from us. The
adoption fee is $40 for a cat and $50 for a dog. This money will be used to save other pets still in
danger.

13. ① M: Would you give me a hand?
W: Yes. I can help you carry your luggage.
② M: Oh, hurry up. The elevator is going up.
W: Thank you for holding the door.
③ M: Excuse me. Something‘s falling out of your bag.
W: Oh, I forgot to zip it up. Thanks.
④ M: Oh, I‘m sorry. I stepped on your foot.
W: That‘s okay. The elevator is so crowded.
⑤ M: I‘d like to borrow these books.

91

W: Okay. Give me your membership card, please.

14. M: Sarah, how‘s it going with your painting?
W: I‘ve just finished the sketch. How about you?
M: Don‘t ask me. I didn‘t even start.
W: What‘s the matter? Ms. Smith told us to finish by three o‘clock.
M: Yeah, I know. But I don‘t have any good ideas about today‘s topic.
W: You know, painting is just putting your own ideas on canvas.
M: Putting my ideas on canvas?
W: That‘s right. You can create anything from your imagination.
M: Then, how can I put my ideas on canvas?
W:

15. M: Excuse me, ma‘am. You can‘t have that bag while shopping.
W: What‘s wrong?
M: I‘m very sorry. But customers are not allowed to shop with a big bag.
W: Oh, I didn‘t know that. But if that‘s the case, you should at least notify customers about it.
M: I think you missed the sign. It says that no one may enter this place with such a big bag.
W: Oh, my... I‘m sorry. I didn‘t see it.
M: That‘s all right, ma‘am.
W: Then, what shall I do with my bag?
M:

92

16. W: Michael, I‘m excited to be participating in the 24 Hour Fast with you.
M: Me, too. But eating nothing for 24 hours is not easy.
W: You‘re right. But it will help starving children.
M: It‘s much harder than I expected.
W: Right. After this experience, you‘ll be proud of yourself.
M: Yes, I will. But I‘m so hungry now.
W: You‘ve been doing a good job until now.
M: Oh, dear! I can‘t stop thinking of food.
W:

17. W: Tiffany is happy to get a brand new cell phone. She likes to talk on it. She enjoys having
long conversations over the phone. She doesn‘t care about the cell phone charge until she checks
her phone bill. When she finally does, she is worried about it. It‘s too much. So she asks her
friend, Matthew, what to do. He wants to suggest that she should explain the situation to her
parents, and get help from them. In this situation, what would Matthew most likely say to her?
Matthew:

93

Appendix C
Survey Questionnaire

1. Which multiple-choice item format do you most prefer among the three-, four-, and fiveoption items?

2. Why do you prefer it?

3. What do you think about a three-option item format?
Most dislike
1

Most like
2

3

4

5

6

7

4. Can you think of any other reasons why the three-option-item format would be good?

5. Can you think of any other reasons why the three-option-item format would be bad?

6. If the CSAT were to be administered with a three-option item format, do you think you would
like it?

7. Why do agree or why do you disagree with the three-option-item format on the CSAT?

Note. The original questionnaire is written in Korean, the native language of the participants.

94

References

References

Brown, J. D. (2001). Using surveys in language programs. Cambridge language teaching library.
Cambridge, U.K: Cambridge University Press.
Brown, J. D. (2005). Testing in language programs. New York, NY: McGraw-Hill.
Bruno, J. E. & Dirkzwagber, A. (1995). Determining the optimal number of alternatives to a
multiple choice test item: an Information theoretic perspective. Educational and
Psychological Measurement, 55, 959-966.
Budescu, D. V., & Nevo, B. (1985). Optimal number of options: An investigation of the
assumption of proportionality. Journal of Educational Measurement, 22, 183–196.
Card, J. (2005, November, 30). Life and death exams in South Korea. Asia times. Retrieved on
Mar. 23, 2010 from http://www.atimes.com/atimes/Korea/GK30Dg01.html
Cizek, G. J., & Rachor, R. E. (1995, April). Nonfunctioning options: A closer look. Paper
presented at the annual meeting of the American Educational Research Association, San
Francisco, CA.
Choi, I-C. (2008). The impact of EFL testing on EFL education in Korea Language Testing,
25(1), 39-62.
Costin, F. (1970). The optimal number of alternatives in multiple-choice achievement tests:
Some empirical evidence for a mathematical proof. Educational and Psychology
Measurement, 30, 353-358.
Costin, F. (1972). Three-choice versus four-choice items: implications for reliability and validity
of objective achievement tests. Educational and Psychological Measurement, 32, 10351038.
Costin, F. (1976). Difficulty and homogeneity of three-choice versus four-choice objective test
items when matched for content of stem. Teaching of Psychology, 3, 144–145.
Cox, E. P. (1980). The optimal number of response alternatives for a scale: A review. Journal of
Marketing Research, 17, 407-422.
Crehan, K. D., Haladyna, T. M., & Brewer, B. W. (1993). Use of an inclusive option and the
optimal number of options for multiple-choice items. Educational and
Psychological Measurement, 53, 241–247.
Delgado, A. R., & Prieto, G. (1998). Further evidence favoring three-option items in multiplechoice tests. European Journal of Psychological Assessment, 14, 197–201.

96

Dörnyei, Z. (2003). Questionnaires in second language research: Construction, administration,
and processing. Mahwah, N.J: Lawrence Erlbaum Associates.
Ebel, R. L. (1969). Expected reliability as a function of choices per item. Educational and
Psychological Measurement, 29, 565-570.
Green, K., Sax, G., & Michael, W. B. (1982). Validity and reliability of tests having different
numbers of options for students of differing levels of ability. Educational and
Psychological Measurement 42, 239-245.
Grier, J. B. (1976). The optimal number of alternatives at a choice point with travel time
considered. Journal of Mathematical Psychology, 14, 91-97.
Gyeonggi Provincial Office of Education. (2007, November 22) The CSAT English prep-test
[Data file]. Available from EBSi Web site,
http://www.ebsi.co.kr/ebs/ent/enta/retrieveNaarPrdRdrInfo.ebs
Gyeonggi Provincial Office of Education. (2008, November 18) The CSAT English prep-test
[Data file]. Available from EBSi Web site,
http://www.ebsi.co.kr/ebs/ent/enta/retrieveNaarPrdRdrInfo.ebs
Gyeonggi Provincial Office of Education. (2009, November 17) The CSAT English prep-test
[Data file]. Available from EBSi Web site,
http://www.ebsi.co.kr/ebs/ent/enta/retrieveNaarPrdRdrInfo.ebs
Haladyna, T. M. (2004). Developing and validating multiple-choice test items (3rd ed.). Mahwah,
NJ: Erlbaum.
Haladyna, T. M., & Downing, S. M. (1993). How many options is enough for a multiple choice
test item. Educational & Psychology Measurement, 53, 999-1010.
Haladyna, T. M., Downing, S. M., & Rodriguez, M. C. (2002). A review of multiple-choice
item-writing guidelines for classroom assessment. Applied Measurement in Education,
15, 309–334.
Hedrick, W. B., & Cunningham, J. W. (1995). The relationship between wide reading and
listening comprehension of written language. Journal of Reading Behavior, 27(3), 425438.
Hedrick, W. B., & Cunningham, J. W. (2002). Investigating the effect of wide reading on
listening comprehension of written language. Reading Psychology, 23(2), 107-126.
Hogben, D. (1973). The reliability, discrimination and difficulty of word knowledge tests
employing multiple choice items containing three, four or five alternatives. Australian
Journal of Education, 17, 63-68.

97

Kolstad, R. K., Briggs, L. D., & Kolstad, R. A. (1985). Multiple-choice classroom achievement
tests: Performance on items with five vs. three choices. College Student Journal, 19,
427–431.
Landrum, R. E., Cashin, J. R., & Theis, K. S. (1993). More evidence in favor of three-option
multiple-choice tests. Educational and Psychological Measurement, 53, 771–778.
Lee, S. (2005, October, 23). Samsip samilganeui gamgeun suneung chujewuiwoneui bimil. The
hankyoreh. Retrieved on Mar. 23, 2010 from
http://www.hani.co.kr/arti/society/schooling/73560.html
Lord, F. M. (1977). Optimal number of choices per item-A comparison of four approaches.
Journal of Educational Measurement, 14, 33-38.
Lord, F. (1944). Reliability of multiple-choice test as a function of choices per item. Journal of
Educational Psychology, 35, 175-180.
Millman, J., Bishop, H. I., & Ebel, R. (1965). An analysis of test-wiseness. Educational and
Psychological Measurement, 25, 707-726.
Owen, S. V., & Froman, R. D. (1987). What‘s wrong with three-option multiple choice items?
Educational and Psychological Measurement 47, 513-22.
Rodriguez, M. C. (2005). Three options are optimal for multiple-choice items: A meta-analysis
of 80 years of Research. Educational Measurement: Issues & Practice, 24(2), 3-13.
Rogers,W. T., & Harley, D. (1999). An empirical comparison of three- and four-choice items
and tests: Susceptibility to testwiseness and internal consistency reliability. Educational
and Psychological Measurement, 59, 234–247.
Ruch, G.M., & Stoddard, G. D. (1927). Tests and measurements in high school instruction.
Chicago: World Book.
Shizuka, T., Takeuchi, O., Yashima, T., & Yoshizawa, K. (2006). A comparison of three- and
four-option English tests for university entrance selection purposes in Japan. Language
Testing, 23(1), 35-57.
Sidick, J. T., Barrett, G. V., & Doverspike, D. (1994). Three-alternative multiple choice tests: An
attractive option. Personnel Psychology, 47, 829–835.
Straton R. G, & Catts, R. M. (1980). A comparison of two, three, and four-choice item tests
given a fixed total number of choices. Educational and Psychology Measurement, 40,
357-365.
Torabi-Parizi , R., & Campbell, N. J. (1982). Classroom test writing: Effect of item format on
test quality. Elementary School Journal, 83(3), 155-160.

98

Trevisan, M. S., Sax, G., & Michael, W. B. (1991). The effect of the number of options per item
and student ability on test validity and reliability. Educational and Psychological
Measurement, 51, 829-837.
Trevisan, M. S., Sax, G., & Michael, W. B. (1994). Estimating the optimum number of options
per item using an incremental option paradigm. Educational and Psychological
Measurement, 54, 86–91.
Tversky, A. (1964). On the optimal number of alternatives at a choice point. Journal of
Mathematical Psychology, 1, 386-391.
University of Washington (1983). Washington Pre-College Admissions Test Battery. Seattle,
WA: University of Washington, Washington Pre-College Department.

99