
This is to certify that the

dissertation entitled

On the Meaning and Measurement of Test
Appropriateness

presented by

Jose Manuel Cortina


has been accepted towards fulﬁllment
of the requirements for

Ph.D. degree in Psychology

 

 

Neal Schmitt

 

Major professor

Date 7/12/94

MSU is an Affirmative Action/Equal Opportunity Institution

 

 


 

ON THE MEANING AND MEASUREMENT OF TEST APPROPRIATENESS

By

Jose Manuel Cortina

A DISSERTATION

Submitted to
Michigan State University
in partial fulﬁllment of the requirements
for the degree of

DOCTOR OF PHILOSOPHY

Department of Psychology

1994

ABSTRACT
ON THE MEANING AND MEASUREMENT OF TEST APPROPRIATENESS
By

Jose Manuel Cortina

Recent research has shown that a test can be psychometrically valid and yet be
inappropriate for certain individuals such that the test scores for these individuals cannot
be interpreted as accurately indicating their standing on the construct of interest. This
body of research, however, has been largely statistical in nature, with a focus on indices
of appropriateness such as the lz index developed by Drasgow, Levine, and their
colleagues. The purpose of the present paper was to examine appropriateness as a
construct, and develop and partially test a model of its determinants based on literature
from educational, social, personality, and quantitative psychology. Speciﬁcally, the
effects of item characteristics, math anxiety, test anxiety, carelessness, and
conscientiousness on statistical knowledge test scores and the lz index of test
appropriateness were examined in a sample of 165 undergraduate statistics students. The
results showed that item characteristics, math anxiety, carelessness, and the item
characteristic by conscientiousness interaction were signiﬁcantly related to knowledge test
scores, while none of the hypothesized predictors of lz were significantly related to it.
Implications for appropriateness and testing are discussed.

ACKNOWLEDGEMENTS
I’ve been waiting a long time to write my acknowledgements, if for no other
reason than because it would suggest that I have something to acknowledge.

And I do.

It appears that I have ﬁnally managed, over the shrill objections of the
administration, to annex a series of degrees up to and including Doctor of Philosophy from
Michigan State University. How and when this happened, I have no idea, but there are
many people who deserve recognition: accessories after the fact, if you will, and I think
you will.

Thanks Kim for things too numerable to mention, but speciﬁcally for reminding
me that the writing of my Results section was not an insurmountable task. You were
right. Thanks Ron for Sega and Scotch and other friendship-related esoterica. Thanks
Stephen and Dale for not letting triﬂes like work get in the way of beer and golf. Thanks
Mick for reminding me not to take life too terribly seriously. After all, there’s a 50-50
chance that I won’t survive the day anyway. Thanks Sher for friendship unconditional
upon the level of my idiocy. Thanks Stan and Jean (Jan and Stean?) for Biscotti and
access to D’s closet. Thanks Tim and Mo for French wine, artichoke dip, and kindness.
Thanks John for playing up my golf skills in the letter of rec. Thanks Mike, Steve, Dan,
and Kevin for sitting through all of those defense meetings without laughing, at least not
in my presence. Thanks Mary for not getting too upset when I wallpapered your desk.

Thanks Suzy, Greta, and Cheryl for always making life easier. Also, special appearances
by Jeff B., Sandy L., JoAnn S., Whit, El, Jen, Dennis, J.T., Smitty, Rob, Barker, Dan W.,
Hatrack, Sandy T., Kara B., Rick D., Gordon W., and a player to be named later.

Thanks to my family for love and the absence of worry.

And thanks most of all to Neal. There aren’t many people in this ﬁeld or any
other who could have put up with me through a thesis, comps, a dissertation, 818,
Personnel Selection, 818 again, and everything in between (e.g., undocumented book and
journal theft, unannounced interruptions, incessant questions, an overloaded computer
account, typos, etc.). You are not only the ﬁnest mentor I have ever seen or heard about,
you are, as far as I can tell, the best possible mentor, especially for a person with my
particular eccentricities. My career goal, though unattainable, is to be the Academic that
you are.

JMC 6/94

P.S. Special thanks to Ronald Wilson Reagan, who never let me down.


TABLE OF CONTENTS

Page

LIST OF TABLES ............................................ viii
LIST OF FIGURES ............................................ ix
INTRODUCTION .............................................. 1
Test Appropriateness as it has been Studied ....................... 2
Foundations of Test Appropriateness ............................ 6
Sources of Type P Inappropriateness ........................... 12
Acquiescence/Denial ................................. 12

Need for Approval .................................. 13
Extreme Response Set/Central Tendency ................... 13

Test Anxiety ....................................... 14
Cognitive Controls .................................. 16
Response Bias/Test Wiseness ........................... 17
Carelessness/Motivation ............................... 19
Omissiveness ...................................... 20
Section Summary ................................... 21
Sources of Type I Inappropriateness ........................... 21
Test Anxiety by Item Characteristics ...................... 22
Motivation by Item Characteristics ....................... 26
Omissiveness by Item Characteristics ...................... 29

Field Articulation by Item Characteristics ................... 31
Response Bias by Susceptibility to Bias ................... 32

Test Wiseness by Susceptibility to Wiseness ................. 35

Topic Irrelevant Ability by Topic Irrelevant Item Content ....... 38

Need for Approval by Item Characteristics .................. 41
Section Summary ................................... 44
Measures of Type P Inappropriateness .......................... 46
Acquiescence/Denial and Extreme Response Set/Bias .......... 46

Need for Approval .................................. 47

Test Anxiety ....................................... 48
Cognitive Controls .................................. 48
Response Bias/Test Wiseness ........................... 49
Carelessness ....................................... 50
Section Summary ................................... 50

Measures of Type I Inappropriateness .......................... 51
Non-IRT based Indices of Inappropriateness (Unstandardized) . . . . 51
IRT-based Indices of Inappropriateness (Unstandardized) ........ 55
Standardized, Non-IRT based Indices of Type I Inappropriateness . . 57
Standardized, IRT-based Indices of Type I Inappropriateness ..... 59
Overall Summary ........................................ 64
The Present Study ........................................ 67
METHOD ................................................... 71
Sample ................................................ 71
Design ................................................ 71
Measures .............................................. 72
Conscientiousness ................................... 72
Math Anxiety ...................................... 72
Test Anxiety ....................................... 72
Carelessness ....................................... 73
Statistics Knowledge Test .............................. 74
Procedure .............................................. 77
Data Analysis ........................................... 78
RESULTS ................................................... 79
Tests Measuring Respondent Characteristics ...................... 79
Statistical Knowledge Test .................................. 82
lz .................................................... 86
Tests of Hypotheses ....................................... 87
Difﬁculty-Based Item Characteristics and Knowledge Test Scores ....... 94
Conscientiousness and Knowledge Test Scores .................... 94
Math Anxiety and Knowledge Test Scores ...................... 100
Test Anxiety and Knowledge Test Scores ....................... 102
Difficulty-based Item Characteristics and lz ...................... 104
Conscientiousness and lz ................................... 104
Carelessness and lz ....................................... 106
Math Anxiety and lz ...................................... 110
Test Anxiety and lz ...................................... 112
DISCUSSION ............................................... 116
Hypothesis 1: Difﬁculty-based Item Characteristics and
Test Scores ............................................ 116
Hypothesis 2: Conscientiousness and Test Scores ............ 117
Hypothesis 3: Carelessness and Test Scores ................ 117
Hypothesis 4: Math Anxiety and Test Scores ............... 118
Hypothesis 5: Test Anxiety and Test Scores ................ 118
Hypothesis 6: Conscientiousness by Item Characteristic Interaction and
Test Scores ............................ 118
Hypothesis 7: Carelessness by Item Characteristic Interaction and Test
Scores ................................ 119
Hypothesis 8: Math Anxiety by Item Characteristic Interaction and Test
Scores ................................ 119
Hypothesis 9: Test Anxiety by Item Characteristic Interaction and Test
Scores ................................ 119
Hypothesis 10: Conscientiousness by Item Characteristic Interaction and
lz ................................... 120
Hypothesis 11: Carelessness by Item Characteristic Interaction and
lz ................................... 120
Hypothesis 12: Math Anxiety by Item Characteristic Interaction and
lz ................................... 120
Hypothesis 13: Test Anxiety by Item Characteristic Interaction and
lz ................................... 121
Implications and Conclusions ............................... 121
Limitations ............................................ 123
LIST OF REFERENCES ....................................... 126
APPENDIX A: Personality Measures ............................... 137
APPENDIX B: Statistical Knowledge Items .......................... 140
APPENDIX C: Descriptives for Statistical Knowledge Items .............. 158
APPENDIX D: Item Parameter Estimates for Statistical Knowledge Items ..... 168


List of Tables

Page

Table 1    Sources of Both Type P and Type I Inappropriateness for Both Maximum
           and Typical Tests ................................... 65

Table 2    Descriptive Statistics for All Tests and lz’s .................. 80

Table 3    Factor Analysis of Knowledge Test Items .................. 84

Table 4    Variance of Dependent Variables Attributable to Between- and Within-
           Subjects Effects .................................... 88

Table 5    Intercorrelations Among All Terms Used in Regression Analyses . . 91

Table 6    Regression of Percentage Correct on Knowledge Test onto
           Conscientiousness and Item Characteristics .................. 95

Table 7    Regression of percentage correct on knowledge test onto carelessness and
           item characteristics .................................. 99

Table 8    Regression of percentage correct on knowledge test onto math anxiety and
           item characteristics ................................. 101

Table 9    Regression of percentage correct on knowledge test onto test anxiety and
           item characteristics ................................. 103

Table 10   Regression of lz onto conscientiousness and item characteristics . . 105

Table 11   Regression of lz onto carelessness and item characteristics ...... 107

Table 12   Regression of lz onto math anxiety and item characteristics ..... 111

Table 13   Regression of lz onto test anxiety and item characteristics ...... 113

Table 14   Summary of Hypotheses and Support .................... 115

List of Figures

Page

Figure 1   A General Model of the Determinants of Item Responses as They Relate
           to Test Appropriateness ................................ 8

Figure 2   A General Model of the Determinants of Item Responses with Interaction
           Effects ........................................... 10

Figure 3   Proposed interaction between item characteristics and test anxiety . . 25

Figure 4   Proposed interaction between item characteristics and motivation . . 28

Figure 5   Proposed interaction between response bias and susceptibility of items to
           bias ............................................. 34

Figure 6   Proposed interaction between test wiseness and susceptibility of items to
           test wiseness ........................................ 37

Figure 7   Proposed interaction between topic irrelevant ability and topic irrelevant
           content ........................................... 40

Figure 8   Proposed interaction between need for approval and opportunity to display
           need for approval ................................... 43

Figure 9   A detailed model of the determinants of item responses as they relate to
           appropriateness ..................................... 45

Figure 10  A model of the determinants of lz ........................ 63

Figure 11  Plot of the effect on test scores of the conscientiousness by item
           characteristics interaction .............................. 97

Figure 12  Plot of the effect on lz of the interaction between carelessness and item
           characteristics ..................................... 109

Introduction

The topic of this paper is test appropriateness. In general terms, a test is
appropriate for a given individual to the extent that it measures the construct or constructs
that it is supposed to measure and nothing else. Although there is a wealth of research
on the determinants of test scores (e.g., test anxiety, response biases, motivation, item
wording), there is a relative paucity of research on the determinants of test
appropriateness. The goal of this paper is to develop and partially test a model of test
appropriateness based on literature from I/O psychology, educational psychology,
education, and quantitative psychology. Although some of the issues that are discussed
could be applicable to tests of any kind, I focus only on multiple choice tests.
Nevertheless, I make an attempt to include a wide range of test content, both maximum
performance measures (i.e., tests composed of items with possible responses that are
either absolutely correct or absolutely incorrect, such as tests of mathematics knowledge,
reading ability, paragraph comprehension, and spatial relations) and typical performance
measures (i.e., tests composed of items with responses that are not necessarily right or
wrong, such as personality inventories and interest inventories). It should be noted,
however, that typical performance tests can have right and wrong answers in a sense when
they are used for selection purposes. At the outset, one note of clarification is in order.
Although discussions of appropriateness are perhaps best directed at the individual item
(since this is where our attempts to measure constructs with tests begin), the terms "test
inappropriateness" and "item inappropriateness" are often used interchangeably in this
paper. The reason for this is simply that a test is nothing more than a set of items. To
the extent that those items are inappropriate, the test composed of them is obviously
inappropriate.

I begin with an overview of appropriateness as it has been studied, and follow
with a review of the relevant literature, the purpose of which is to develop a model of test
appropriateness.

Test appropriateness as it has been studied

A test is inappropriate to the extent that it measures constructs other than the
construct of interest. A test may be inappropriate, however, only for certain respondents.
For example, consider a psychometrically sound paper and pencil test of English
comprehension. For most respondents, this test will yield scores that accurately reflect
the English comprehension of the respondents. In other words, the test would be
appropriate for these respondents. Now consider the performance of a visually impaired
individual on this paper and pencil test. This respondent would almost certainly score
very poorly on this test, not because of a lack of English comprehension, but because this
respondent cannot cope with the format of the test. For this reason, this test would be
inappropriate as a measure of English comprehension for this individual.

Research on appropriateness has focused primarily on the development of
techniques that identify respondents for whom a given test is inappropriate. Specifically,
these techniques involve the identification of response patterns that are aberrant and,
therefore, suggest inappropriateness. One way of describing the logic of these indices is
in terms of Guttman vectors. A Guttman vector is simply a vector of zeroes and ones in
which all of the ones precede all of the zeroes. If a respondent were to respond to items
in a way that matched perfectly with the difficulties of the items (and if there were no
possibility of guessing), then the responses of that respondent, when ordered in terms of
item difficulty, would form a perfect Guttman vector. The idea is that the respondent
answers all items at or below a certain difficulty level correctly. At some point, however,
the difficulty becomes too great for that respondent, and all items above that level of
difficulty are answered incorrectly. If this were the case, then this respondent should
receive a perfect score on an appropriateness index. If, however, some of the items
measured constructs other than the construct of interest, then the responses of a given
individual might depart from a Guttman vector, and the responses of this person would
then be "flagged" by an index of inappropriateness. Consider again the example of a
visually impaired individual taking a test of English comprehension, except that now there
are some paper and pencil items and some items that are asked and answered in an
interview format. For those respondents without impaired vision, we would expect little
difference between the written questions and the interview questions. As a result, we
could order all of the dichotomously scored item responses for these respondents in terms
of their difficulty values and expect them to form something resembling a Guttman vector
such as this:

1111111100000000

The visually impaired individual, however, would almost certainly do much better on the
interview items regardless of their group-determined difficulty values. This individual,
therefore, would have an "aberrant" pattern of responses such as the following:

001000110010110

Almost all of the 1’s for this individual could be expected to represent interview
items. Since these interview items as a group should have levels of difficulty similar to
those of the paper and pencil items, there should be items of both types at all levels of
group-determined difficulty. For the visually impaired individual, however, the most
prominent source of "difficulty" is whether or not the items must be seen to be answered.
In other words, there is a source of item difficulty for the visually impaired respondent
that does not apply to the group on which the item difficulties were determined.
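
The Guttman logic described above can be illustrated with a small sketch. It is only an illustration and not one of the published indices discussed in this paper: it counts "Guttman errors," that is, pairs of items in which an easier item is answered incorrectly while a harder item is answered correctly. The function names are hypothetical, and the responses are assumed to be already ordered from the easiest item to the hardest.

```python
def guttman_errors(responses: str) -> int:
    """Count (easier item wrong, harder item right) pairs.

    `responses` is a string of '0'/'1' characters ordered from the easiest
    item to the hardest. A perfect Guttman vector yields a count of zero.
    """
    errors = 0
    zeros_seen = 0
    for r in responses:
        if r == "0":
            zeros_seen += 1
        elif r == "1":
            # Each easier item already missed forms one error pair with this item.
            errors += zeros_seen
        else:
            raise ValueError(f"unexpected response code: {r!r}")
    return errors


def is_guttman_vector(responses: str) -> bool:
    """True when all of the ones precede all of the zeroes."""
    return guttman_errors(responses) == 0


expected_pattern = "1111111100000000"  # first pattern shown in the text
aberrant_pattern = "001000110010110"   # second pattern shown in the text

print(is_guttman_vector(expected_pattern), guttman_errors(expected_pattern))
print(is_guttman_vector(aberrant_pattern), guttman_errors(aberrant_pattern))
```

A larger count indicates a greater departure from the pattern implied by the group-determined difficulties. The raw count depends on test length and on the number of correct answers, which is one reason standardized indices are preferred in practice.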

While this is a useful example for explication, it is an exaggeration. More realistic
examples would be mathematical word problems (or word problems of any kind) given
to people who cannot read well, items with "culture-loaded" content given to someone
unfamiliar with the culture, and any knowledge, ability, or personality test given to
someone with extreme test anxiety.

Many indices have been developed that identify aberrant response patterns, such
as Sato’s Caution Index (Sato, 1975), the Dependability Index (Kane & Brennan, 1980),
and the lz index (Drasgow, Levine, & Williams, 1985). The lz index, however, has
received the most recent attention and is, therefore, the appropriateness measure to be
used in the present study. Although the specifics of the index are described later in the
paper, the general purpose of lz is to assess the extent to which the responses of a given
respondent conform to the three-parameter Item Response Theory (IRT) model. This is
analogous to the Guttman-based indices such as those of Sato (1975) and Kane &
Brennan (1980), except that the lz index assesses the congruence between the responses
of an individual and the item parameters and ability estimates from IRT.
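
As a rough sketch of what such an index computes, the code below evaluates a standardized log-likelihood of a response pattern under the three-parameter logistic model, which is the general form usually given for lz. Several things here are assumptions made for illustration rather than details taken from this study: the item parameters are hypothetical, the ability value is treated as known rather than estimated, and the scaling constant D = 1.7 is the conventional choice.

```python
import math

D = 1.7  # conventional logistic scaling constant (an assumption of this sketch)


def p_correct_3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def lz_index(responses, theta, items):
    """Standardized log-likelihood of a 0/1 response pattern.

    responses: list of 0/1 item scores
    theta:     ability value for the respondent (treated as known here)
    items:     list of (a, b, c) tuples, one per item (hypothetical values)
    """
    if len(responses) != len(items):
        raise ValueError("responses and items must have the same length")
    log_lik = expected = variance = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p_correct_3pl(theta, a, b, c)
        q = 1.0 - p
        log_lik += u * math.log(p) + (1 - u) * math.log(q)   # observed log-likelihood
        expected += p * math.log(p) + q * math.log(q)         # its expectation
        variance += p * q * math.log(p / q) ** 2              # its variance
    return (log_lik - expected) / math.sqrt(variance)


# Sixteen hypothetical items ordered from easy (b = -2) to hard (b = +2).
items = [(1.0, -2.0 + 4.0 * i / 15, 0.2) for i in range(16)]
consistent = [1] * 8 + [0] * 8   # easy items right, hard items wrong
aberrant = [0] * 8 + [1] * 8     # easy items wrong, hard items right

print(lz_index(consistent, 0.0, items))
print(lz_index(aberrant, 0.0, items))
```

Under these assumptions the model-consistent pattern produces a positive value and the reversed pattern a large negative value; strongly negative values are the ones that flag a response pattern as aberrant.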

As I mentioned earlier, most of the work on these indices has been statistical in
nature. This work has established the fact that these indices are reasonably effective in
detecting departures from expected response models. By contrast, very little work has
been done on the determinants of such departures. Many possible sources of departure
have been suggested, such as cheating, response coding errors, and fatigue. But little
empirical work has been done to establish these factors as sources of departure from
expected response models. In other words, the construct validity of these indices has not
been ﬁrmly established. As a result, we know that these indices detect something, but
we have no clear idea about what this something is.

The present paper attempts to address this issue of the construct validity of
measures of inappropriateness. The ﬁrst step is to treat inappropriateness as a construct
by exploring its meaning and its implications. The second step is to discuss factors that
might be expected to lead to inappropriate responses and build these factors into a model
of inappropriateness. The third step is to begin testing the model to see if the
determinants that are included in the model actually do have an impact on measures of
inappropriateness. To this end, I begin with a discussion of the foundations of mental
testing in general and test appropriateness in particular, and how early concerns over
appropriateness led to modiﬁcations in the conceptual model used to describe item
responses. I then explain how speciﬁc individual and item characteristics might combine
to affect both the level of test scores and the appropriateness of test scores for certain
people. Finally, I describe a test of parts of this model.

Foundations of test appropriateness

Although the history of mental testing in general can be traced back thousands of
years to the ancient Chinese and Greeks (DuBois, 1966; Anastasi, 1988), the roots of
contemporary testing can be found in the early nineteenth century. The work of Galton,
Cattell, Binet, Terman, Goddard, and others is well documented and need not be reiterated
here (see Hothersall, 1990 or Boring, 1950 for thorough reviews). One theme that runs
through the work of all of these early testing experts is an assumption that any given
mental test measures the same constructs (although the term "construct" wasn’t used) for
all people. In other words, item responses are determined only by individual differences
on the trait of interest and, where appropriate, item difficulties. The possibility of test
inappropriateness for certain individuals was not considered. One of the more striking
examples of this assumption at work comes from the testing of immigrants at Ellis Island
in 1914. At Ellis Island, immigrants were asked in their own language several trivia
questions developed by Goddard and his staff. The trivia questions consisted, for the
most part, of bits of Americana such as "Who is Christy Mathewson?" and "What is
Crisco?" and were designed to assess intelligence. With the benefit of hindsight, it is
obvious that these questions, while perhaps valid as measures of intelligence for an
American sample, were utterly inappropriate for immigrants from Italy, Hungary, Russia,
etc. In other words, these items assessed different constructs for different respondents
depending on whether the respondents were American or not. This contamination,
however, was not identified by Goddard. Because he assumed that the item responses
were caused by the level of intelligence of the respondent and the difficulties of the items
and nothing more, he concluded that over 80% of immigrants to the United States were
"feebleminded" (Hothersall, 1990).

Yerkes was among the first to identify individual differences other than the
construct of interest that affect item responses. In the preliminary testing of the Army
Alpha test of intelligence or "native wit" in 1917, he recognized that many of the
respondents were not sufficiently literate to follow the instructions for the test (Hothersall,
1990). In other words, Yerkes recognized that individual differences other than native
wit, namely reading skills, were determining item responses. For this reason, the Army
Alpha test was inappropriate as a measure of intelligence for the illiterate. It was in
response to this issue that the Army Beta was developed.

Cady (1923), Allport (1928), and Rosenzweig (1934) were among the ﬁrst to
identify item characteristics other than difﬁculty (or the personality-test equivalent of item
difﬁculty, item popularity) that affect item responses. These authors suggested that item
characteristics such as social desirability would also have an effect on item responses, at
least for some respondents.

What I have presented above are the components of a very general model of
mental test item responses. Item responses are determined by the respondent’s level on
the construct of interest, the degree to which various extraneous constructs affect the
reaction of the respondent to items, the difﬁculty of the construct-relevant content of the
item, and other construct-irrelevant factors that influence item responses. This model is
presented in Figure 1.

[Figure 1. A General Model of the Determinants of Item Responses as They Relate to
Test Appropriateness. Elements shown: Construct of Interest, Extraneous Personal
Characteristics, Content Difficulty, Other Item Characteristics, and Item Responses.]

To the extent that extraneous respondent characteristics have an impact on item
responses for a given respondent, the test composed of those items is inappropriate as a
measure of the construct of interest for that person.

This conceptual model went largely unchanged until 1968 when Donlon & Fischer
introduced their index of test appropriateness: the personal biserial coefﬁcient. The
speciﬁcs of the personal biserial are described later in this paper. The point that I wish
to make about the personal biserial here is that it represents the ﬁrst effort to identify
threats to appropriateness in the form of interactions between item characteristics and
characteristics of the respondent. Since the personal biserial is an index of the
relationship between group-determined item difﬁculties (computed from a sufﬁciently
large sample) and dichotomously scored item responses for a given respondent, it is an
index of the extent to which the item difﬁculties hold for a given respondent. In other
words, it assesses the strength of the relationship between item responses and the
interaction between group-determined item difﬁculties and characteristics of the
respondent. To the extent that the relationship between item characteristics and item
responses depends on or covaries with the level of a characteristic of the respondent, the
test composed of those items is measuring different constructs for different people, i.e.,
the test is inappropriate as a measure of the construct of interest.
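
The idea behind such an index can be sketched as follows. This is a simplified stand-in and not Donlon and Fischer's actual computation: it uses an ordinary Pearson (point-biserial) correlation in place of the biserial coefficient, and the item easiness values (group proportions correct) and response patterns are hypothetical.

```python
import math


def personal_point_biserial(responses, easiness):
    """Correlate one person's 0/1 item scores with group-determined item easiness.

    responses: list of 0/1 scores for a single respondent
    easiness:  list of group proportions correct, one per item (hypothetical)
    Returns None when either variable has no variance (e.g., all items correct).
    """
    n = len(responses)
    if n == 0 or n != len(easiness):
        raise ValueError("responses and easiness must be non-empty and equal in length")
    mean_u = sum(responses) / n
    mean_e = sum(easiness) / n
    cov = sum((u - mean_u) * (e - mean_e) for u, e in zip(responses, easiness))
    var_u = sum((u - mean_u) ** 2 for u in responses)
    var_e = sum((e - mean_e) ** 2 for e in easiness)
    if var_u == 0 or var_e == 0:
        return None
    return cov / math.sqrt(var_u * var_e)


easiness = [0.95, 0.90, 0.80, 0.70, 0.55, 0.40, 0.30, 0.15]  # hypothetical values
typical = [1, 1, 1, 1, 1, 0, 0, 0]    # succeeds on the items the group finds easy
atypical = [0, 0, 1, 0, 0, 1, 1, 1]   # succeeds mostly on items the group finds hard

print(personal_point_biserial(typical, easiness))
print(personal_point_biserial(atypical, easiness))
```

A clearly positive value suggests that the group-determined difficulties hold for the respondent. A value near zero or below it suggests that something other than group-determined difficulty is driving that person's responses.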

If we include person by item interactions, the model in Figure 1 is modified into
the model in Figure 2.

[Figure 2. A General Model of the Determinants of Item Responses with Interaction
Effects. Elements shown: Construct of Interest, Extraneous Personal Characteristics,
Content Difficulty, Other Item Characteristics, and Item Responses.]

Now, it would appear that we have two different types of sources of test
inappropriateness. The first type, which I will call Type P (for Personal characteristics)
inappropriateness, results from test items measuring characteristics of respondents other
than those that the test was intended to measure. The second type, which I will call Type
I (for Interaction) inappropriateness, results from an effect on item responses of the
interaction between personal characteristics of respondents and item characteristics. What
follows is a discussion of the various sources of these two types of inappropriateness
followed by a discussion of the measures that have been used to identify them. The
sources of inappropriateness that are discussed are the most prominent in the testing and
measurement literature. Before moving on to this discussion, it should be noted that
many of the sources that are discussed may seem more applicable to typical performance
measures (e.g., personality tests) than to maximum measures. It is my position, however,
that most of the sources of inappropriateness that may seem relevant only to one type of
test have an analog in the other type. For example, response sets such as extreme
response set (the tendency to give only extreme ratings), which are discussed in more
detail below, may seem to be relevant only to personality-type tests, but they have an
analog in maximum tests. The analog is what I refer to as positional bias or the tendency
to choose the extreme response options when unsure of or uninterested in the correct
response. So, many of these sources of inappropriateness can be conceptualized as
relevant for either maximum or typical tests. An effort is made in the discussion that
follows to point out both the maximum and typical sides of each of the sources of
inappropriateness.

Sources of Type P inappropriateness

Acquiescence/denial. Acquiescence is the tendency of a respondent to uniformly
agree with statements that are made. Denial is the tendency of a respondent to uniformly
disagree. Although this applies primarily to typical tests where respondents are asked to
agree or disagree, or provide a rating of the extent to which they agree, it could also
apply to maximum tests with only true/false response options.

Acquiescence was originally investigated as a source of error in personality tests
(Humm, Storment, & Iorns, 1939; Humm & Humm, 1944; Cronbach, 1946), and it is one
of the few sources of inappropriateness that doesn’t seem to have an analog in common
multiple choice maximum tests.

Acquiescence, like all response sets, tends to emerge when a respondent ﬁnds an
item to be ambiguous or difﬁcult. In fact, Messick (1966) claimed that there are two
types of acquiescence: that based on misunderstanding of items or carelessness and that
based on personality.

Acquiescence, and not item content, has been found to account for much of the
variance in responses to the MMPI (Jackson & Messick, 1958, 1965; Bock, Dicker, &
Van Pelt, 1969), although some have claimed that acquiescence is not a problem (Block,
1965; Rorer, 1965).

Acquiescence leads to particular types of proﬁles in personality and interest tests
that may or may not reﬂect the true nature of the respondent. To the extent that the
MMPI and tests like it measure acquiescence in addition to or instead of the constructs
that they are supposed to measure, those tests are inappropriate as measures of those

constructs for those people who are high in acquiescence, although it should be noted that
acquiescence itself has been conceptualized as a personality construct of interest (Couch
& Keniston, 1960; Wiggins, 1962; Messick, 1966).

Need for approval. Need for approval has been suggested as an individual
difference characteristic that causes certain people to paint a favorable or socially
desirable picture of themselves (Crowne & Marlowe, 1964). This characteristic would
lead people to respond to items in a way that they feel depicts them not as they are, but
as they would like to be and as they would like to be perceived by others. This need can
be personality based or situationally based (e.g., taking a test for research purposes versus
taking a test as part of a selection battery).

This leads us to the well-known phenomenon of "faking good". It has long been
known that people can and do fake good on a wide variety of tests (personality tests:
Ruch, 1942; Green, 1951; interest batteries: Kingston, George, & Ewens, 1956; Gehman,
1957; Rorschach: Henry & Rotter, 1956; intelligence tests: Saupe, 1960). To the extent
that faking good is taking place for a given individual, the test is inappropriate as a
measure of the construct of interest.

The maximum test analog to faking good is simply "cheating". The principle
is the same. Some maximum test takers wish to depict themselves as they would like to
be instead of as they are. In response to this desire, they cheat. Obviously, to the extent
that a maximum test reﬂects cheating, it is inappropriate as a measure of the construct of
interest.

Extreme response set/central tendency. Some investigators have found evidence
of extreme response set (Berg & Collier, 1953; Cronbach, 1946, 1950) and its opposite,

central tendency (Gaier, Lee, & McQuitty, 1953; Damarin, 1970). Extreme response set


exists when a respondent tends to choose the extreme response options (i.e., the ﬁrst in
a list of options or the last in a list of options) over other response options irrespective
of item content or correctness of the extreme options. Central tendency exists when a
respondent tends to choose the middle response options over the extreme response
options. Although both extreme response set and central tendency in typical tests have
clear analogs in maximum tests, the maximum and typical forms of this source of
inappropriateness are treated separately here. In this paper, I refer to the typical form of
these sources of Type P inappropriateness as response sets, and to the maximum form as
response biases.

Response sets and biases, like acquiescence, tend to emerge when items are
difﬁcult or ambiguous. Extreme response set leads to particular types of proﬁles in
personality and interest tests. Depending on the particular test being taken, extreme
response bias can spuriously raise or lower scores on intelligence tests (Metfessel & Sax,
1957, 1958; Rapaport & Berg, 1955). Again, to the extent that this response set or bias
affects the score of a given respondent, the score is a misrepresentation of the constructs
of interest.

Test Anxiety. Although the concept of anxiety has existed in psychology since
shortly after the inception of psychology, the notion of anxiety with respect to particular,
normal situations is relatively new. Mandler & Sarason (1952) were among the ﬁrst to
discuss such a construct when they presented their measure of test anxiety. They found
that test anxiety, by causing responses to the test situation that were irrelevant to cognitive
test performance, decreased test performance. Mandler & Sarason (1952) and Waterhouse

& Child (1953) also found an interaction between these situation-speciﬁc trait anxieties


instructions, etc.) such that those respondents who were low on the situation-specific trait
anxiety seemed to beneﬁt slightly from the general situational anxiety whereas those
respondents who were high on the speciﬁc trait anxiety showed a decrease in
performance.

Although there was some initial skepticism about the distinctiveness of speciﬁc
trait anxieties such as test anxiety, there now seems to be reasonable agreement on its
distinctiveness (Harper, 1976; Watson & Clark, 1984). Test anxious people have been
found to have poorer study habits, fact retention, and elaborative processing, as well as
diminished abilities to synthesize or analyze relevant information (Herrmann, 1982).
They have been found to have deleterious cognitive responses to test situations (Endler
& Hunt, 1966), and show deficiencies in all stages of information processing (Benjamin,
McKeachie, Lin, & Holinger, 1981). Also, Naveh-Benjamin, McKeachie, & Lin (1987)
distinguished between two types of test anxious people: those with good study habits who
have trouble only with information retrieval in the test situation, and those with poor
study habits who have trouble with all stages of information processing. The implication
of Naveh-Benjamin et al. (1987) is that some respondents are test anxious because they
are not prepared, while others have no good reason for being anxious. If this is the case,
then the former type of anxiety would perhaps be better labelled something else, since it
is not the test per se that these respondents are anxious about. Rather, they are anxious
about a test that they are not prepared for, which would seem to be something entirely

different from general anxiety in testing situations.

The implications of test anxiety for typical tests should be similar to those for

maximum tests. Test anxiety inhibits information processing, which means that highly
test anxious respondents should have more difﬁculty providing accurate responses to
items on typical tests than do low test anxious respondents.

To the extent that a test reflects test anxiety or similar constructs (e.g., number
anxiety, Dreger & Aiken, 1957; frustration anxiety, Waterhouse & Child, 1953), it is
inappropriate as a measure of the construct of interest for that respondent.

Cognitive Controls. Cognitive controls are involuntary ways of approaching and
interpreting complex situations or stimuli. They represent different ways in which we
organize the information that we are constantly receiving from the external world. Among
the various types of cognitive controls are ﬁeld articulation, equivalence range, levelling-
sharpening, cognitive complexity, and scanning. Although many researchers have
hypothesized relationships between these individual controls and intellectual abilities (e.g.,
Gardner, Jackson, & Messick, 1960; Witkin, 1959; Klein, 1959; Gardner, 1959), there has
never been a clear distinction among these various cognitive controls (McGee, 1979), and
the construct validity of many of their measures has been called into question as well
(Sherman, 1967; McGee, 1979). Of these cognitive controls, ﬁeld articulation has
received the most attention. It has been developed as a construct itself (Broverman,
Klueter, Kobayashi, & Vogel, 1968), and measures of ﬁeld articulation, such as the
Witkin Embedded Figures Test, have reﬂected convergent and discriminant validity
(Satterly, 1976; Witkin, 1974). For these reasons, this discussion focuses on ﬁeld

articulation to the exclusion of the other cognitive control principles.


Field articulation is the extent to which a person is able to pick out certain
relevant aspects of a complex stimulus or situation to the exclusion of other, superﬂuous
aspects. It has been shown to be related to scores on mathematics tests (Satterly, 1976),
learning and memory (Goodenough, 1976), paragraph comprehension (Klein, 1967), and
acquiescence (Forehand, 1962).

Field articulation might also be expected to affect responses to typical-test items.
Speciﬁcally, respondents who are low in ﬁeld articulation (also called ﬁeld dependent
respondents) may have difficulty extracting relevant information from questions on
personality or interest inventories and may therefore have difﬁculty providing accurate
information.

Response bias/test wiseness. Response bias refers to the tendency to select
particular response options for reasons other than content. For example, Lawrence (1957)
showed that children have a tendency to guess the ﬁrst distractor presented in a list of
distractors when unsure of the correct response. This tendency is a response bias.
Extreme response bias, which was mentioned earlier, might be considered another
example.

There is some confusion in the literature about the distinction between response
bias and test wiseness (Sarnacki, 1979). Test wiseness has been defined as the capacity
of the respondent to utilize the characteristics and formats of the test and/or test situation
to receive a high score. The distinction between response bias and test wiseness seems
to be a matter of rationale. If a respondent has a tendency to guess the extreme response
options when in doubt of the correct answer, and the reason for this behavior is simply

that the respondent is in the habit of guessing the extreme response options, then this


would be merely a response bias. If, on the other hand, the respondent had a tendency
to guess the extreme response options because he/she had heard that this particular test-
maker tended to put the correct options on the extreme positions, or if the respondent
noticed that many of the correct responses on the early part of the test were in the
extreme positions, and this were the reason for the reliance on the extreme options when
guessing, then the use of the extreme positions would be an example of test wiseness.
So, if the response tendencies of a respondent unsure of correct answers are based on
whim or habit, they are response biases. If the response tendencies of a respondent
unsure of correct answers are based on the characteristics of the test or testing situation,
then they reﬂect test wiseness.

Test wiseness has been found to have an effect on a wide variety of multiple
choice tests (Dolly & Williams, 1986; Wahlstrom & Boersma, 1968; Millman, Bishop,
& Ebel, 1965). To the extent that a test measures test wiseness instead of the construct
of interest, the test is inappropriate as a measure of the construct of interest.

Test wiseness can also affect scores on typical tests in certain situations. For
example, if a mental health professional is given the MMPI as part of a battery of
selection procedures, he/she would probably know enough about the test to be able to
provide whichever type of proﬁle that he/she wished. This knowledge of the MMPI
would also carry over to other personality tests as well. In this way, the mental health
professional would be using the characteristics of the test to his/her advantage.

The relationship between response bias and appropriateness is less clear. If
response biases are exposed only when the respondent has no idea what the correct

answer to an item is (or in the case of typical tests, what the most accurate answer is),


and if the response bias is in fact not based on rationale of any kind, then the response
bias will affect item responses only in a random way, and there should be no effect on
test scores. If, however, the response biases emerge even when the respondent does have
some idea of the correct response, and if those biases override the content knowledge of
the respondent such that the biases lead the respondent to choose an option that is at odds
with his/her content knowledge, then the response biases would affect test scores.

Carelessness/motivation. One of the earliest identified contaminants of test scores
was carelessness. Respondents who are not motivated to provide responses that
reflect their true standing on the construct of interest, be the test maximum or
typical, may respond carelessly to items. Also, respondents who are anxious about the
test situation may accidentally provide answers that do not reflect their true standing
on the construct of interest (e.g., coding errors on computer-scored answer sheets).

Peterson (1961) included nonsense items in an interest inventory as a carelessness
check and found that many respondents endorsed them, especially when they were in the
latter part of the inventory. This nonsense item technique has been incorporated into
many noncognitive tests (e.g., MMPI, CPI). To the extent that a test reﬂects carelessness
or motivation to perform to the best of one’s abilities, the test is inappropriate as a
measure of the construct of interest.

This brings us to a related topic: motivation. In real world testing situations, such
as classroom testing and employee selection situations, tests are generally assumed to
measure not one, but two constructs: the construct of interest (such as content
knowledge), and motivation. Tests in real world contexts are developed primarily as

measures of content knowledge and perhaps application, and though motivation can and


perhaps should be viewed as a contaminant, it is seldom examined or taken into account
when test scores are interpreted. If motivation is a part of such tests, then it should be
taken more seriously. Its effects on test scores should be examined more carefully, for
a test that is designed to measure content knowledge or intelligence is inappropriate as
a measure of those constructs to the extent that it is affected by motivation.

There is one ﬁnal point that should be made with respect to motivation. I said
earlier that motivation to provide responses on a test that reﬂect that respondent’s true
standing on the construct of interest was related to carelessness. One could make the
argument that cheating or "faking good" would be another result of such motivation, but
in the opposite direction. For the sake of simplicity, when I refer to motivation, I am
speaking only of motivation (or lack of) that results in careless responding. Outcomes
such as "faking good" have been/will be dealt with during discussions of need for
approval.

Omissiveness. One ﬁnal source of inappropriateness that should be mentioned is
omissiveness. Omissiveness is the tendency to leave blank those items for which one is
not sure of the correct or accurate answer instead of guessing. Although there is very
little research on omissiveness, Rosenberg, Izard, & Hollander (1955) found that there
was an "undecided" or omissiveness response set that affected responses to noncognitive
(i.e., typical) test items. Also, Schuman & Kalton (1985) discussed research which
showed that better educated respondents were less willing to endorse "Don’t Know"
response options on attitude surveys than were less educated respondents. To the extent
that a test measures omissiveness instead of the construct of interest, the test is

inappropriate as a measure of the construct of interest. It should also be noted that the
measurement of omissiveness is perfectly straightforward: it is assessed by counting the
number of omissions. It is, therefore, not included in the section on measures of Type
P inappropriateness.
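
As an illustration of how simple this count is, the following is a minimal sketch in Python. It assumes that an answer sheet is stored as a list in which a blank item is recorded as None; the function name and the data layout are hypothetical and are not taken from any of the instruments discussed here.

```python
from typing import Optional, Sequence


def count_omissions(responses: Sequence[Optional[str]]) -> int:
    """Return the number of items left blank (recorded here as None)."""
    return sum(1 for r in responses if r is None)


# Hypothetical answer sheet for a five-item test; None marks a blank item.
sheet = ["a", None, "c", None, "b"]
print(count_omissions(sheet))  # 2
```

Under this assumed layout, a respondent's omissiveness score is simply the count returned for that respondent's sheet.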

Section summary. In the above section, several sources of Type P
inappropriateness were identified. These sources were Acquiescence/Denial, Need for
Approval, Extreme Response Set/Central Tendency, Test Anxiety, Cognitive Control,
Response Bias/Test Wiseness, Carelessness, and Omissiveness. These factors have been
found to affect scores on a variety of tests, both typical and maximum, and to the extent
that those tests are intended to measure something other than the above-mentioned factors
but are contaminated by one of these factors, those tests are inappropriate as measures of
their respective constructs of interest. It should be noted that a certain amount of
inappropriateness in a test is not a reason to do away with the test. The inappropriateness
should simply be kept to a minimum.

Sources of Type I Inappropriateness

 

Type I inappropriateness is described above as being present to the extent that
there is a person by item interaction that affects item responses. For example, Anderson
(1990) suggests that the item responses of test anxious respondents (on a cognitive test)
are more adversely affected by item difficulty than are those of respondents who are not
test anxious. That is, although item difficulty affects the "correctness" of item responses for
respondents who are low in test anxiety, it affects "correctness" more strongly for
respondents who are high in test anxiety.

Although it would be possible to suggest any personal characteristic by item

characteristic interaction as a source of inappropriateness, some are more plausible based


on previous research. It is these latter types of interactions that are the focus of this
section. It should be noted that while most of the interactions discussed below have
implications for both maximum and typical tests, some have implications for only one or
the other. Where applicable, implications for both are discussed.

Before moving on to these Type I sources of inappropriateness, there is one ﬁnal
issue that must be addressed. I have thus far discussed personal characteristics that affect
item responses, and I am about to discuss personal characteristic by item characteristic
interactions that affect item responses. One might ask why I haven’t discussed item
characteristics separately as a source of inappropriateness. The reason is that item
characteristics, if they have only a direct effect that is not moderated by personal
characteristics, do not distinguish among people. In other words, item characteristics
alone do not contribute to inappropriateness because, by acting alone, they affect every
respondent in the same way. So, the most that item characteristics can do is to add a
constant to the score of every respondent. Since this would not change our conclusions
with respect to the standing of any of the respondents on the construct of interest, item
characteristics alone cannot be considered sources of inappropriateness. It is only when
their effects depend upon some personal characteristic of the respondent that they become
relevant to appropriateness.
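
The argument above can be checked with a small numerical sketch. The additive scoring model, the function names, and all values below are assumptions made only for illustration: each item score is taken to be the respondent's standing plus an item effect, with an optional term in which a personal characteristic (the moderator) multiplies the item effect.

```python
def total_scores(standing, item_effects, moderator=None, interaction=0.0):
    """Sum each respondent's item scores under a hypothetical additive model.

    item score = standing + item_effect + interaction * moderator * item_effect
    """
    if moderator is None:
        moderator = [0.0] * len(standing)
    return [
        sum(s + e + interaction * m * e for e in item_effects)
        for s, m in zip(standing, moderator)
    ]


def rank_order(scores):
    """Indices of respondents sorted from highest to lowest total score."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


standing = [1.0, 1.2, 0.8]         # hypothetical true standings
item_effects = [-0.5, -0.2, -0.9]  # hypothetical item characteristic effects

baseline = total_scores(standing, [0.0, 0.0, 0.0])
shifted = total_scores(standing, item_effects)
# Item effects alone subtract the same constant from every respondent.
print(rank_order(baseline), rank_order(shifted))  # [1, 0, 2] [1, 0, 2]

# Respondent 1 is high on the moderator (e.g., test anxiety); the others are not.
moderated = total_scores(standing, item_effects,
                         moderator=[0.0, 1.0, 0.0], interaction=1.0)
print(rank_order(moderated))  # [0, 2, 1]
```

In this sketch the rank order of respondents is unchanged when item effects act alone, but changes once the item effects depend on a personal characteristic, which is the sense in which only the interaction bears on appropriateness.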

Test anxiety by item characteristics. One of the implications of the Anderson
(1990) paper is that test anxiety should interact with any item characteristic that
contributes to item difﬁculty. Although we often think of item difﬁculty solely in terms
of the difficulty of the content of the item, item difficulty (i.e., the percentage of
respondents answering the item correctly or, in Item Response Theory terms, the
probability of a respondent with a certain ability level answering the item correctly) is
affected by a number of item characteristics.
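
The two senses of item difficulty mentioned above can be sketched as follows. The two-parameter logistic form, the parameter names a (discrimination) and b (difficulty), and the sample data are assumptions chosen for illustration; the text does not commit to a particular Item Response Theory model.

```python
import math


def proportion_correct(scored):
    """Classical difficulty: share of respondents answering each item correctly.

    `scored` is a list of respondents, each a list of 0/1 item scores.
    """
    n = len(scored)
    return [sum(column) / n for column in zip(*scored)]


def p_correct_2pl(theta, a, b):
    """Probability of a correct response under an assumed two-parameter
    logistic model: ability theta, discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical scored responses: four respondents, three items.
scored = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
print(proportion_correct(scored))    # [0.75, 0.75, 0.25]
print(p_correct_2pl(0.0, 1.0, 0.0))  # 0.5
```

In the classical sense a lower proportion indicates a harder item, whereas in the assumed logistic model a larger b lowers the probability of a correct response at any given ability level.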

Item difﬁculty has been found to be affected by ambiguity of item content
(Peabody, 1966), item/stem complexity (e.g., word problems, Zimmerman, 1954), and
response option complexity (e.g., none of the above, Hughes & Trimble, 1965; Dudycha
& Carpenter, 1973). Research has also found that negatively worded items are more
difficult than positively worded items (e.g., "Which of the following are not..." as opposed
to "Which of the following are...", Dudycha & Carpenter, 1973), and that open stem items
are more difficult than closed stem items (e.g., "Hitler was a member of the ____" as
opposed to "Hitler was a member of which of the following parties?", Dudycha &
Carpenter, 1973). Finally, there is evidence to suggest that item position can have an
effect on item responses. For example, all else being equal, item to item transfer of
training leads to later items being less difﬁcult than earlier items (Whitcomb & Travers,
1957). In the typical performance test area, there is evidence that suggests that items later
in a test are answered not on the basis of fact per se, but instead are answered in a way
that will create consistency with responses to items appearing previously in the set of
items (Schuman & Kalton, 1985; Feldman & Lynch, 1988).

In addition to these research ﬁndings, I suggest that one additional factor, topic
irrelevant item content (e.g., reading component in a math test), contributes to item
difﬁculty. It has been found that test anxiety is related to the complexity of a task
through information processing (Benjamin et al., 1981; Naveh-Benjamin et al., 1987;
Paulman & Kennally, 1984). Speciﬁcally, the cognitive component of test anxiety diverts

cognitive resources from the task of test taking, thereby decreasing the amount of


cognitive resource devoted to the task. This suggests that any factor which increases the
difﬁculty of a test item will interact with test anxiety to affect item responses.

Speciﬁcally, I suggest the following proposition:

Proposition 1 - Test anxiety interacts with item content difﬁculty, ambiguity of
item meaning, positive/negative stem wording, open/closed stem format, response
option complexity, stem complexity, topic irrelevant item content, and item
position to affect item responses such that the responses of test takers high in test
anxiety are more adversely affected by these item characteristics than are the
responses of test takers low in test anxiety. The form of this expected interaction

is depicted in Figure 3.

[Figure 3. Expected interaction between item characteristics and test anxiety. Item responses are plotted against item difficulty (low to high), with separate lines for low test anxiety and high test anxiety.]

As can be seen, it is proposed that item difﬁculty in the form of various item
construction characteristics has a slight, negative impact on item responses for low Test
Anxious respondents and a considerably larger negative impact on the responses of high
Test Anxious respondents. Although only one interaction is presented in Figure 3, it is
intended to represent all of the item construction principles listed in Proposition 1.
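
The form of interaction described in Proposition 1 can be sketched numerically. The logistic model and every parameter value below are hypothetical, chosen only to reproduce the qualitative pattern (a slight decline for low test anxiety, a steeper decline for high test anxiety); they are not estimates from any data, and the sketch omits any main effect of anxiety for simplicity.

```python
import math


def p_accurate(difficulty, anxiety, base=2.0, slope=0.5, interaction=1.5):
    """Probability of an accurate item response under a hypothetical model.

    difficulty: 0.0 (low) to 1.0 (high) on some item characteristic
    anxiety:    0.0 (low) or 1.0 (high) test anxiety
    logit = base - (slope + interaction * anxiety) * difficulty
    """
    logit = base - (slope + interaction * anxiety) * difficulty
    return 1.0 / (1.0 + math.exp(-logit))


for label, anxiety in (("low test anxiety", 0.0), ("high test anxiety", 1.0)):
    easy = p_accurate(0.0, anxiety)
    hard = p_accurate(1.0, anxiety)
    print(f"{label}: easy item {easy:.2f}, hard item {hard:.2f}, drop {easy - hard:.2f}")
```

With these assumed values the drop from an easy to a hard item is small for the low anxiety respondent and several times larger for the high anxiety respondent, which is the pattern the proposition describes.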

All of these interactions, with the possible exception of that associated with
content difﬁculty, could apply to both maximum and typical tests. For example,
respondents high in test anxiety are less likely to provide responses that accurately reﬂect
their true standing on the construct of interest on items that are high in ambiguity than
are respondents low in test anxiety, while this difference does not exist (or is less
profound) for items low in ambiguity.

If this proposition is supported, then a test which contains items that vary with
respect to any of the above mentioned characteristics is inappropriate (i.e., Type I) as a
measure of the construct of interest to the extent that respondents vary with respect to test
anxiety.

Motivation by item characteristics. It was mentioned earlier that a lack of
motivation to provide responses that reﬂect one’s true level on the construct of interest
can lead to carelessness, which implies inappropriateness. Careless responding has often
been identiﬁed as a contaminant of test scores in applied psychology. For example, one
of the criticisms of concurrent validation designs is that incumbents, because they have
nothing to gain by performing well on selection tests administered to them in a validation
context, may respond carelessly to some items (Schmitt, Noe, Gooding, & Kirsch, 1984).

It seems that they would be most likely to respond carelessly to those items that would
require more effort, i.e., the more difficult items. In other words, we would expect a
motivation by item difficulty interaction. It has been shown that a variety of test takers
are able to accurately estimate item difficulties, with correlations between true and
estimated difficulty ranging from .56 to .77 (Diamond & Lorge, 1954). Likewise, any
factor that contributes to item difficulty would be expected to interact with motivation to
affect item responses. This suggests the following proposition:

Proposition 2 - Motivation interacts with item content difficulty, ambiguity of item
meaning, positive/negative stem wording, open/closed stem format, response
option complexity, stem complexity, and topic irrelevant item content to affect
item responses such that respondents who are low in motivation reﬂect more
carelessness on items that possess characteristics such as item content difﬁculty,
negative wording, and complex response options than they do on items that do not
possess these characteristics (e. g., items with simple content, positive wording, and
simple response options). Respondents high in motivation reﬂect little

carelessness in either case. The nature of this interaction is depicted in Figure 4.

[Figure 4. Expected interaction between item characteristics and motivation. Item responses are plotted against item difficulty (low to high), with separate lines for low motivation and high motivation.]

The form of the interaction presented in Figure 4 is similar to that of the
interaction presented in Figure 3. For highly motivated test takers, item construction-
based difﬁculty is expected to have a slight negative impact on item responses. For low
motivation test takers, this negative effect should be more pronounced. Again, the form
of this interaction holds for all of the principles listed in Proposition 2.

All of these interactions, with the possible exception of content difficulty, could
apply to both maximum and typical tests. For example, respondents low in motivation
are less likely to provide responses that accurately reﬂect their true standing on the
construct of interest on items that are negatively worded than are respondents high in
motivation, while this difference does not exist (or is less profound) for items that are
positively worded.

Motivation is not expected to interact with item position because, while it might
be expected that the difficulty associated with items early in a test would affect low
motivation respondents more severely than it would respondents high in motivation,
fatigue effects over time would be expected to show similar effects for the two motivation
groups, so that the position effects would be washed out.

Omissiveness by item characteristics. It was mentioned earlier that respondents
vary with respect to their reactions to items when they are unsure of the answer. Some
respondents will guess at any item even if they have no idea of the correct response (what
Cronbach (1946) called the gambler’s mentality), while others will tend to leave such
items blank. Omissiveness may not be common among the generation of school-goers
that grew up with standardized tests, since these respondents would generally know to

guess at every item unless told to do otherwise. The older generation, however, along
with the less-educated, are less likely to possess such test wiseness. These respondents
may reason that they should leave items for which they have no response blank since they
should, in fact, get the item wrong. Since it is quite possible to guess correctly on
multiple choice tests, differences in omissiveness will lead to differences in test/item
scores. Furthermore, if respondents are more likely to guess or fail to guess at the more
difﬁcult items, any factor that contributes to item difﬁculty should contribute to the

omissiveness by item interaction. This suggests the following proposition:

Proposition 3 - Omissiveness interacts with ambiguity of item meaning, difficulty
of item content, positive/negative stem wording, open/closed stem format, response
option complexity, stem complexity, topic irrelevant item content, and item
position to affect item responses such that the responses of respondents high in
omissiveness are more adversely affected by these item characteristics than are the
responses of respondents low in omissiveness. In typical test terms, respondents
high in omissiveness are less likely to provide responses that accurately reﬂect
their true standing on the construct of interest on items that possess characteristics
such as stem complexity and topic irrelevant content than are respondents low in
omissiveness, while this difference does not exist (or is less profound) for items
that do not possess these characteristics. The nature of these interactions should
be similar to that presented in Figure 3 and will therefore not be depicted

graphically.
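
The reasoning behind Proposition 3 for maximum tests can be illustrated with a short calculation. The sketch assumes number-right scoring with no correction for guessing, items of equal option count, and blind guessing on every item the respondent does not know; the function name and the example figures are hypothetical.

```python
def expected_score(n_known, n_unknown, n_options, guesses):
    """Expected number-right score when unknown items are either
    guessed blindly (guesses=True) or left blank (guesses=False)."""
    chance = 1.0 / n_options if guesses else 0.0
    return n_known + n_unknown * chance


# Two respondents of equal knowledge on a 40-item, four-option test.
for n_known, n_unknown in ((30, 10), (20, 20)):
    guesser = expected_score(n_known, n_unknown, 4, guesses=True)
    omitter = expected_score(n_known, n_unknown, 4, guesses=False)
    print(n_unknown, "unknown items: guesser", guesser,
          "omitter", omitter, "gap", guesser - omitter)
```

Under these assumptions the expected gap between the guesser and the omitter grows from 2.5 to 5.0 points as the number of items that neither respondent can answer doubles, so anything that makes items harder widens the difference attributable to omissiveness.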


Field articulation by item characteristics. It was said earlier that cognitive controls
are involuntary ways of approaching and interpreting complex situations or stimuli, and
that one of these controls, ﬁeld articulation, is the extent to which a person is able to pick
out certain relevant aspects of a complex stimulus or situation to the exclusion of other,
superﬂuous aspects. If respondents vary with respect to ﬁeld articulation, then ﬁeld
articulation by item characteristic interactions would be possible for those item
characteristics that serve to distinguish among respondents with different levels of ﬁeld
articulation. In particular, those item characteristics that contribute to the complexity of
the item/stimulus should interact with ﬁeld articulation to affect item responses. For
example, it might be expected that respondents low in ﬁeld articulation (i.e., ﬁeld
dependent respondents) would have more difﬁculty with word problems (i.e., items
embedded in a context) than would respondents high in ﬁeld articulation. This suggests

the following proposition:

Proposition 4 - Field articulation interacts with item stem complexity, topic
irrelevant item content, and response option complexity to affect item responses
such that the responses of respondents low in ﬁeld articulation are more adversely
affected by these item characteristics than are the responses of those persons high
in field articulation. In typical test terms, respondents low in field articulation are
less likely to provide responses that accurately reflect their true standing on the
construct of interest on items that possess these characteristics than are
respondents high in field articulation, while this difference does not exist (or is
less profound) for items that do not possess these characteristics (e.g., items with
simple stems and no topic irrelevant content). The nature of these interactions is
also expected to be similar to that presented in Figure 3 and will therefore not be
depicted graphically here.

Response bias by susceptibility to bias. It was said earlier that response biases are
response tendencies (such as central tendency, extreme response bias, etc.) of a respondent
unsure of correct answers based on whim or habit, and that if correct response options
were evenly distributed across the response positions, then the response bias would affect
item responses only in a random way, and there should be no effect on test scores.
Research has shown, however, that correct responses often are not evenly distributed
(Metfessel & Sax, 1957, 1958). Therefore, a given response bias, although whimsical,
can lead to higher or lower test scores if the items in the test are susceptible to such bias.

This suggests the following proposition:

Proposition 5 - Response bias interacts with susceptibility of a test to response
bias to affect item responses such that respondents who possess a particular
response bias receive test scores that are higher than those of respondents who do
not possess the bias if the test is loaded with items whose correct options are in
positions that are likely to be chosen by the respondent with the bias. If the test
is loaded with items whose correct options are in positions that are not likely to
be chosen by the respondent with a particular bias (e.g., many items with correct
answers in the extreme options given to a respondent with a central tendency
bias), then that respondent receives a lower test score than the respondent without

the bias. In typical test terms, a respondent who possesses a particular response
set is less likely to provide responses that accurately reflect her/his true standing
on the construct of interest on a test that is loaded with items whose options that
are "correct" for that person (i.e., that best reflect the respondent’s true standing
on the construct of interest) are in positions that are unlikely to be chosen by the
respondent with the particular bias than is a respondent who does not possess the
set, while this difference does not exist (or is less profound) on a test that is not

loaded with such items. The nature of this interaction is presented in Figure 5.

[Figure 5. Proposed interaction between response bias and susceptibility of the test to bias. Horizontal axis: susceptibility to bias (low to high); vertical axis: item responses; separate lines for respondents whose bias is in line with the test, respondents with no bias, and respondents whose bias runs against the test.]

Figure 5 suggests ﬁrst that, for respondents with no response biases, susceptibility
of test items to bias has no effect on item responses. For those test takers whose response
bias is contradictory to the susceptibility of the items (i.e., a respondent with a
predilection for extreme option positions who takes a test with a preponderance of correct
options in the middle positions), susceptibility should have a negative effect on item
responses, while the opposite should occur for respondents whose bias is in line with the
bias of the test.
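
The pattern described for Figure 5 can be illustrated with a small calculation. The following is only a sketch under simplifying assumptions that are not part of the proposition itself: the respondent guesses on every item, the test has four option positions, and the two answer keys and the three sets of position preferences are hypothetical values chosen for illustration.

```python
from fractions import Fraction

def expected_guess_score(key_positions, choice_probs):
    """Expected number correct when every item is guessed with fixed
    position preferences (choice_probs[k] = P(choose position k))."""
    return sum(choice_probs[pos] for pos in key_positions)

# Hypothetical 20-item, four-option tests; positions 0..3 stand for a..d.
balanced_key = [0, 1, 2, 3] * 5              # keyed options evenly spread
middle_heavy_key = [1, 2] * 8 + [0, 3] * 2   # 16 keyed in the middle, 4 at the ends

# Hypothetical guessing preferences (each set sums to 1).
no_bias = [Fraction(1, 4)] * 4
central_bias = [Fraction(1, 10), Fraction(4, 10), Fraction(4, 10), Fraction(1, 10)]
extreme_bias = [Fraction(4, 10), Fraction(1, 10), Fraction(1, 10), Fraction(4, 10)]

for label, probs in [("none", no_bias), ("central", central_bias), ("extreme", extreme_bias)]:
    print(label,
          float(expected_guess_score(balanced_key, probs)),
          float(expected_guess_score(middle_heavy_key, probs)))
```

With the balanced key, all three respondents have the same expected score. With the middle-heavy key, the respondent with no bias is unaffected, the respondent with a central tendency bias gains, and the respondent with an extreme response bias loses.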

Test wiseness by susceptibility to wiseness. It was said earlier that if the response
tendencies of a respondent unsure of correct answers are based on the characteristics of
the test or testing situation, then they reﬂect test wiseness. Any characteristic of items
that tends to elicit this test wiseness in those respondents who possess some degree of test
wiseness should interact with wiseness to affect item responses. For example, consider
the following item from a test of metric system knowledge (assume that respondents are

instructed to select the single "best" answer to each question):

Which of the following is the unit of measurement closest to a unit in the
English system?

a. one—thousandth of a liter

b. one milliliter

c. one centiliter

d. one decaliter


Now consider the possible answers of two respondents of equal ability, one of
whom is high in test wiseness, the other low, with neither possessing the content
knowledge necessary to answer this question. The respondent low in test wiseness has
no recourse but to guess blindly, with a probability of a correct response equal to .25.
The respondent high in test wiseness, however, can use the fact that options a and b are
equivalent to discard both of them as possible correct responses. Thus, this respondent
has only two options to guess from, with a probability of correct response equal to .50.
It can be said that this item is susceptible to test wiseness because one of the more
commonly identified aspects of test wiseness, deduction, can be used to increase one's
chances of responding correctly to the item. If response b had been "one liter" instead
of "one milliliter", then the susceptibility to test wiseness of the item would be removed,
and the probabilities of correct responses for the two respondents would be equal. This

suggests the following proposition:

Proposition 6 - Test wiseness interacts with the susceptibility of items to test
wiseness to affect item responses such that test wiseness leads to higher
probabilities of correct responses on items that are susceptible to test wiseness, but
is unrelated to the probability of correct response for those items that are not
susceptible to test wiseness. To the extent that some form of beneﬁt accrues to
respondents with particular proﬁles on typical tests (e.g., a personality test used
as a selection instrument), this interaction is expected to hold for typical tests as

well. This interaction is presented in Figure 6.

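[Figure 6. Proposed interaction between test wiseness and susceptibility of items to test wiseness. Horizontal axis: susceptibility to wiseness; vertical axis: item responses; separate lines for respondents high and low in test wiseness.]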

As can be seen in Figure 6, the proposed relationships among wiseness,
susceptibility of items to wiseness, and item responses are identical to those among bias
with the test, susceptibility to bias, and item responses. Specifically, test wiseness is
beneficial to the "test wise" on items that are susceptible to wiseness and has no effect
on items that are not.
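
The guessing probabilities in the metric system item discussed earlier can be written out as a short sketch. The function name is hypothetical, and the sketch assumes that a respondent who discards options never discards the keyed option and guesses uniformly among the options that remain.

```python
from fractions import Fraction

def p_correct_by_guessing(n_options, n_eliminated=0):
    """Probability of a correct guess after discarding options that can
    be ruled out (assumes the keyed option is never discarded)."""
    remaining = n_options - n_eliminated
    if remaining < 1:
        raise ValueError("at least one option must remain")
    return Fraction(1, remaining)

# Respondent low in test wiseness: blind guess among four options.
low_wiseness = p_correct_by_guessing(4)
# Respondent high in test wiseness: discards the two equivalent options.
high_wiseness = p_correct_by_guessing(4, n_eliminated=2)

print(float(low_wiseness), float(high_wiseness))
```

On an item with no options that can be ruled out by deduction, `n_eliminated` is zero for both respondents and the two probabilities are equal, which is the sense in which such an item is not susceptible to test wiseness.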

Topic irrelevant ability by topic irrelevant item content. It was discussed above
that a test is inappropriate as a measure of the construct of interest to the extent that it
measures a construct other than the construct(s) of interest. Any item characteristic which
affects the relationship between the topic irrelevant ability of a respondent (e.g., verbal
ability on a math test) and item responses can be said to moderate that relationship. For
example, there is no reason to suspect that verbal ability would have an effect on a
respondent’s answer to a calculation problem in mathematics (i.e., a problem with no
words, just numbers, such as 2 X 4). There is, however, reason to suspect that verbal
ability would have an effect on a respondent’s answer to a verbally phrased math problem

(e.g., What is the product of two and four?). This suggests the following proposition:

Proposition 7 - Topic irrelevant ability interacts with topic irrelevant item content
to affect item responses such that the responses of respondents low in topic
irrelevant ability are more adversely affected by topic irrelevant item content than
are the responses of respondents high in topic irrelevant ability. In typical test
terms, respondents low in standing on the irrelevant construct are less likely to
provide responses that accurately reﬂect their true standing on the construct of

interest on items that are high in topic irrelevant content (i.e., content that matches
the irrelevant construct) than are respondents high in the irrelevant construct, while
this difference does not exist (or is less profound) for items low in topic irrelevant
content. This interaction is presented in Figure 7.

[Figure 7. Proposed interaction between topic irrelevant ability and topic irrelevant item content. Horizontal axis: topic irrelevant content (low to high); vertical axis: item responses; separate lines for respondents high and low in topic irrelevant ability.]

Figure 7 shows that topic irrelevant content may have a slight negative impact on
item responses for test takers high in topic irrelevant ability, but a large negative impact
on responses of test takers low in topic irrelevant ability.

Need for approval by item characteristics. The final form of Type I
inappropriateness to be discussed here involves need for approval. Specifically, certain
types of items are more likely than others to elicit responses that reflect the need for
approval of the respondent (this body of literature is discussed in more detail in the
section titled "Measures of Type P inappropriateness"). While need for approval is
usually studied as one potential determinant of responses to typical test items, the
responses to maximum test items that are the result of cheating seem to be
psychologically equivalent to responses on typical test items that are the result of need
for approval (as it has been conceptualized). In the maximum domain, certain items on
a test might be more susceptible to cheating than others. For example, items at the ends
of columns on bubble sheets might be easier to identify for someone copying answers
than would items surrounded on all sides by other items. In the typical domain, items
have been found to vary with respect to social desirability (Crowne & Marlowe, 1964).
For both maximum and typical tests, items may vary in the extent to which they reflect

need for approval. This suggests the following proposition:

Proposition 8 - Need for approval interacts with the characteristics of an item that
might serve to reflect such a disposition to affect item responses. In the
maximum domain, items which provide more of an opportunity to cheat result in

inflated scores for respondents who are high in need for approval but not for


respondents low in need for approval. Items which provide little or no
opportunity to cheat reflect no such difference across respondents. In the typical
domain, respondents high in need for approval are less likely to provide responses
that accurately reflect their true standing on the construct of interest on items that
are high in social desirability than are respondents low in need for approval, while
this difference does not exist (or is less profound) for items low in social

desirability. This interaction is presented in Figure 8.

[Figure 8. Proposed interaction between need for approval and item characteristics. Horizontal axis: opportunity to display need for approval; vertical axis: item responses; separate lines for respondents high and low in need for approval.]

Figure 8 shows that opportunity to display need for approval, whether the
opportunity be a clear view of a neighbor's paper or an item high in social desirability
on a typical test, has no effect on the responses of test takers low in need for approval
but a large positive impact on the responses of test takers high in need for approval.

Section summary. In this section, I reviewed some of the forms that Type I
inappropriateness can take and suggested specific propositions about the form of
interactions between personal characteristics of respondents and item characteristics. If
these propositions are mapped onto the model presented on page 6 along with the sources
of Type P inappropriateness discussed earlier in this paper, a new model (Figure 9)

emerges.

[Figure 9. A revised model of the determinants of item responses. Standing on the construct of interest and the personal characteristics of respondents have direct effects on item responses; item characteristics moderate the effects of the personal characteristics.]

In this model, all personal characteristics of respondents have direct effects on
item responses, and all of these except the impact of the standing of the respondent on
the construct of interest represent Type P inappropriateness. Also, many of the item
characteristics moderate the effects of the personal characteristics, and these moderating
effects represent Type I inappropriateness. The next section of this paper deals with the
measurement of Type P and Type I inappropriateness.

Measures of Type P inappropriateness

The sources of Type P inappropriateness have direct, simple effects on test scores.
They can be studied as main effects. As such, they can often be measured directly. I
now describe the various ways that sources of Type P inappropriateness have been
measured. Where relevant, I discuss differences between measures for maximum and
typical tests.

Acquiescence/denial and extreme response set/bias. Since these two sources of
inappropriateness have been measured in similar ways, they will be treated together. The
general method for assessing acquiescence/denial or extreme response set and their effects
on test scores (both maximum and typical) has been to compare the number of response
options in question that were endorsed (e.g., the number of "agree" responses for
acquiescence, the number of extreme options for extreme response set) to that expected
by chance (Humm & Humm, 1944; Jackson & Messick, 1958). The problem with this
method is that deviation from a chance model may simply reﬂect the actual standing of
the respondent on the construct of interest. The solution to this problem for acquiescence
was to reverse the wording of some items such that a person could contradict him/herself

by agreeing categorically to all items.


Although no such solution has been devised for extreme response set, it is
generally accepted that extreme response set, because it reflects an exaggeration of the
true level of the respondent on the construct of interest instead of a complete distortion
(as is the case with acquiescence), is not as serious a problem (Cronbach, 1950).
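
The comparison to a chance model described above can be sketched as a binomial tail probability, together with a simple count of contradictions on reverse-worded item pairs. This is a minimal sketch and not the procedure of any of the cited authors: the function names, the two-option "agree/disagree" format, the chance probability of .5, and the example data are all assumptions made for illustration.

```python
from math import comb

def binomial_upper_tail(n_items, n_endorsed, p_chance):
    """P(X >= n_endorsed) when X ~ Binomial(n_items, p_chance)."""
    return sum(
        comb(n_items, k) * p_chance ** k * (1 - p_chance) ** (n_items - k)
        for k in range(n_endorsed, n_items + 1)
    )

def count_contradictions(agreed, reversed_pairs):
    """Number of (item, reverse-worded item) pairs endorsed together."""
    return sum(1 for a, b in reversed_pairs if agreed[a] and agreed[b])

# Hypothetical respondent: 16 "agree" responses on 20 two-option items.
tail = binomial_upper_tail(20, 16, 0.5)
print(round(tail, 4))

# Hypothetical responses to four items, where items 0/1 and 2/3 are
# reverse-worded versions of each other.
agreed = [True, True, False, True]
print(count_contradictions(agreed, [(0, 1), (2, 3)]))
```

A small tail probability (here roughly .006) shows only that the number of "agree" responses departs from chance. As noted above, such a departure may reflect the respondent's actual standing on the construct, which is why the contradiction count on reverse-worded pairs is the more informative of the two quantities.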

Need for approval. One of the earliest response sets to be identified was the
tendency to "fake good" on personality tests (Ruch, 1942), interest batteries (Gehman,
1957), and even projective tests (Henry & Rotter, 1956). This tendency was often studied
but little understood until the work of Crowne and Marlowe (1964). These authors linked
the tendency to fake good to an involuntary need for approval that led certain
respondents to display themselves in as favorable a light as possible. Their measure of
need for approval (the Crowne-Marlowe Social Desirability Scale), which has been
incorporated into many of the more prominent personality tests, such as the MMPI,
involves True-False questions that reﬂect large amounts of social desirability but which
should be answered in only one direction by virtually all respondents who are responding
honestly. As Crowne & Marlowe aptly describe such items, "First, they are 'good',
culturally sanctioned things to say about oneself, and second, they are probably untrue of
most people" (p. 210). Also included are items which are undesirable but probably true.
An example of the former type would be, "Before voting, I thoroughly investigate the
qualiﬁcations of all the candidates." An example of the latter would be, "I sometimes
feel resentful when I don’t get my way." Although the ﬁrst item contains a most
admirable quality in a person, it is assumed to be false for the vast majority of
respondents who are responding honestly. Likewise, although the second item may not

be admirable, it is probably true of most people. In this way, the Crowne-Marlowe Scale
and others like it (e.g., MMPI F-Scale; Edwards, 1957; Hartshorne & May, 1928) seek
to identify those people who are attempting to provide a profile of themselves that is
"socially desirable" instead of accurate.

The question that remains is, What do we do once we have identiﬁed a person
who may be responding in this fashion? One approach has been to retest them in an
attempt to get better measures of the constructs of interest. This approach can be used
for both typical and maximum tests. A second approach for typical tests has been to try
to correct scale scores based on Social Desirability scores.

Test anxiety. Although test anxiety has existed as a concept in psychology for
at least forty years, it has been measured almost exclusively with the Mandler & Sarason
(1952) measure, which has withstood much scrutiny (see discussion above on Test
Anxiety). Nevertheless, Morris, Davis, & Hutchings (1981) developed a measure of test
anxiety which appears to improve upon the Mandler & Sarason (1952) measure by
tapping both the cognitive and emotional aspects of test anxiety. When test anxiety is
identified as having an impact on a given individual’s test score, several methods for
decreasing the anxiety can be employed. For example, test anxiety has been shown to
decrease as a function of instructional method (Tobias, 1979), study habits training
(Naveh-Benjamin et al., 1987), feedback (Campeau, 1968), and training in positive
affective responses (Watson & Clark, 1984).

Cognitive controls. Because there are several different cognitive controls that have
been identiﬁed in the literature, there are dozens of cognitive control measures. The
present study focusses solely on ﬁeld articulation. For this reason, only measures of ﬁeld

articulation will be considered. The two most commonly used measures of field
articulation are the Embedded Figures Test and the Rod and Frame Test. Both tests
involve the identification of a figure of some kind that is embedded in a larger visual
context. Thus, the high field articulation individual can sort through the irrelevant,
contextual features and identify the figure. The low field articulation individual (or field
dependent individual) has difﬁculty separating ﬁgure from context.

One additional measure of field articulation was discussed in Broverman et al.
(1968). These authors explained sex differences in ﬁeld articulation with certain
neurological differences that are, in turn, caused by hormonal differences between the
sexes. Although these neurological differences could, perhaps, be used as measures of
ﬁeld articulation, there has been no such attempt reported in the literature.

Although there is a small amount of research which suggests that ﬁeld articulation
does respond to training (Klein, 1967), it is difﬁcult to assess the extent to which such
training would really be helpful in sorting these effects out of scores on tests designed to
measure other constructs. One option would be to partial out scores on tests such as the
Embedded Figures from scores on tests of interest. Another option would be to eliminate
items that are likely to contain a ﬁeld articulation component, such as word problems.
This second option, however, involves the interaction between ﬁeld articulation and item
characteristics, and will therefore be dealt with in the section on Type I inappropriateness.
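
The first option, partialling out scores on a measure such as the Embedded Figures Test, can be sketched as taking residuals from a simple linear regression. This is one possible reading of "partial out" and only an illustration: the function name and the scores below are hypothetical, and a single linear covariate is assumed.

```python
from statistics import mean

def residualize(scores, covariate):
    """Return test scores with the linear effect of a covariate removed
    (residuals from a simple least-squares regression)."""
    mean_x, mean_y = mean(covariate), mean(scores)
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, scores))
    slope = sxy / sxx
    return [y - mean_y - slope * (x - mean_x) for x, y in zip(covariate, scores)]

# Hypothetical data: field articulation scores and scores on the test of interest.
field_articulation = [1, 2, 3, 4, 5]
test_scores = [2, 4, 5, 4, 5]

adjusted = residualize(test_scores, field_articulation)
print([round(r, 2) for r in adjusted])
```

The adjusted scores are uncorrelated with the covariate by construction, so whatever variance they share with other variables cannot be attributed to a linear effect of field articulation.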

Response bias/test wiseness. The distinction made earlier between response bias
and test wiseness involves the rationale behind one's response strategy. Although the
results of such a rationale (or lack thereof) can sometimes be identified through analyses
similar to those used to identify extreme response set (cf. Fagley, 1987; Lawrence, 1957;
Gaier et al., 1953), the rationale has been identified only with measures that are external
to the test of interest. One such measure has been that of Gibb (1964). This measure
simply asks a respondent about the test strategies that he/she uses when taking a test, such
as time-using strategies, error-avoidance strategies, guessing strategies, deductive
reasoning strategies, estimation of instructor intent, and cue usage. Sarnacki (1979) used
this measure to identify individual differences with respect to many of these strategies.

Carelessness. There are a variety of carelessness measures, but most of them have
a similar form. Such measures contain items which, in one way or another, can be
considered nonsense for the respondent. The nonsense content suggests that every
respondent who is paying attention should respond in a particular way. For example,
items from the MMPI K-scale or the Comrey Validity Check Scale might ask the
question, "Have you ever been to the movies? Yes, Not sure, No" (Nonrandom Response
Scale, Hough et al., 1990), to which, it is assumed, anyone who is paying attention should
respond Yes. Such measures should "catch" any respondent who is responding carelessly
regardless of the reason for the carelessness, be it lack of motivation, miscoding of
responses, misunderstanding of the question, etc.

Carelessness, insofar as it is a function of the motivation of the test taker, can also
be manipulated indirectly by manipulating motivation. The higher the motivation of the
respondent, the less carelessness the respondent will exhibit.

Section summary. In this section, measures of the sources of Type P
inappropriateness were briefly reviewed. These measures fall into two categories. The
ﬁrst category consists of those measures that are separate from the tests or items of

interest. Examples are the Crowne-Marlowe Social Desirability Scale and the "Lie" scales
of the MMPI. For the most part, the measures of need for approval, test anxiety, field
articulation, test wiseness, and carelessness fall into this category.

The second category consists of those measures that result from reanalysis of data
from the tests or items of interest. For example, extreme response set is typically
measured by comparing the position of one’s item responses to a chance model. There
is no separate measure involved. For the most part, the measures of Acquiescence/denial,
extreme response set/central tendency, and response bias fall into this category.

Measures of Type I inappropriateness

Because Type I sources of inappropriateness are interactions, they are more subtle
than Type P sources and, therefore, more difficult to detect. As a
result, the measures of Type I inappropriateness must also be more subtle, or at least
more complex. The measures must be able to detect changes in the effects of personal
characteristics on item responses that are due to changes in item characteristics.
Such measures do exist; they simply have not been recognized as having these
properties.

These measures can be divided into two groups: those based on Item Response
Theory and those based directly upon the pattern of right and wrong answers (Harnisch
& Linn, 1981). These groups can be further divided into those for which some attempt
to standardize has been made and those for which no such attempt has been made. More
accurately, it can be said that both IRT-based and non-IRT based indices vary with
respect to the extent to which they have been standardized relative to the total score (or

theta) of the respondent. The meanings of these groupings are discussed in the sections
that follow. For more thorough reviews, see Harnisch & Linn (1981), Rudner (1983), and
Birenbaum (1985).

Non-IRT based indices of Type I inappropriateness (unstandardized). One
example of an unstandardized, non-IRT based index of Type I inappropriateness is Sato's
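Caution Index (1975). The formula for Sato's index is

$$C_i = \frac{\sum_{j=1}^{n_{i.}} (1 - u_{ij})\, n_{.j} \;-\; \sum_{j=n_{i.}+1}^{J} u_{ij}\, n_{.j}}{\sum_{j=1}^{n_{i.}} n_{.j} \;-\; n_{i.}\, \bar{n}_{.j}}$$

where

$i = 1, 2, \ldots, I$ indexes the examinee

$j = 1, 2, \ldots, J$ indexes the item, with items ordered from easiest to most difficult

$u_{ij}$ = 1 if examinee i answers item j correctly and 0 if examinee i answers
item j incorrectly

$n_{i.}$ = total correct for the ith examinee

$n_{.j}$ = total number of correct responses to the jth item

$\bar{n}_{.j}$ = mean of the $n_{.j}$ across the J items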
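
A minimal sketch of the computation follows, using the form of the caution index given by Harnisch and Linn (1981), in which items are ordered from easiest to most difficult and the denominator uses the mean of the item totals. The function name and the five-item example are hypothetical.

```python
def caution_index(responses, item_totals):
    """Sato's caution index for one examinee.

    responses: 0/1 item scores for the examinee.
    item_totals: number of examinees in the sample answering each item
    correctly, in the same item order as responses.
    """
    # Order items from easiest (largest total) to most difficult.
    order = sorted(range(len(item_totals)), key=lambda j: -item_totals[j])
    u = [responses[j] for j in order]
    n = [item_totals[j] for j in order]
    total = sum(u)                       # total correct for this examinee
    mean_n = sum(n) / len(n)             # mean of the item totals
    numerator = (sum((1 - u[j]) * n[j] for j in range(total))
                 - sum(u[j] * n[j] for j in range(total, len(n))))
    denominator = sum(n[:total]) - total * mean_n
    if denominator == 0:
        # Happens for zero or perfect scores, or when all items are equally easy.
        raise ValueError("caution index is undefined for this pattern")
    return numerator / denominator

# Hypothetical item totals from a sample of 100 examinees.
item_totals = [90, 80, 60, 40, 20]

print(caution_index([1, 1, 1, 0, 0], item_totals))   # Guttman pattern
print(caution_index([1, 1, 0, 1, 0], item_totals))   # one inversion
print(caution_index([0, 0, 1, 1, 1], item_totals))   # reversed pattern
```

The index is zero for a response pattern that conforms perfectly to the item difficulties and grows as the pattern departs from them. It has no fixed upper bound, which is relevant to the discussion of standardization later in this section.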

The name of the index comes from the idea that a large value indicates an unusual
response pattern and, therefore, that caution should be used in interpreting the total score
of this respondent (Harnisch & Linn, 1981). There are other such non-IRT based indices
of Type I inappropriateness (e.g., the agreement/disagreement indices and the
dependability index of Kane & Brennan, 1980; van der Flier’s U (van der Flier, 1977) and

its equivalent, the Nonconformity index of Tatsuoka & Tatsuoka (1980)), but they are
generally highly correlated with one another (Harnisch & Linn, 1981; Rudner, 1983) and
the rationale is similar for all of them. Essentially, these indices answer the question, To
what extent do the item responses of a given respondent conform to the item difﬁculties
(as calculated from the total sample of examinees)? In terms of Type I inappropriateness,
these indices answer the question, To what extent is there something about this respondent
that renders the item difﬁculties invalid for this respondent? In other words, To what
extent is there an interaction between the personal characteristics of the respondent and
characteristics of the items? This question applies to both maximum and typical tests.
The only difference is that the notion of item difficulty in maximum tests should be
replaced in typical tests with some measure of the percentage of the sample endorsing a
given response option.

Another way of describing the logic of these indices (as well as the IRT-based
indices) is in terms of Guttman vectors. A Guttman vector is simply a vector of zeroes
and ones in which all of the ones precede all of the zeroes. If a respondent were to
respond to items in a way that matched perfectly with the difﬁculties of the items (and
if there were no possibility of guessing), then the responses of that respondent, when
ordered in terms of item difﬁculty, would form a perfect Guttman vector. The idea is that
the respondent answers all items at or below a certain difﬁculty level correctly. At some
point, however, the difﬁculty becomes too great for that respondent, and all items above
that level of difﬁculty are answered incorrectly. If this were the case, then this
respondent should receive a perfect score on the appropriateness index.

One reason for a departure from this perfect Guttman vector is that a respondent
is guessing some items correctly. In a four-option, multiple choice test, we would expect
a respondent to guess correctly 25% of the items that are too difficult for them. A second
reason for a departure from a perfect Guttman vector is an interaction between personal
and item characteristics. Consider the following. Item difﬁculties are calculated on an
entire sample of scores. If the responses of a given respondent do not conform to those
difﬁculties (for reasons other than guessing), then there is some characteristic of that
individual respondent that is giving that person an advantage over the group on some
items and/or a disadvantage over the group on other items, with the result being that the
person answers correctly some items that should be too difﬁcult for that person while
answering incorrectly some of the easier items. The result is a vector of item responses

(ordered by difﬁculty) like the following:

11111111100001111010

On the one hand, it would appear that the items became too difficult for this
respondent after the ninth item in this order. On the other hand, this respondent did very
well on the last seven items, items that the sample on which the difﬁculties were based
found to be most difﬁcult. There are two possible explanations (other than guessing).
The ﬁrst is that the content of the items beyond the ninth item was too difﬁcult for this
respondent, but something about this respondent (e.g., possession of a cheat sheet, a quick
view to a neighbor’s paper) gave him/her an advantage over the rest of the sample on the
last seven items. The second explanation is that this respondent is actually of very high
ability (or whatever construct is supposed to be measured with these items) but was at a

disadvantage on the items in the middle of this row (perhaps because of coding alignment
errors, misinterpretation of items, etc.). Either way, there is an interaction between the
respondent and some characteristic of the items. The problem is that we have no way of
knowing which is the correct explanation. In other words, we have no way of knowing
the true standing of this respondent on the construct of interest, i.e., the test is
inappropriate as a measure of the construct of interest.
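
The Guttman vector logic can be made concrete with the twenty-item response vector given above. The sketch below checks whether a vector, already ordered from easiest to most difficult item, is a perfect Guttman vector, and counts inversions (pairs in which an easier item is missed while a harder item is answered correctly). The inversion count is introduced here only as an illustration; it is not one of the indices reviewed in this section, and the function names are hypothetical.

```python
def is_guttman_vector(ordered_responses):
    """True if all ones precede all zeroes (items ordered easiest first)."""
    total = sum(ordered_responses)
    return all(r == 1 for r in ordered_responses[:total])

def guttman_errors(ordered_responses):
    """Count (easier, harder) item pairs in which the easier item is
    answered incorrectly and the harder item correctly."""
    errors = 0
    zeros_seen = 0
    for r in ordered_responses:
        if r == 0:
            zeros_seen += 1
        else:
            errors += zeros_seen
    return errors

# The response vector from the text, ordered by item difficulty.
pattern = [int(c) for c in "11111111100001111010"]

print(is_guttman_vector(pattern))
print(guttman_errors(pattern))
print(is_guttman_vector([1, 1, 1, 0, 0]), guttman_errors([1, 1, 1, 0, 0]))
```

A count of this kind says only that the pattern is inconsistent with the item difficulties. It cannot distinguish between the two explanations offered above, an advantage on the difficult items or a disadvantage on the items in the middle of the row.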

It is important to note that, if the advantage or disadvantage of the respondent does
not produce inconsistency (i.e., does not force a departure from a Guttman vector), then
none of these indices (IRT-based or otherwise, standardized or otherwise) will detect it.
However, if no inconsistency is produced, then there cannot be a person by item
interaction. Instead, there would be a simple main effect for the personal characteristic,
and the inappropriateness would be Type P inappropriateness and not Type I
inappropriateness.

Sato’s Caution index and others like it are designed to detect just this sort of
interaction. The main problem with these indices is that they are highly related to total
score. Speciﬁcally, respondents with very high or very low total scores are more likely
to be identiﬁed as aberrant because there is more room for aberrance. For example, a
respondent with a very low total score who happens to answer one or two of the more
difﬁcult items correctly will likely receive a large score on any index that is not well-
standardized simply because those one or two item responses are so inconsistent with the
total score of the respondent whereas the same situation applied to a respondent with an
average score will produce an index value that is not nearly as extreme. Since the goal
of inappropriateness measurement is to measure inappropriateness independent of total

score (or theta), this is seen as a disadvantage to poorly standardized measures.


IRT-based indices of Type I inappropriateness (unstandardized). There are many
IRT-based indices of Type I inappropriateness, and they are usually based either on the
Rasch model (one-parameter; Rasch, 1960) or the three-parameter model (Hambleton & Cook,
1977). These indices address the question: To what extent do the responses of a given
respondent conform to the Item Characteristic Curves of the items in a test? In terms of
Type I inappropriateness: To what extent is there something about this respondent that
renders the Item Characteristic Curves invalid for this respondent? In other words, To
what extent is there an interaction between the personal characteristics of the respondent
and characteristics of the items? This question also applies to both maximum and typical
tests. In the Rasch model approaches, the ICCs differ only with respect to the difficulty
parameter, whereas in the three-parameter model approaches, the ICCs differ with respect
to difﬁculty, discrimination, and the pseudo-guessing parameter.

An example of an index based on the Rasch model is the unweighted total fit
mean square (U1) discussed in Wright and Panchapakesan (1969). The formula for U1
is

$$U1_i = \frac{1}{N} \sum_{j=1}^{N} \frac{(u_{ij} - P_{ij})^2}{P_{ij}\,(1 - P_{ij})}$$

where i indexes the examinee, j indexes the N items, $P_{ij}$ is the probability of a
correct response predicted by the Rasch model, and $u_{ij}$ is the observed item response.


This is essentially a measure of the average discrepancy between the observed responses
of a given examinee and the responses predicted by the model. The larger the
discrepancy is, the more caution should be used in interpreting the total score, i.e., the
greater the degree of inappropriateness.
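
As a sketch of this kind of index, the code below computes an unweighted fit mean square as the average squared standardized residual between observed responses and Rasch-predicted probabilities. It assumes the conventional form of that statistic, takes ability and item difficulties as given rather than estimating them, and uses hypothetical function names and values.

```python
from math import exp

def rasch_probability(theta, difficulty):
    """Probability of a correct response under the one-parameter model."""
    return 1.0 / (1.0 + exp(-(theta - difficulty)))

def unweighted_fit_mean_square(responses, probabilities):
    """Mean of squared standardized residuals, (u - P)^2 / (P(1 - P))."""
    n_items = len(responses)
    return sum((u - p) ** 2 / (p * (1 - p))
               for u, p in zip(responses, probabilities)) / n_items

# Hypothetical examinee with theta = 0 and five items of increasing difficulty.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [rasch_probability(0.0, b) for b in difficulties]

consistent = unweighted_fit_mean_square([1, 1, 1, 0, 0], probs)
inconsistent = unweighted_fit_mean_square([0, 0, 1, 1, 1], probs)
print(round(consistent, 3), round(inconsistent, 3))
```

The inconsistent pattern, in which the easy items are missed and the hard items are answered correctly, yields the larger value because each unexpected response contributes a large standardized residual.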

An example of an index of Type I inappropriateness based on the 3-parameter
model is the l0 index described by Levine and Rubin (1979). The formula for l0 is

$$l_0 = \ln \prod_{j=1}^{N} P_{ij}^{\,u_{ij}} (1 - P_{ij})^{(1 - u_{ij})}$$

where $P_{ij}$ is the probability of a correct response based on the three parameter model and
$u_{ij}$ is the observed response.

This is the log of the compound probability of the observed response pattern for
a maximum likelihood estimate of ability (Rudner, 1983). The rationale for this index
is similar to that of the U1 index: l0 is a measure of the discrepancy between the observed
responses and the responses predicted by an IRT model, specifically, the three-parameter
model. l0 is perhaps the most widely cited index of Type I inappropriateness.
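
A minimal sketch of l0 as the log of the compound probability of a response pattern is given below. The three-parameter logistic function is written without the scaling constant that some presentations include, ability is taken as given rather than estimated by maximum likelihood, and the function names and item parameters are hypothetical.

```python
from math import exp, log

def three_pl_probability(theta, a, b, c):
    """Three-parameter logistic model: discrimination a, difficulty b,
    pseudo-guessing c (no scaling constant applied)."""
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))

def l0_index(responses, probabilities):
    """Log likelihood of the observed 0/1 responses given the model."""
    return sum(u * log(p) + (1 - u) * log(1.0 - p)
               for u, p in zip(responses, probabilities))

# Hypothetical items (a, b, c) and an examinee with theta = 0.
items = [(1.0, -1.5, 0.2), (1.2, -0.5, 0.2), (1.0, 0.0, 0.2),
         (0.8, 0.5, 0.2), (1.1, 1.5, 0.2)]
probs = [three_pl_probability(0.0, a, b, c) for a, b, c in items]

print(round(l0_index([1, 1, 1, 0, 0], probs), 3))
print(round(l0_index([0, 0, 1, 1, 1], probs), 3))
```

Values closer to zero indicate response patterns that are more probable under the model. Because the attainable values depend on the probabilities themselves, raw l0 values are not comparable across ability levels, which is the standardization problem taken up next.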

The problem with these two indices, as with the non-IRT based indices discussed
above, is that they are poorly standardized, that is, they are highly related to total score
(Rudner, 1983; Birenbaum, 1985). The solution is, of course, to attempt to develop
indices that are well-standardized and, therefore, relatively unrelated to total score. The

next two sections are devoted to just such indices.

Standardized non-IRT based indices of Type I inappropriateness. One example
of a standardized, non-IRT based index is the personal biserial of Donlon and Fischer
(1968). Although this index has been shown to be related to total score, it is useful as
an illustration of the meaning of standardization. The personal biserial coefﬁcient is
simply the biserial correlation between the dichotomously scored item responses of a
given respondent and the difﬁculties of those items. The reason I claim that this index
is standardized is that it is the biserial correlation. Since the biserial correlation is
insensitive to the variances of the variables being correlated, the total score of a given
respondent, which is based on the proportion of correct responses given by the
respondent, should have little effect on the personal biserial (as opposed to the point
biserial correlation between responses and difﬁculties).

As was mentioned above, the personal biserial has been found to be highly related
to total score, which means that it is not well standardized. A better example of a
standardized, non-IRT based index is the Modified Caution Index (MCI; Harnisch & Linn,
1981).

$$MCI_i = \frac{\sum_{j=1}^{n_{i.}} (1 - u_{ij})\, n_{.j} \;-\; \sum_{j=n_{i.}+1}^{J} u_{ij}\, n_{.j}}{\sum_{j=1}^{n_{i.}} n_{.j} \;-\; \sum_{j=J+1-n_{i.}}^{J} n_{.j}}$$

where the symbols are the same as those in Sato's original index (Equation 1).

This is simply Sato's Caution Index (Sato, 1975) modified to yield a lower bound
of 0 and an upper bound of 1, thus eliminating the extreme scores that can be obtained
on the caution index for very high scoring examinees who miss a single very easy item
or for very low scoring examinees who answer correctly a single very difﬁcult item. The
MCI index has been found to have little or no relationship with total score (Harnisch &
Linn, 1981; Rudner, 1983). It is, therefore, considered to be a well-standardized index.
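
The bounds of the MCI can be illustrated with a minimal sketch. It assumes the usual Harnisch and Linn form of the index: items are ordered from easiest to hardest, the numerator weights each easy item missed and each hard item answered correctly by the number of examinees who passed that item, and the denominator is the largest value the numerator can take for that total score. The function name, argument names, and the four-item data are hypothetical and are not taken from the present study.

```python
def modified_caution_index(responses, item_totals):
    """Modified Caution Index for one examinee (illustrative sketch).

    responses   -- list of 0/1 item scores for the examinee
    item_totals -- number of examinees answering each item correctly,
                   in the same item order as `responses`
    """
    # Order items from easiest (most examinees correct) to hardest.
    order = sorted(range(len(responses)), key=lambda j: -item_totals[j])
    u = [responses[j] for j in order]
    n = [item_totals[j] for j in order]
    J = len(u)
    k = sum(u)  # the examinee's total score

    # Easy items missed, minus hard items answered correctly (weighted).
    numerator = (sum((1 - u[j]) * n[j] for j in range(k))
                 - sum(u[j] * n[j] for j in range(k, J)))
    # k easiest item totals minus k hardest item totals.
    denominator = sum(n[:k]) - sum(n[J - k:])
    if denominator == 0:
        # Undefined for zero or perfect scores, or when all items are equally easy.
        return float("nan")
    return numerator / denominator


item_totals = [40, 30, 20, 10]          # hypothetical counts of correct answers
guttman = modified_caution_index([1, 1, 0, 0], item_totals)
reversed_guttman = modified_caution_index([0, 0, 1, 1], item_totals)
mixed = modified_caution_index([1, 0, 1, 0], item_totals)
```

A perfect Guttman pattern gives the lower bound of 0, and the fully reversed pattern gives the upper bound of 1, which is the property described in the text.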

Standardized IRT-based indices of Type I inappropriateness.

These are by far the most commonly used and studied statistical indices of
appropriateness and, therefore, the most common indices of Type I inappropriateness.
Two examples of such indices are the standardized extended caution index of Tatsuoka
and Tatsuoka (1982) and the standardized l0 index (lz) of Drasgow, Levine, and Williams
(1985). Since these two have been found to be highly correlated (Birenbaum, 1985), and
since lz has been applied to a wider variety of situations (e.g., Drasgow, Levine, &
McLaughlin, 1991; Drasgow, Levine, McLaughlin, Williams, & Candell, 1989), the lz
index will be the focus of the present paper.

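The formula for lz is

$$
l_z = \frac{l_0 - E(l_0)}{[\mathrm{Var}(l_0)]^{1/2}}
$$

where

$$
E(l_0) = \sum_{i=1}^{n} \Big[ P_i(\hat{\theta}) \ln P_i(\hat{\theta}) + [1 - P_i(\hat{\theta})] \ln [1 - P_i(\hat{\theta})] \Big]
$$

and

$$
\mathrm{Var}(l_0) = \sum_{i=1}^{n} P_i(\hat{\theta}) [1 - P_i(\hat{\theta})] \left[ \ln \frac{P_i(\hat{\theta})}{1 - P_i(\hat{\theta})} \right]^{2}
$$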

lz is approximately normally distributed with a mean of 0 and a standard deviation
of 1. A negative lz indicates inconsistency of the pattern of responses. The literature on
lz has focused exclusively on the negative form of the index, with lz values at or below -1.65
(i.e., the score from the standard normal distribution corresponding to a level of
significance of .05 for a one-tailed test) indicating an aberrant response pattern. A
positive lz indicates hyperconsistency, or a pattern of responses that fits the IRT model
so well that it is suspicious. Positive lz's have received virtually no attention in the
appropriateness literature, and their meaning is largely unclear.

The lz index is a measure of goodness of fit of an IRT model to a particular
response pattern. In other words, lz measures the extent to which a given response pattern
is determined by factors other than ability (or the noncognitive equivalent) and the
parameters of the three-parameter model.
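
As a rough illustration of how lz behaves, the following sketch computes the index for a single response pattern under the three-parameter logistic model. Several things here are assumptions rather than details from the text: the scaling constant D = 1.7, the function names, the five hypothetical items (discrimination, difficulty, guessing), and the ability value of 0. The log likelihood l0 is taken to be the usual sum of u ln P + (1 - u) ln(1 - P) over items, evaluated at the ability estimate.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def lz_index(responses, theta, items):
    """Standardized log likelihood (lz) of one 0/1 response pattern.

    responses -- list of 0/1 item scores
    theta     -- ability estimate for the respondent
    items     -- list of (a, b, c) item parameter tuples
    """
    l0 = expected = variance = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p_3pl(theta, a, b, c)
        q = 1.0 - p
        l0 += u * math.log(p) + (1 - u) * math.log(q)   # observed log likelihood
        expected += p * math.log(p) + q * math.log(q)   # E(l0)
        variance += p * q * math.log(p / q) ** 2        # Var(l0)
    return (l0 - expected) / math.sqrt(variance)


# Hypothetical items ordered from easy to hard.
items = [(1.0, b, 0.2) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]

consistent = lz_index([1, 1, 1, 0, 0], theta=0.0, items=items)  # easy right, hard wrong
aberrant = lz_index([0, 0, 1, 1, 1], theta=0.0, items=items)    # easy wrong, hard right
flagged = aberrant <= -1.65                                     # one-tailed .05 cutoff
```

With these made-up values the consistent pattern yields a positive lz and the reversed pattern a large negative lz, well past the -1.65 cutoff discussed above. With only five items the normal approximation is crude, so the numbers are illustrative only.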

The study of appropriateness measurement with lz has been almost completely
statistical. There has been virtually no attempt to assess the construct validity of lz or any
of the measures of Type I inappropriateness. It is known that lz detects departure from
the IRT model, and there have been numerous suggestions as to the possible causes of
such a departure (e.g., cheating, coding errors, test anxiety), but no attempt has been made
to model these causes as determinants of lz. We know only that these indices measure
the consistency of response patterns relative to item characteristics. As Reise (1990)
points out, however, these indices tell us of response inconsistency, not why the responses
are inconsistent. As a result, we have no clear idea of what lz and other similar indices
are measuring. I maintain that these indices are reflective of person by item interactions
and that viewing inappropriateness in this way will lead to a greater understanding of the
construct that one intends to measure as well as the nature of inappropriateness.

I have offered various sources of Type I inappropriateness and rationale for their
effects on item responses vis a vis appropriateness, and suggest further that it is precisely
these sources that are captured by lz. In this way, I offer an assessment of the construct
validity of lz. Specifically, I suggest the following extensions of my earlier propositions:

Proposition 1A - 8A - The lz index becomes more extreme as the
interaction between characteristics of the respondent (e.g., test anxiety, test
wiseness, etc.) and characteristics of the items (e.g., ambiguity of meaning,
complexity of response options, etc.) becomes more pronounced.
Specifically, for all of the interactions involving respondent characteristics
other than omissiveness, lz becomes more extreme in the negative direction
as the interaction effect becomes stronger. For the interactions that do
involve omissiveness, lz becomes more extreme in the positive direction as
the interaction becomes stronger.

The interactions involving omissiveness are expected to produce positive lz values
because they are expected to produce hyperconsistency. Those respondents high in
omissiveness are expected to omit those items that are too difficult for them, which means
that they have no chance of answering them correctly, which in turn would produce a
near-perfect Guttman vector (i.e., a hyperconsistent response pattern).

To the extent that this set of propositions is borne out, the construct validity of the
lz index and other indices like it will be more firmly established. Also relevant to the
issue of construct validity is the fact that lz is not hypothesized to detect any of the
sources of Type P inappropriateness. Since the sources of Type P inappropriateness alone
do not lead to inconsistency of response patterns, they should not be detected by lz except
insofar as they are related to their respective interactions. This suggests a final model of
appropriateness as measured by lz.

[Figure: A final model of appropriateness as measured by lz.]

Overall summary

Sources of test inappropriateness and research on these sources were discussed.
Two types of sources were identified: those involving main effects for respondent
characteristics on item responses (Type P) and those involving interactions between
respondent characteristics and item characteristics (Type I). Measures of these sources
were also discussed. Sources of inappropriateness were discussed in terms of both
maximum and typical tests. Table 1 summarizes the sources of inappropriateness that
were discussed.

[Table 1: Sources of inappropriateness (Type P and Type I) for maximum and typical tests.]


The present study

The purpose of the present study was to test a part of this model. Specifically,
I examined the effects on knowledge test scores and test inappropriateness as measured
by lz of test anxiety, math anxiety (which can be viewed as a specific application of test
anxiety; see p. 15 for the reference to "number anxiety"), conscientiousness, carelessness,
difficulty-based item characteristics such as positive/negative wording, open/closed stem
format, and response option complexity, and the interaction between each of the four
respondent characteristics mentioned above and difficulty-based item characteristics. The
details of the present study are described below. The specific hypotheses tested in this
study are as follows.

Hypothesis 1 - Difficulty-based item characteristics have a deleterious effect on
knowledge test performance.

Hypothesis 2 - Respondents who are higher in conscientiousness have higher
knowledge test scores than do respondents who are lower in conscientiousness.

Hypothesis 3 - Respondents who are higher in carelessness have lower knowledge
test scores than do respondents who are lower in carelessness.

Hypothesis 4 - Respondents who are higher in math anxiety have lower knowledge
test scores than do respondents who are lower in math anxiety.

Hypothesis 5 - Respondents who are higher in test anxiety have lower knowledge
test scores than do respondents who are lower in test anxiety.

Hypothesis 6 - The effect of conscientiousness on knowledge test scores is
moderated by the extent to which the knowledge test items contain difficulty-based
item characteristics such that the knowledge test responses of those
respondents higher in conscientiousness are less adversely affected by difficulty-based
item characteristics than are those of respondents lower in
conscientiousness.

Hypothesis 7 - The effect of carelessness on knowledge test scores is moderated
by the extent to which the knowledge test items contain difﬁculty-based item
characteristics such that the knowledge test responses of those respondents higher
in carelessness are more adversely affected by difﬁculty-based item characteristics
than are those of respondents lower in carelessness.

Hypothesis 8 - The effect of math anxiety on knowledge test scores is moderated
by the extent to which the knowledge test items contain difﬁculty-based item
characteristics such that the knowledge test responses of those respondents higher
in math anxiety are more adversely affected by difﬁculty-based item
characteristics than are those of respondents lower in math anxiety.

Hypothesis 9 - The effect of test anxiety on knowledge test scores is moderated
by the extent to which the knowledge test items contain difﬁculty-based item
characteristics such that the knowledge test responses of those respondents higher
in test anxiety are more adversely affected by difficulty-based item characteristics
than are those of respondents lower in test anxiety.

Hypothesis 10 - The effect of conscientiousness on lz values is moderated by the
extent to which the knowledge test items contain difficulty-based item
characteristics such that the lz values will be negatively related to the presence of
difficulty-based item characteristics in test items for those respondents lower in
conscientiousness but relatively unrelated for those of respondents higher in
conscientiousness. In other words, difficulty-based item characteristics will lead
to inappropriateness for those respondents low in conscientiousness but not for
those respondents high in conscientiousness.

Hypothesis 11 - The effect of carelessness on lz values is moderated by the extent
to which the knowledge test items contain difficulty-based item characteristics
such that lz values will be negatively related to the extent to which difficulty-based
item characteristics are present in test items for those respondents higher in
carelessness but relatively unrelated for those of respondents lower in carelessness.
In other words, difficulty-based item characteristics will lead to inappropriateness
for those respondents high in carelessness but not for those respondents low in
carelessness.

Hypothesis 12 - The effect of math anxiety on lz values is moderated by the extent
to which the knowledge test items contain difficulty-based item characteristics
such that lz values will be negatively related to the extent to which difficulty-based
item characteristics are present in test items for those respondents higher in math
anxiety but relatively unrelated for those of respondents lower in math anxiety.
In other words, difficulty-based item characteristics will lead to inappropriateness
for those respondents high in math anxiety but not for those respondents low in
math anxiety.

Hypothesis 13 - The effect of test anxiety on lz values is moderated by the extent
to which the knowledge test items contain difficulty-based item characteristics
such that lz values will be negatively related to the extent to which difficulty-based
item characteristics are present in test items for those respondents higher in test
anxiety but relatively unrelated for those of respondents lower in test anxiety. In
other words, difficulty-based item characteristics will lead to inappropriateness for
those respondents high in test anxiety but not for those respondents low in test
anxiety.

Method

Sample

Subjects were 165 undergraduates from a large, midwestern university. 67% of
the subjects were women. No other demographic data were collected. They were
recruited from Introductory Statistics classes towards the end of the semester so that they
had had an opportunity to learn most of the course material. Subjects were given extra
credit for participation. This sample, while convenient, was also quite appropriate for the
variables examined in this study. The general focus of this study was testing, with
emphasis on the relationships among respondent characteristics, item characteristics, and
construct validity of items. Since testing is a common part of most university educations,
a sample of college students was ideal for the examination of factors that affect testing.
Design

The present study used a repeated measures regression design with four between-subjects
factors, one within-subjects factor, and two dependent variables. The between-subjects
predictor variables were Conscientiousness, Math Anxiety, Test Anxiety, and
Carelessness. The within-subjects factor was Item Difficulty as determined by item
construction principles. The dependent variables were scores on tests of statistical
knowledge and the consistency of responses to items from those tests. Thus, eight
repeated measures regression analyses were performed, one for each combination of
between-subjects variable and dependent variable.
Measures

Conscientiousness.

Conscientiousness was assessed with four items from the twelve-item
conscientiousness scale of the NEO-PI personality inventory (Costa & McCrae, 1991).
These four items were the items in the scale which related directly to dependability (as
opposed to organizational skills and goal orientation; see Appendix A). The Costa &
McCrae (1991) measure was used because it is one of the few questionnaire measures
designed speciﬁcally to assess conscientiousness as deﬁned by proponents of the Big Five
theory of personality (e. g., Digman, 1990). Internal consistency reliability for the four
items was estimated to be .68, suggesting that uniquenesses for the four items were
acceptable (Cortina, 1993).
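
For readers who want to see how an internal consistency estimate of this kind is obtained, the following is a minimal sketch. It assumes the estimate is coefficient (Cronbach's) alpha, which this passage does not state explicitly. The function name and the five hypothetical respondents answering four 5-point items are made up for illustration and are not data from the present study.

```python
from statistics import variance


def coefficient_alpha(scores):
    """Coefficient alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])                                   # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five respondents, four items.
data = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 3],
    [4, 4, 3, 4],
]
alpha = coefficient_alpha(data)
```

Alpha rises as the items covary more strongly relative to their individual variances, and, other things equal, it tends to be lower for short scales such as a four-item measure.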

Math Anxiety.

Math Anxiety was assessed with the 11-item Math Anxiety Questionnaire
developed by Wigfield & Meece (1988), which in turn was based on a measure which
was originally developed by Richardson & Suinn (1972). This measure was used
because it taps both the emotional and cognitive components of anxiety. Internal
consistency reliability for these items was estimated to be .85, suggesting that
uniquenesses for these items were also acceptable (Cortina, 1993).

Test Anxiety.

Test Anxiety was assessed with the 10-item Test Anxiety Scale developed by
Morris, Davis, & Hutchings (1981). This measure improves upon earlier test anxiety
scales (e.g., Mandler & Sarason, 1952) which failed to tap both the emotional and
cognitive components of anxiety. Internal consistency reliability for these items was
estimated to be .83, suggesting that uniquenesses for these items were also acceptable
(Cortina, 1993).

Carelessness.

Carelessness was assessed with a six-item scale constructed by the author. These
items were similar to the "nonsense" items included in many noncognitive tests in that
they were designed to produce a particular response from any respondent who pays
attention to (i.e., reads) the item. The unique aspect of the items that make up the
carelessness scale used in the present study is that they are not easily recognized as items
which tap carelessness. Typical carelessness items are absurdities which can be
recognized by respondents who are merely scanning items. Such identiﬁcation can have
a deleterious inﬂuence on test taking motivation. The items used in the present study
were statistical knowledge items that were answered correctly by all respondents during

all pretesting situations (details are described below). Items such as

Another word for the average is
a) mean
b) variance
c) standard deviation

d) range


should be answered correctly by any Introductory Statistics student who reads the
question, as was the case during pretesting. Any variability in responses to such items
should be due only to carelessness.

Statistics knowledge test.

All subjects were administered the 75-item test of statistical knowledge contained
in Appendix B. The items on the statistical knowledge test were items typically found
on exams for Introductory Statistics classes. Items with three levels of content-irrelevant
difficulty were developed. 24 items were open-stemmed, negatively worded, contained
complex response options, and had complex stems (e.g., word problems), using the
definition of stem complexity from Zimmerman (1954). These were the "Difficult" items.

The following is an example of one of the "difﬁcult" items:

Difﬁcult item - Suppose I know the number of times each Michigan resident has been
swindled by Gov. Engler (So, I have access to this population of scores). I then take
many different samples of 15 people each and calculate the mean for each sample. If the
mean of the means were 7.8 and the standard deviation of individual scores were 2.2, the
population mean and the standard error of the mean would not be

a. mu = 2.79, sigma = .57

b. mu = 7.8, sigma = 2.2

c. either a or b

d. all of the above

25 Moderately difficult items were similar to the Difficult items except that they
were positively worded and closed stemmed. The following is an example of one of the
"moderate" items:

Moderate item - For some strange and terrible reason, I am interested in knowing the
average number of white collar crimes committed per day by Sen. Bob Dole over the past
2000 days. In an attempt to estimate this value, I randomly choose twenty days from
these 2000, count the number of white collar crimes he committed on each of those 20
days, and get the average of those twenty numbers.

What is the statistic that I have used?

a. The average number of white collar crimes per day committed by Dole

over the past 2000 days.

b. The average number of white collar crimes committed per day by Dole

over the 20 days that I measured.

c. 2000

d. all of the above

26 Easy items were similar to the Moderately difficult items except that the stems
were noncomplex and there were no complex response options. The following is an
example of one of the "easy" items:

Easy item - Which of the following is an advantage of the mean as a measure of central
tendency?

a. It is greatly affected by extreme scores

b. It can be manipulated algebraically

c. It is not greatly affected by extreme scores

d. It is difficult to calculate

Participants’ scores on the items within each level of difﬁculty were collapsed to
form single variables for the regression analyses involving knowledge test scores
described below. Difﬁculty as deﬁned in this paragraph refers to aspects of the item
format that are thought to decrease the proportion of correct responses independently of
the examinees’ knowledge of the domain being assessed.

An attempt was made to equate items with respect to content difficulty (i.e.,
construct-relevant difficulty). The reason for this is that the item characteristics that are
of the most concern to test constructors are those over which they have direct control.
While most tests should and do vary with respect to content difﬁculty, other item
characteristics such as option complexity and stem complexity can and should be
controlled, especially if they foster aberrant response patterns. Items for the statistical
knowledge test were chosen from a pool of test items that had been administered to
undergraduates as items in actual tests. Speciﬁcally, 75 items which contained none of
the difﬁculty-inducing item characteristics mentioned earlier (e.g., complex response
options, negative wording, etc.) were chosen and distributed randomly into one of the
three groups. Inspection of the item difﬁculty values (percentage incorrect) calculated
from these previous testing situations showed that the average item difﬁculties (proportion
answering the item incorrectly) within the three groups were almost identical (.26 for
items which were to be used for the "easy" test, .23 for items which were to be used for
the "moderate" test, and .25 for the items which were to be used for the "difficult" test),
suggesting that these three groups of items were virtually identical with respect to
content-relevant difﬁculty. Differences in test scores across difﬁculty levels as deﬁned
above can then be attributed to the manipulations of item characteristics. The measure
of statistical knowledge used was proportion of correct responses.

Response Consistency - Response consistency was assessed with the lz index of
Drasgow et al. (1985). The lz index requires the calculation of the item parameters of the
three-parameter IRT model. These parameters were calculated from the responses of
subjects to the 75 test items.

Procedure

Subjects were ﬁrst approached during their statistics classes and asked if they
would be willing to participate in the experiment. Those who agreed were asked to sign
up for a testing date as well.

The 75-item statistics knowledge test and the four tests measuring respondent
characteristics were administered to large groups of subjects at a time. The measures of
conscientiousness, math anxiety, and test anxiety were administered ﬁrst, followed by the
knowledge test. Two forms of the test were created. The two forms differed only in that
the order in which the items were presented was reversed. Within each form, item order
was random. The purpose of generating two forms was to allow an examination of order
effects. Neither knowledge test scores nor lz values differed across the two forms.

The items measuring carelessness were embedded within the knowledge test.
Since all 75 items were administered to all subjects, all three levels of difficulty were
experienced by all subjects. In an attempt to increase motivation to respond carefully, the
test administrators explained to subjects that the test results would be used by the
instructor to evaluate his own teaching performance. They were also told that the test
provided an opportunity to practice for upcoming tests in the class. Finally, $100 was
awarded to each of three of the top performers on the knowledge test.
Data Analysis

After establishing the unidimensionality of the knowledge test, the responses of
subjects to the test were analyzed with the BILOG IRT computer program (Mislevy &
Bock, 1990). This analysis yields both ability estimates for respondents (θ) and item
parameter estimates that are necessary to compute lz as outlined above in the introduction.

Hypotheses were tested with repeated measures hierarchical regression (RMHR;
Cohen & Cohen, 1983; Hollenbeck, Ilgen, & Sego, in press). As there was no a priori
rationale for investigating the effects of the four between-subjects predictors in
conjunction with one another, separate regression analyses were performed for each of the
four between-subjects variables (conscientiousness, math anxiety, test anxiety, and
carelessness) and each of the two criteria (percentage of knowledge items answered
correctly and lz) for a total of eight regressions. In each regression, the dependent
variable (knowledge test scores or lz) was regressed onto one of the between-subjects
factors and the within-subjects factor. The details of this procedure are described below.
It was expected that the interaction between item characteristics and each of the four
between-subjects factors would explain a significant portion of the relevant variance in
knowledge test scores above and beyond that explained by the main effects for the
predictor variables, and that insofar as these interactions were significant, they would also
explain relevant variance in lz.

Results
Tests measuring respondent characteristics
Table 2 contains means, standard deviations, intercorrelations, and internal
consistency estimates for the Conscientiousness, Math Anxiety, Test Anxiety, and
Carelessness Scales. Means, standard deviations, and intercorrelations are presented for
lz. The values presented for the knowledge tests refer to the knowledge tests composed
of the items that remained after the initial BILOG analysis (see below for details of this
analysis).

Table 2

Descriptive statistics for all tests and lz's

                  Mean     SD  alpha      1      2      3      4      5      6      7      8      9

 1. Conscien.    15.86   2.39    .68
 2. Math Anx.    35.33   8.16    .85   -.05
 3. Test Anx.    18.51   5.94    .83   -.16*   .48*
 4. Careless.     5.32    .87    .28    .01   -.06   -.04
 5. Easy Test      .57    .16    .69    .01   -.21*  -.12    .37*
 6. Mod. Test      .54    .17    .70    .11   -.22*  -.10    .36*   .57*
 7. Diff. Test     .46    .16    .64   -.06   -.17*  -.08    .44*   .51*   .55*
 8. lz (easy)      .09   1.01          -.07    .08   -.02    415    410   -.26*  -.01
 9. lz (mod)       .05    .92           .04   -.03   -.05    a14    .04    .04   -.05   -.07
10. lz (diff)     -.06   1.02           .08    .08    .03    mos    406    .03   -.17*  -.14    .10

* - p<.05


There are several points to be made with respect to this table. Regarding the
measures of respondent characteristics, there was reasonable variability on the
Conscientiousness, Math Anxiety, and Test Anxiety scales. Also, the means for these
scales were comparable to those reported in previous literature (e.g., Morris et al., 1981;
Wigfield & Meece, 1988; Costa & McCrae, 1988). There was considerably less
variability in the Carelessness measure, but this is not surprising given the simple nature
of the questions in the scale.

Internal consistency estimates for the conscientiousness, math anxiety, and test
anxiety scales were adequate, suggesting acceptable levels of item uniqueness (Cortina,
1993). Although the estimate for the conscientiousness scale in the present study was
lower than those presented in previous research, this is not surprising given the fact that
the estimate in the present study was based on only four items.

Internal consistency for the Carelessness scale, however, was quite low (α = .28).
Again, this is not surprising given the fact that a substantial portion of respondents
answered all of these items in the same way.

Table 2 also contains information about the statistical knowledge items and lz
values. This information is discussed below.

Statistical knowledge test

The "easy", "moderate", and "difficult" tests were composed of 26, 25, and 24
items respectively. Item means, standard deviations, and intercorrelations can be found
in Appendix C.

Before IRT analyses were performed, the dimensionality of the items was assessed
with a factor analysis of the interitem correlation matrix after that matrix was transformed
into a matrix of polychoric correlations. Table 3 contains the eigenvalues and percentage
of variance accounted for by all factors with eigenvalues greater than 1.

Table 3

Factor analysis of knowledge test items

 

FACTOR EIGENVALUE PCT OF VAR

1 11.10568 14.8
2 4.06301 5.4
3 3.88617 5.2
4 3.41616 4.6
5 3.14234 4.2
6 3.04511 4.1
7 2.92488 3.9
8 2.78527 3.7
9 2.65622 3.5
10 2.49907 3.3
11 2.38604 3.2
12 2.20712 2.9
13 2.16898 2.9
14 2.03637 2.7
15 1.97130 2.6
16 1.89015 2.5
17 1.75348 2.3
18 1.65575 2.2
19 1.61516 2.2
20 1.57926 2.1
21 1.52589 2.0
22 1.45656 1.9
23 1.36599 1.8
24 1.30923 1.7
25 1.25342 1.7
26 1.19881 1.6
27 1.16056 1.5
28 1.08787 1.5

29 1.01588 1.4

 


The conclusion to be drawn with respect to dimensionality depends on the criterion
that one uses. There are many factors with eigenvalues greater than 1. Given the range
of knowledge tapped by the test and the range of item characteristics, this is not
surprising. However, only one of the factors explains more than 5.4% of the test
variance. Also, the ﬁrst factor eigenvalue is almost three times the size of the next
largest eigenvalue (11.11 vs. 4.06; Hulin et al., 1983). Given the latter two facts, IRT
analysis was deemed appropriate.
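
The two facts cited in this argument can be checked with simple arithmetic on the values reported in Table 3. The sketch below assumes that the percentage of variance for each factor is its eigenvalue divided by the number of items (75), which is what one would expect when a 75 x 75 correlation matrix is factored and which reproduces the tabled percentages. The variable names are illustrative.

```python
# First three eigenvalues as reported in Table 3.
eigenvalues = [11.10568, 4.06301, 3.88617]
n_items = 75  # the trace of a 75 x 75 correlation matrix

# Percentage of total variance attributed to each factor.
pct_of_variance = [100.0 * e / n_items for e in eigenvalues]

# Ratio of the first eigenvalue to the second.
first_to_second = eigenvalues[0] / eigenvalues[1]

print([round(p, 1) for p in pct_of_variance])
print(round(first_to_second, 2))
```

The ratio comes to roughly 2.7, which is the basis for the statement that the first eigenvalue is almost three times the size of the second.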

Item parameter estimates for the 75 items and θ-parameter estimates for the
165 respondents were generated with BILOG (Mislevy & Bock, 1979). This analysis
suggested that five of the items (Nos. 25, 41, 46, 64, and 75) did not conform to the
three-parameter IRT model. χ² values and degrees of freedom for these items were
8436.2 (<1 df), 5.1 (4 df), 7480.8 (<1 df), 9.3 (5 df), and 13.5 (4 df). These items were
then discarded, leaving 26 "easy" items, 23 "moderate" items, and 21 "difficult" items
to be reanalyzed with BILOG. Because there were different numbers of items in the
three tests, subsequent analyses involved test means instead of simple raw score
composites.

Item parameter estimates from this second BILOG analysis as well as
corresponding χ² values can be found in Appendix D. As expected, discrimination
and guessing parameter estimates were similar across the three types of test items
while difficulty parameter estimates were considerably higher for the "difficult" items
than for the "easy" and "moderate" items (mean difficulty parameter estimates were
.91 for the "difficult" items as opposed to .42 and .38 for the "easy" and "moderate"
items).

These three sets of items were then combined to form three knowledge scales.
As was mentioned above, mean scores across items for these tests were used instead
of simple sums to control for the different numbers of items in each knowledge scale.
Means, standard deviations, internal consistency estimates, and intercorrelations for
these three tests are presented in Table 2. The differences in means suggest that the
item characteristics manipulated in this study affected test scores in the manner
predicted, i.e., the items with the more difﬁcult formats were answered incorrectly
more often than were the items with the less difﬁcult formats.

Test scores were signiﬁcantly correlated with math anxiety and carelessness:
those respondents who were higher in math anxiety received lower scores on the
knowledge tests, and those respondents who answered more of the carelessness items
correctly received higher scores on the knowledge tests (as expected). The
correlations involving carelessness are particularly compelling given the relative lack
of variability on the carelessness measure.

Internal consistency estimates for the knowledge tests were marginal, but this is
not surprising given the range of content within each of the three tests.

lz

The item and theta parameter estimates generated by the BILOG program were
used to compute lz values for each respondent for each test. Means and standard
deviations for these lz's can be found in Table 2. As can be seen, there are trivial
mean differences in lz values across the three tests. As expected, lz values, which are
standardized with respect to theta, had little or no relationship with knowledge test
scores. lz's were, however, unrelated to any of the respondent characteristics,
suggesting that there is little or no main effect for these measures on appropriateness
as measured by lz.
Tests of Hypotheses

Hypotheses were tested using repeated measures regression (Cohen & Cohen,
1983; Gully, 1994). This procedure involves several steps. First, variance of the
criterion variable is partitioned into that attributable to between-subject differences and
that attributable to within-subject differences. In the case of the present study there
were two such criterion variables: percentage of correct responses and lz values. Table
4 contains variance attributable to between- and within-subjects effects as well as
percentages for each of these values for each of the two criteria.

Table 4

Variance of dependent variables attributable to between- and within-subjects effects

Dependent        Btwn. Ss    W/in Ss     % of s2      % of s2
Variable         Variance    Variance    for Btwn.    for W/in

Knowledge
Test Scores      .0196       .0093       67.8         32.2

lz               .2916       .6684       30.4         69.6

The table shows that approximately two-thirds of the variance in knowledge
test scores was between-subjects variance while approximately two-thirds of the
variance in lz was within-subjects variance. This suggests that between-subjects
predictors are more likely to explain the variability in test scores whereas
within-subjects predictors are more likely to explain the variability in lz.
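
The partitioning step can be illustrated with a minimal sketch. The function names and the toy scores below are hypothetical and are not taken from the study; only the four variance components are the values reported in Table 4. The sketch assumes a balanced design (every respondent takes every test) and uses population variances so that the two components sum to the total.

```python
def partition_variance(scores):
    """Split the total variance of repeated measures into two parts.

    scores: one list per respondent, holding that respondent's score on each test.
    Returns (between, within) as population variances that sum to the total.
    """
    all_scores = [x for person in scores for x in person]
    n_obs = len(all_scores)
    grand_mean = sum(all_scores) / n_obs
    # Between-subjects: spread of respondent means around the grand mean.
    between = sum(len(p) * (sum(p) / len(p) - grand_mean) ** 2 for p in scores) / n_obs
    # Within-subjects: spread of each score around its own respondent's mean.
    within = sum((x - sum(p) / len(p)) ** 2 for p in scores for x in p) / n_obs
    return between, within


def percent_split(between, within):
    """Express the two components as percentages of their sum."""
    total = between + within
    return round(100 * between / total, 1), round(100 * within / total, 1)


# Variance components as reported in Table 4.
table4 = {
    "knowledge": percent_split(0.0196, 0.0093),
    "lz": percent_split(0.2916, 0.6684),
}
```

Applied to the reported components, `percent_split` returns the percentages listed in Table 4, which is a quick way to confirm that the percentage columns are simply each component divided by the sum of the two.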

The second step in the procedure involved the regression of the variance
attributable to within-subjects differences onto all within-subjects predictors and
interactions involving only within-subject predictors. In all of the analyses for the
present study, there were two such within-subjects predictors. These two predictors
corresponded to the two dummy codes created to represent the three levels of
difficulty-based item characteristics. The F-tests for the within-subjects effects were
performed on P_w, #obs-N-P_t degrees of freedom where P_w is the number of
within-subjects predictors entered in the first step, P_t is the total number of within-subjects
predictors entered as of that step, #obs equals the number of total observations, and N
equals the total sample size. In those analyses involving percentage of correct
responses on the knowledge test, this first step in the regressions was performed with
2 and 330 degrees of freedom. In those analyses involving lz, this first step in the
regressions was performed with 2 and 328 degrees of freedom.

In the third step of the procedure, the variance attributable to between-subjects
differences is regressed on all between-subjects predictors and interactions involving
only between-subjects predictors. In all of the analyses for this study, there was one
such main effect and no between-subjects interactions. The degrees of freedom for
this step are P_b, N-P_b where P_b is the number of between-subjects predictors entered in
this step. In those analyses involving percentage of correct responses on the
knowledge test, this second step in the regressions was performed with 1 and 164
degrees of freedom. In those analyses involving lz, this second step in the regressions
was performed with 1 and 163 degrees of freedom.

Finally, remaining within-subjects variance is regressed onto interactions
involving both within- and between-subjects predictors. In all analyses for this study,
there were two such interactions: one for each of the two dummy variables. The
formula for the degrees of freedom for this step is P_w2, #obs-N-P_t2 where P_w2 is the
number of predictors entered at this step and P_t2 is the total number of predictors
entered as of this step. In those analyses involving percentage of correct responses on
the knowledge test, this final step in the regressions was performed with 2 and 328
degrees of freedom. In those analyses involving lz, this final step in the regressions
was performed with 2 and 326 degrees of freedom.
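
The degrees of freedom reported for the three regression steps can be reproduced with a short sketch. Several things here are assumptions rather than statements from the text: the sample sizes (166 respondents for the knowledge test analyses and 165 for lz) are not given in this section and are inferred from the reported values; each respondent is assumed to contribute three observations, one per test; and the between-subjects denominator is taken as N - P_b - 1, the usual residual degrees of freedom for a regression with an intercept, because that is what yields the reported 164 and 163. The function names are hypothetical.

```python
def within_df(n_obs, n_subjects, p_entered, p_total):
    """Numerator and denominator df for a within-subjects step."""
    return p_entered, n_obs - n_subjects - p_total


def between_df(n_subjects, p_between):
    """Numerator and denominator df for the between-subjects step.

    Assumption: residual df = N - P_b - 1 (one df for the intercept).
    """
    return p_between, n_subjects - p_between - 1


def all_steps(n_subjects, tests_per_subject=3):
    n_obs = n_subjects * tests_per_subject
    return {
        # Two dummy codes for the three difficulty levels.
        "item characteristics": within_df(n_obs, n_subjects, 2, 2),
        # One respondent characteristic per analysis.
        "respondent characteristic": between_df(n_subjects, 1),
        # Two interaction terms, four within-subjects terms in total.
        "interaction": within_df(n_obs, n_subjects, 2, 4),
    }


knowledge_df = all_steps(166)  # inferred N, not stated in the text
lz_df = all_steps(165)         # inferred N, not stated in the text
```

Under these assumptions the sketch returns 2 and 330, 1 and 164, and 2 and 328 for the knowledge test analyses, and 2 and 328, 1 and 163, and 2 and 326 for the lz analyses.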

All of the hypotheses in this study were tested with this procedure. As was
mentioned above, separate regression analyses were conducted for each of the four
between-subjects predictors in question and each of the two criteria. Although the fact
that the four between-subjects predictors were correlated with one another means that
the regressions were somewhat redundant, the separate analysis of these predictors
eliminated the need to tease apart the effects of predictors analyzed in conjunction.
Table 5 contains the correlations among all terms used in the regression analyses that
are described below.

Table 5

Intercorrelations among all terms used in regression analyses

TERM(1)   CONS      CARE      MANX      TANX      DMOD

CONS      1.0000    .0126     -.0558    -.1650*   .0000
CARE      .0126     1.0000    -.0641    -.0457    .0000
MANX      -.0558    -.0641    1.0000    .4827*    .0000
TANX      -.1650*   -.0457    .4827*    1.0000    .0000
DMOD      .0000     .0000     .0000     .0000     1.0000
DDIF      .0000     .0000     .0000     .0000     -.5000*
MCON      .1045*    .0013     -.0058    -.0172    .9835*
DCON      .1045*    .0013     -.0058    -.0172    -.4917*
MCAR      .1045*    .0013     -.0058    -.0172    .9835*
DCAR      .1045*    .0013     -.0058    -.0172    -.4917*
MMAT      -.0087    -.0100    .1567*    .0757     .9624*
DMAT      -.0087    -.0100    .1567*    .0757     -.4812*
MDANX     -.0087    -.0100    .1567*    .0757     .9624*
DFANX     -.0087    -.0100    .1567*    .0757     -.4812*
LZ        .0173     -.1141*   .0457     -.0128    -.0131
KWTST     .0206     .3790*    -.1968*   -.0981*   .0766

Table 5 cont'd

TERM      DDIF      MCON      DCON      MCAR      DCAR      MMAT

CONS      .0000     .1045*    .1045*    .1045*    .1045*    -.0087
CARE      .0000     .0013     .0013     .0013     .0013     -.0100
MANX      .0000     -.0058    -.0058    -.0058    -.0058    .1567*
TANX      .0000     -.0172    -.0172    -.0172    -.0172    .0757
DMOD      -.5000*   .9835*    -.4917*   .9835*    -.4917*   .9624*
DDIF      1.0000    -.4917*   .9835*    -.4917*   .9835*    -.4812*
MCON      -.4917*   1.0000    -.4836*   1.0000    -.4836*   .9438*
DCON      .9835*    -.4836*   1.0000    -.4836*   1.0000    -.4733*
MCAR      -.4917*   1.0000    -.4836*   1.0000    -.4836*   .9438*
DCAR      .9835*    -.4836*   1.0000    -.4836*   1.0000    -.4733*
MMAT      -.4812*   .9438*    -.4733*   .9438*    -.4733*   1.0000
DMAT      .9624*    -.4733*   .9438*    -.4733*   .9438*    -.4631*
MDANX     -.4812*   .9438*    -.4733*   .9438*    -.4733*   1.0000
DFANX     .9624*    -.4733*   .9438*    -.4733*   .9438*    -.4631*
LZ        -.0050    -.0088    .0038     -.0088    .0038     -.0176
KWTST     -.2489*   .0870     -.2508*   .0870     -.2508*   .0380

Table 5 cont'd

TERM      DMAT      MDANX     DFANX     LZ        KWTST

CONS      -.0087    -.0087    -.0087    .0173     .0206
CARE      -.0100    -.0100    -.0100    -.1141*   .3790*
MANX      .1567*    .1567*    .1567*    .0457     -.1968*
TANX      .0757     .0757     .0757     -.0128    -.0981*
DMOD      -.4812*   .9624*    -.4812*   -.0131    .0766
DDIF      .9624*    -.4812*   .9624*    -.0050    -.2489*
MCON      -.4733*   .9438*    -.4733*   -.0088    .0870
DCON      .9438*    -.4733*   .9438*    .0038     -.2508*
MCAR      -.4733*   .9438*    -.4733*   -.0088    .0870
DCAR      .9438*    -.4733*   .9438*    .0038     -.2508*
MMAT      -.4631*   1.0000    -.4631*   -.0176    .0380
DMAT      1.0000    -.4631*   1.0000    .0087     -.2649*
MDANX     -.4631*   1.0000    -.4631*   -.0176    .0380
DFANX     1.0000    -.4631*   1.0000    .0087     -.2649*
LZ        .0087     -.0176    .0087     1.0000    -.0688
KWTST     -.2649*   .0380     -.2649*   -.0688    1.0000

1 - CONS=Conscientiousness; CARE=Carefulness; MANX=Math Anxiety; TANX=Test
Anxiety; DMOD=Dummy for "moderate" test vs. "easy" test; DDIF=Dummy for
"difficult" test vs. "easy" test; MCON=Dummod * Cons interaction; DCON=Dumdif *
Cons interaction; MCAR=Dummod * Care interaction; DCAR=Dumdif * Care
interaction; MMAT=Dummod * Mathanx interaction; DMAT=Dumdif * Mathanx
interaction; MDANX=Dummod * Testanx interaction; DFANX=Dumdif * Testanx
interaction; KWTST=Knowledge test

* - p<.05
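
The correlation of -.5000 between the two dummy codes in Table 5 follows directly from the coding scheme rather than from the data. A minimal sketch, assuming a balanced design in which every respondent contributes one observation at each of the three difficulty levels (the respondent count of 4 is arbitrary and the helper name is hypothetical):

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation computed from deviation scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


n_respondents = 4  # arbitrary; any balanced design gives the same result

# One row per respondent per test, in the order easy, moderate, difficult.
dmod = [0, 1, 0] * n_respondents  # 1 only for the "moderate" test
ddif = [0, 0, 1] * n_respondents  # 1 only for the "difficult" test

r_dummies = pearson(dmod, ddif)
```

With equal numbers of observations at each level, `r_dummies` equals -0.5 up to floating-point error, whatever the number of respondents.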


Difﬁculty-based item characteristics and knowledge test scores

The ﬁrst step in all of these analyses involved the entering of the two dummy
variables corresponding to the three levels of difﬁculty-based item characteristics. The
regression of percentage correct for the knowledge tests onto these two dummy
variables yielded an R2 value of .065. In repeated measures regression, however, only
the within-subjects variance is relevant for within-subjects predictors. The R2 value
associated with the regression of the within-subjects variance in knowledge test scores
onto the two within-subjects dummy variables was .20 (F2,328=41.5, p<.01), suggesting
that difﬁculty-based item characteristics had a substantial effect on knowledge test
performance and, thus, providing support for Hypothesis 1.
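
The step from the overall R2 of .065 to the within-subjects R2 of .20 can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's computation: it takes the Table 5 correlations of the two dummy codes with knowledge test scores as .0766 and -.2489, the correlation between the dummy codes as -.5000, and the within-subjects share of variance as 32.2% (Table 4). The function names are hypothetical.

```python
def r2_two_predictors(r1, r2, r12):
    """Squared multiple correlation for two predictors from zero-order correlations."""
    return (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)


def within_r2(total_r2, within_share):
    """Rescale an overall R2 to the within-subjects variance only.

    Valid here because the dummy codes vary only within respondents,
    so all of the variance they explain is within-subjects variance.
    """
    return total_r2 / within_share


total = r2_two_predictors(0.0766, -0.2489, -0.5)
within = within_r2(total, 0.322)
```

Rounded, `total` is .065 and `within` is .20, matching the two values reported in the paragraph above.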
Conscientiousness and knowledge test scores

The results with respect to the relationship between conscientiousness and
knowledge test scores can also be found in Table 6.

[Table 6: regression of knowledge test scores on item characteristics, conscientiousness, and their interaction. The table body is illegible in the scanned source.]

As can be seen, conscientiousness had little effect on knowledge test scores.
The conscientiousness by item characteristic interaction, however, was significant
(F2,326=3.12, p<.05). This interaction is plotted in Figure 11. The plot was created by
constructing a regression equation for percentage correct with six components: one
constant and five products. The five products represented each of the five terms (i.e.,
one term representing conscientiousness, two dummy variables representing difficulty,
and two interactions between conscientiousness and the dummy variables) used in the
regressions times their respective regression weights. When item difficulty was equal
to 1 (i.e., the "easy" test), the four terms involving dummy variables reduced to zero,
leaving only the constant and the term containing the conscientiousness score. When
item difficulty was equal to 2 (i.e., the "moderate" test), the terms involving the
dummy variable corresponding to the "difficult" test reduced to zero. When item
difficulty was equal to 3 (i.e., the "difficult" test), the terms involving the dummy
variable corresponding to the "moderate" test reduced to zero.
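
The construction of the plotted values can be sketched as follows. The regression weights below are hypothetical placeholders chosen only to make the example run; the actual weights are not reported in this section. The conscientiousness scores of 13 and 18 are the two values shown in the legend of Figure 11.

```python
def predicted_percent_correct(cons, difficulty, b):
    """Predicted percentage correct from one constant and five products.

    difficulty: 1 = "easy", 2 = "moderate", 3 = "difficult".
    b: dict of regression weights (hypothetical values in this sketch).
    """
    dmod = 1 if difficulty == 2 else 0  # dummy for "moderate" vs. "easy"
    ddif = 1 if difficulty == 3 else 0  # dummy for "difficult" vs. "easy"
    return (
        b["const"]
        + b["cons"] * cons
        + b["dmod"] * dmod
        + b["ddif"] * ddif
        + b["mcon"] * dmod * cons  # moderate-by-conscientiousness interaction
        + b["dcon"] * ddif * cons  # difficult-by-conscientiousness interaction
    )


# Hypothetical weights, for illustration only.
weights = {"const": 0.50, "cons": 0.005, "dmod": -0.02,
           "ddif": 0.05, "mcon": 0.0, "dcon": -0.01}

# One line per conscientiousness level, one point per difficulty level.
plot_points = {
    cons: [predicted_percent_correct(cons, d, weights) for d in (1, 2, 3)]
    for cons in (13, 18)
}
```

For the "easy" test all four dummy-based terms drop out, so the prediction depends only on the constant and the conscientiousness term, as described above.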

[Figure 11: line plot of Percentage Correct (vertical axis) against Item Difficulty (horizontal axis), with separate lines for Cons = 13 and Cons = 18.]

Figure 11. Plot of the effect on test scores of the conscientiousness by item
characteristics interaction

Figure 11 shows that the interaction can be seen most clearly in the difference
in the relationship between conscientiousness and percentage correct between the
"moderate" and "difficult" tests. The plot shows that the interaction is not of the form
expected. The knowledge test scores of respondents high in conscientiousness were
more adversely affected by difficult item characteristics than were the scores of
respondents low in conscientiousness. The relationship between item difficulty and
percentage correct was negative for both levels of conscientiousness. These analyses
fail to provide support for Hypotheses 2 and 6.

The results with respect to the relationship between carelessness and knowledge
test scores can be found in Table 7.

[Table 7: regression of knowledge test scores on item characteristics, carelessness, and their interaction. The table body is illegible in the scanned source.]

Carelessness was significantly related to knowledge test scores, thus providing
support for Hypothesis 3. The R2 associated with the regression of between-subjects
variance onto carelessness scores was .21 (F1,163=43.81, p<.01). Those respondents
who were higher in carelessness had lower knowledge test scores than did those
respondents who were lower in carelessness. This effect, however, was not moderated
by item characteristics, thus failing to provide support for Hypothesis 7. The R2 value
associated with this interaction was .0023.

Math anxiety and knowledge test scores

The results with respect to the relationship between math anxiety and
knowledge test scores can be found in Table 8.

[Table 8: regression of knowledge test scores on item characteristics, math anxiety, and their interaction. The table body is illegible in the scanned source.]

Math anxiety was significantly related to knowledge test scores. The R2
associated with the regression of between-subjects variance onto math anxiety scores
was .06 (F1,163=9.87, p<.01). Those respondents who were higher in math anxiety had
lower knowledge test scores than did those respondents who were lower in math
anxiety, as hypothesized (Hypothesis 4). This effect was not moderated by item
characteristics; thus, Hypothesis 8 was not supported. The R2 value associated with
this interaction was .0023.

Test anxiety and knowledge test scores

The results with respect to the relationship between test anxiety and knowledge
test scores can be found in Table 9.

[Table 9: regression of knowledge test scores on item characteristics, test anxiety, and their interaction. The table body is illegible in the scanned source.]

Neither test anxiety nor its interaction with item characteristics significantly
predicted knowledge test scores. Test anxiety explained only 1.4% of the
between-subjects variance, and its interaction with item characteristics explained less than 1%
of the within-subjects variance. Thus, there was no support for Hypotheses 5 and 9.

Difficulty-based item characteristics and lz

As with the analyses of knowledge test scores, the first step in all of the
analyses of lz involved the entering of the two dummy variables corresponding to the
three levels of difficulty-based item characteristics. The R2 value associated with the
regression of the within-subjects variance in lz onto the two within-subjects dummy
variables was .0005 (F2,326<1), suggesting that difficulty-based item characteristics had
no effect on lz values.
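
The F ratios reported alongside the R2 values in this section follow from the standard relation between R2 and F for a set of predictors. A minimal sketch, with a hypothetical function name, using only R2 values and degrees of freedom reported in this chapter:

```python
def f_from_r2(r2, df_num, df_den):
    """F ratio for a set of predictors explaining a proportion r2 of variance."""
    return (r2 / df_num) / ((1 - r2) / df_den)


# Item characteristics and lz: R2 = .0005 on 2 and 326 df.
f_item_lz = f_from_r2(0.0005, 2, 326)

# Conscientiousness by item characteristics interaction and lz: R2 = .007.
f_cons_interaction = f_from_r2(0.007, 2, 326)

# Carelessness main effect on lz: 4.4% of between-subjects variance, 1 and 163 df.
f_care_lz = f_from_r2(0.044, 1, 163)
```

The first value is well below 1, and the other two round to 1.15 and 7.50, which agree with the F ratios reported for those effects in the lz analyses.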

Conscientiousness and lz

The results with respect to the relationship between conscientiousness and lz
can be found in Table 10.

[Table 10: regression of lz on item characteristics, conscientiousness, and their interaction. The table body is illegible in the scanned source.]

Conscientiousness had a nonsignificant effect on lz values (F1,163<1), accounting
for less than 1% of the between-subjects variance in lz. The effect of the interaction
between item characteristics and conscientiousness on lz was also nonsignificant (i.e.,
Hypothesis 10; R2=.007; F2,326=1.15; p>.05).

Carelessness and lz

The results with respect to the relationship between carelessness and lz
can be found in Table 11.

[Table 11: regression of lz on item characteristics, carelessness, and their interaction. The table body is illegible in the scanned source.]

Carelessness had a significant effect on lz values (F1,163=7.50; p<.01),
accounting for 4.4% of the between-subjects variance. Those respondents who
reflected more carelessness had lower lz values (i.e., provided response patterns with
greater inappropriateness) than did those respondents who reflected less carelessness.
The effect for the carelessness by item characteristics interaction was also significant
(F2,326=3.61; p<.05). This effect accounted for 2.2% of the within-subjects variance
after removal of the relevant main effects. Figure 12 contains a plot of this
interaction.

[Figure 12: plot of lz (vertical axis) against Item Difficulty (horizontal axis).]

Figure 12. Plot of the effect on lz of the interaction between carelessness and item
characteristics

This interaction was also not of the form hypothesized (see Hypothesis 11).
Those respondents who were higher in carelessness (i.e., respondents who had lower
scores on the carelessness measure) displayed less inappropriateness as a function of
item difficulty than did respondents lower in carelessness.

Math anxiety and lz

The results with respect to the relationship between math anxiety and lz
can be found in Table 12.

[Table 12: regression of lz on item characteristics, math anxiety, and their interaction. The table body is illegible in the scanned source.]

Math anxiety did not have a significant effect on lz values, accounting for less
than 1% of the between-subjects variance (F1,163=1.32; p>.05). The effect for the math
anxiety by item characteristic interaction was also nonsignificant (F2,326=1.28; p>.05).
Thus, Hypothesis 12 was not supported.

Test anxiety and lz

The results with respect to the relationship between test anxiety and lz
can be found in Table 13.

[Table 13: regression of lz on item characteristics, test anxiety, and their interaction. The table body is illegible in the scanned source.]

Test anxiety also failed to produce a significant effect on lz values, accounting
for less than 1% of the between-subjects variance (F1,163=1.32; p>.05). The effect for
the test anxiety by item characteristic interaction (Hypothesis 13) was also
nonsignificant (F2,326=1.28; p>.05).

Table 14 summarizes the findings of the present study with respect to the
Hypotheses presented on p.66. As can be seen, support was found for Hypotheses 1,
3, and 4, which dealt with the effects of item characteristics, carelessness, and math
anxiety on knowledge test scores. None of the other Hypotheses were supported.

Table 14

Summary of hypotheses and support

                                  Knowledge Test               lz

Hypothesis                    Main Effects   Interactions   Interactions

H1-Item Characteristics           Xa
H2-Conscientiousness              O
H3-Carelessness                   X
H4-Math Anxiety                   X
H5-Test Anxiety                   O
H6-Conscientiousness                             O
H7-Carelessness                                  O
H8-Math Anxiety                                  O
H9-Test Anxiety                                  O
H10-Conscientiousness                                           O
H11-Carelessness                                                O
H12-Math Anxiety                                                O
H13-Test Anxiety                                                O

a - X indicates support for the hypothesis while O indicates lack of support

Discussion

The purpose of the present study was to investigate the determinants of test
inappropriateness. Specifically, I investigated the effects of difficulty-based item
characteristics, math anxiety, test anxiety, conscientiousness, and carelessness on
knowledge test scores and the lz index of test inappropriateness. The following
sections discuss the results of this study as they relate to the hypotheses that were
presented on page 67.

Hypothesis 1: Difficulty-based item characteristics and test scores

Difﬁculty-based item characteristics had a profound effect on knowledge test
scores. Specifically, difﬁculty-based item characteristics accounted for 20% of the
within-subjects variance in knowledge test scores. As hypothesized, respondents chose
the correct response option less often for items with many difﬁculty-based item
characteristics than they did for items with fewer difﬁculty-based item characteristics.
These results suggest that the responses to test items are a function not only of the
standing of the respondent on the construct of interest, but also of the format of the
item. This ﬁnding is consistent with previous research (e.g., Hughes & Trimble, 1965;

Dudycha & Carpenter, 1973).


Hypothesis 2: Conscientiousness and test scores

Conscientiousness was found to have little effect on test scores, accounting for
less than 1% of the between-subjects variance. Although contrary to Hypothesis 2 of
the present study, this is not entirely inconsistent with past research on the
criterion-related validity of the Big Five personality dimensions. Meta-analyses of the
relationship between conscientiousness and work-related outcomes have generally
yielded uncorrected validities of less than .15 (Barrick & Mount, 1992; Tett et al.,
1992).

Hypothesis 3: Carelessness and test scores

Carelessness was found to have a considerable effect on knowledge test scores,
accounting for 21% of the between-subjects variance. As hypothesized, those
respondents who exhibited a large degree of carelessness received lower test scores
than did those who exhibited little or no carelessness. In fact, the mean knowledge
scores ("easy" items, "moderate" items, and "difficult" items) for those respondents
who answered all six carelessness items correctly were .62, .60, and .53, respectively,
whereas the scores for those respondents who missed at least one of the carelessness
items were .50, .47, and .39. This suggests that responses to test items were due not only
to the standing of the respondent on the construct of interest, but also to the
carelessness of the respondent at the time of test administration. In other words, there
was evidence of Type P inappropriateness due to an effect for carelessness.

One possible alternative explanation for this finding is that the carelessness
items used in the present study were in fact statistical knowledge items and, therefore,
should be related to knowledge scores. While this is a possibility, it should be noted
that each of the items used in the carelessness scale had been answered
correctly by all students who had responded to the item in previous tests in which
those items had been used. Participants in the present study were still taking an
introductory statistics course at the time of testing, and the material contained in the
carelessness items was material that had been covered in the class. So, while the
content of the carelessness items was knowledge-related, the only viable cause of
variance on the items (given that they were administered to people familiar with the
rubric of statistics) was carelessness.
Hypothesis 4: Math anxiety and test scores

Math anxiety was found to have a significant effect on knowledge test scores,
accounting for 6% of the between-subjects variance. As hypothesized, respondents
higher in math anxiety had lower test scores than did respondents lower in math
anxiety. This suggests that there was also evidence of Type P inappropriateness due
to an effect for math anxiety.

Hypothesis 5: Test anxiety and test scores

Test anxiety was found to have little effect on test scores, accounting for less
than 1% of the between-subjects variance. One explanation for this result is that the
experimental testing situation lacked those aspects of real testing situations which lead
to test anxiety. This possibility is discussed in more detail below.

Hypothesis 6: Conscientiousness by item characteristic interaction and test scores

Hypothesis 6 was not supported by the data. The conscientiousness by item
characteristic interaction did contribute significantly to the prediction of test scores,
but the form of the interaction was not as expected. It was hypothesized that
respondents low in conscientiousness would be more adversely affected by
difficulty-based item characteristics than would respondents high in conscientiousness. Instead,
the opposite was found. As can be seen in Figure 11, however, the extent of the
interaction is slight and may have been due only to chance.
Hypothesis 7: Carelessness by item characteristic interaction and test scores

Hypothesis 7 was not supported by the data. The carelessness by item
characteristic interaction did not contribute significantly to the prediction of test
scores. In other words, there was no evidence of Type I inappropriateness resulting
from an interaction between carelessness and item characteristics.

Hypothesis 8: Math anxiety by item characteristic interaction and test scores

Hypothesis 8 was not supported by the data. The math anxiety by item
characteristic interaction did not contribute significantly to the prediction of test
scores. In other words, there was no evidence of Type I inappropriateness resulting
from an interaction between math anxiety and item characteristics.

Hypothesis 9: Test anxiety by item characteristic interaction and test scores

Hypothesis 9 was not supported by the data. The test anxiety by item
characteristic interaction did not contribute significantly to the prediction of test
scores. In other words, there was no evidence of Type I inappropriateness resulting
from an interaction between test anxiety and item characteristics. As with the main
effects, one explanation for this result is that the experimental testing situation lacked
those aspects of real testing situations which lead to test anxiety.

Hypothesis 10: Conscientiousness by item characteristic interaction and lz

Hypothesis 10 was not supported by the data. The conscientiousness by item
characteristic interaction did not contribute significantly to the prediction of lz.
Although this is contrary to the hypothesis of the present study, it is not surprising
given the lack of effect for this interaction on test scores. The lack of effect for the
interaction on test scores suggests that there was little in the way of Type I
inappropriateness (at least as caused by the conscientiousness by item characteristic
interaction); therefore, any variance in lz was due to other factors or chance.

Hypothesis 11: Carelessness by item characteristic interaction and lz

Hypothesis 11 was not supported by the data. Although the carelessness by
item characteristic interaction did contribute significantly to the prediction of lz, the
interaction was not of the form expected. It was hypothesized that respondents higher
in carelessness would have higher lz values only for the tests composed of items with
difficulty-based item characteristics. Instead, respondents higher in carelessness
displayed less of an item characteristic-lz effect. So, while lz was predicted by this
interaction, it was not predicted in a way that was consistent with Hypothesis 11.

Hypothesis 12: Math anxiety by item characteristic interaction and lz

Hypothesis 12 was not supported by the data. The math anxiety by item
characteristic interaction did not contribute significantly to the prediction of lz. As
with conscientiousness, however, the lack of effect for this interaction on test scores
suggests that there was little in the way of Type I inappropriateness (at least as caused
by the math anxiety by item characteristic interaction); therefore, any variance in lz
was due to other factors or chance.

Hypothesis 13: Test anxiety by item characteristic interaction and lz

Hypothesis 13 was not supported by the data. The test anxiety by item
characteristic interaction did not contribute significantly to the prediction of test
scores. In other words, there was no evidence of Type I inappropriateness resulting
from an interaction between test anxiety and item characteristics. As with the main
effects, one explanation for this result is that the experimental testing situation lacked
those aspects of real testing situations which lead to test anxiety.

Implications and conclusions

The results of the present study have several implications for the interpretation
of ability and knowledge test scores. First, item characteristics, in particular
difficulty-based item characteristics, can have a profound effect on test scores. As was
suggested in the Introduction, these item characteristics force the respondent to use
abilities other than those related to the ostensible content of the items to arrive at the
correct answer. Insofar as this is the case, the items which possess these
characteristics are measuring constructs other than those that they are intended to
measure.

The second implication for test scores has to do with carelessness. The present
study found that carelessness had a considerable relationship with test scores.
Speciﬁcally, the results suggest that those respondents who perform poorly on
carelessness items answer far fewer items correctly than do those respondents who
perform well on carelessness items. This is a particularly important issue for testing
situations such as those encountered in concurrent, criterion-related validity studies in

which respondents are existing employees who have nothing to gain from performing
well on the selection tests. If there is a signiﬁcant number of respondents who are
careless in their responding, then estimates of validity may be adversely affected.
Also, if incumbent test scores are to be used in some fashion to set cutoffs for job
applicants, carelessness may lead to cutoffs that are too low. In short, the results of
the present study suggest that, in those situations where there are few or no formal
rewards for test performance, test results can be interpreted in light of the effects of
carelessness.

A third implication with respect to test scores is that math anxiety may have a
considerable impact on math test scores. Specifically, those respondents higher in math
anxiety can be expected to answer fewer math knowledge questions correctly than
respondents lower in math anxiety. The exact nature of this relationship, however, is
not entirely clear. As with general test anxiety, a person may be high in math anxiety
simply because of a lack of knowledge or ability (Naveh-Benjamin et al., 1987), in
which case it would appear that the lack of knowledge is causing both the math
anxiety and the poor math performance. If, on the other hand, a respondent is high
in math anxiety but does not lack the requisite knowledge or ability for a given task,
then it is more likely that performance is determined by both anxiety and ability.
More research is needed which isolates these factors so that their independent
contributions to test performance can be identified.

To summarize, the results of the present study suggest that difficulty-based
item characteristics such as response option complexity and negative wording and
personality characteristics such as carelessness and math anxiety can have a substantial
effect on math test scores. Future research should investigate the specific testing
situations in which these effects hold. Also, the present study focused only on
statistical knowledge tests, which are an example of a test of maximum performance.
Future research should endeavor to discover how these factors affect responses to tests
of typical performance.

The present study also has implications for lz. Interactions between item
characteristics and each of the four personality variables were predicted, but none were
found to be significant and in the hypothesized direction. One explanation for these
findings is described in the Limitations section of this paper, namely, that a lack of
external motivators in the testing situation led to uninterpretable results with respect to
test scores. If this was the case, then lz might also be uninterpretable.

Another possible explanation is that lz simply does not reflect interactions
between item characteristics and respondent characteristics. Instead, it might simply
reflect nonsystematic response tendencies that, by definition, cannot be predicted.
Before we accept this explanation, however, the relationship between lz and
interactions between item and respondent characteristics must be studied in testing
situations which contain the external motivators that were missing in the present study.
Only then can firm conclusions be drawn with respect to the hypotheses suggested in
this paper.

Limitations

As was mentioned earlier, the most plausible explanation for the lack of
hypothesized effects in the present study was that the testing situation lacked the
external motivators present in most real-world, evaluation-oriented testing situations.
Although efforts were made to encourage diligence on the part of the respondents,
grades, job opportunities, and other outcomes which can drive performance were not
in any way contingent upon performance on the knowledge test. This lack of external
motivators may have led to a failure to elicit reactions to the testing situation that
would have been present if knowledge test performance were somehow tied to
rewards. For example, most of the previous research on test anxiety has examined test
anxiety in the context of an actual testing situation (i.e., research was not the only
function served by the administration of the test). It may be that respondents who
have an involuntary anxiety reaction to these true testing situations have little or no
such reaction to a testing situation which lacks important consequences.

This lack of external motivators may also have led to a uniformly low level of
effort on the knowledge test itself. Effects for variables such as conscientiousness
might have been washed out by such a general lack of effort.

There are two ﬁnal points to be made with respect to the issue of the role of
external motivators in testing. First, while the testing situation used in the present
study was unlike many real-world testing situations, it shared many of the
characteristics of the typical concurrent validation study. In concurrent validation, as
in the present study, participants respond to test items knowing that rewards are not
contingent upon test performance. In fact, many such participants view the collection
of concurrent data as a waste of their time. More research must be conducted which
compares the effects of anxiety, conscientiousness, carelessness, and their interactions
with item characteristics on item responses in situations which contain typical external
motivators versus situations which do not contain such motivators. Such research
could shed more light on the nature of these predictors as well as on the nature of
inappropriateness.

The second point to be made is related to the issue of the presence or absence
of external motivators. It may be that a test such as the statistical knowledge test used
in the present study simply isn’t a valid measure of the construct of interest in a
testing situation such as that used in the present study. It was suggested earlier that
test appropriateness is an issue only for a test for which evidence of validity has been
provided. If a certain level of effort is a prerequisite for the validity of a test, and if
testing situations such as that used in the present study do not foster that level of
effort, then perhaps appropriateness is not an issue in such situations. This is a
question that the comparative research suggested above could address.

A ﬁnal limitation of the present study is that it tested only part of the model
presented on page 62. The relationships involving acquiescence, need for approval,
ﬁeld articulation, response sets, test wiseness, and omissiveness were not examined.
The relationships involving acquiescence and need for approval are particularly
promising for tests of typical performance such as personality tests. As was discussed
in the Introduction of this paper, acquiescence and need for approval may interact with
item characteristics to affect test scores, and this interaction may be detectable with l_z.
Likewise, field articulation and test wiseness show promise for tests measuring
maximum performance. These factors may also interact with item characteristics to
affect test scores, with the interaction being detected by l_z. As with the relationships
examined in the present study, these relationships should be investigated in a testing
situation which provides the external motivators typical of real-world testing situations.

LIST OF REFERENCES

References

Allport, G.W. (1928). A test for ascendance-submission. JournJaI of Abnormﬂ
Psychology, 2;, 118-136.

Anastasi, A. (1988). Psychological Tesggwth ed.) New York: Macmillan.

Anderson, KJ. (1990). Arousal and the inverted-U hypothesis: A critique of
Neiss’s "Reconceptualizing arousal". Psychological Bulletin, _1_Q’_7_, 96-100.

Benjamin, M., McKeachie, W.J., Lin, Y., & Holinger, DP. (1981). Test anxiety:
deﬁcits in information processing. Journal of Educational Psychology, ﬂ,
816-824.

Berg, I.A., & Collier, J.S. (1953). Personality and group differences in extreme
response sets. Educational and Psychological Measurement, _1_3_, 164—169.

Birenbaum, M. (1985). Comparing the effectiveness of several IRT-based
appropriateness measures in detecting unusual response patterns.
Educational and Psychological Measurement, _4_5_, 523-535.

Block, J. (1965). The Challenge of Response Sets: Unconfounding MeaningI
_qupiescence, gig! SocialDes'pability in ﬁe MMPI. New York: Irvington.

Bock, R.D., Dicker, C., & VanPelt, J. (1969). Methodological implications of
content-acquiescence correlation in the MMPI. Psychological Bulletin, 11,
127-139.

Boring, E.G. (1950). " A History of Experimental Psychology (rev. ed.). New

York: Appleton, Century, Crofts.

126

127
Broverrnan, D.M., Klaiber, E.L., Kobayashi, Y., & Vogel, W. (1968). Roles of

activation and inhibition in sex differences in cognitive ability.
Psvchological Review, _7_5_, 23-50.

Cady, V.M. (1923). The estimation of juvenile incorrigibility. Journal of
Delinquency Monograph, No.2.

Campeau, PL. (1968). Test anxiety and feedback in programmed instruction.
JLurnal of Educational Psychology, 2, 159-163.

Cohen, J ., & Cohen, P. (1983). Applied Mrmle RegressionLCorrelation AnalLsis
for the Behaviml Sciences. Hillsdale, NJ: Erlbaum.

Cortina, J.M. (1993). What is coefﬁcient alpha?: An examination of theory and
application. Journal of Applied Psychology, _7_8_, 98-104.

Costa, P.T., & McCrae, RR. (1988). From catalog to classiﬁcation: Murray’s
needs and the ﬁve-factor model. Jo_urnal of Personality aird Socia_l
Psychology, ﬂ, 258-265.

Couch, A., & Keniston, K. (1960). Yeasayers and naysayers: agreeing response
set as a personality variable. Journal of Abnormal Social Psycholggy, 69,
151-174.

Cronbach, L]. (1946). Response sets and test validity. Educational and
Psmhological Measarement. 6, 475-494.

Cronbach, L.J. (1950). Further evidence on response sets and test design.

Educational and Psychological Megurement, _1_O_, 3-31.

128

Crowne, D.P., & Marlowe, D. (1964). The Approval Motive: 8w
Evaluative Dependence. New York: Wiley.Jackson, D.N., & Messick, S.
(1958). content and style in personality assessment. Psvcholigcal Bulletin,
55, 243-252.

Damarin, F. (1970). A latent structure model for answering personal questions.
Psychological Bulletin, E, 23-40.

Dolly, J.P., & Williams, KS. (1986). Using test taking strategies to maximize
multiple choice test scores. Educational and Psychological Measuremeng,
_4_6_, 619-625.

Donlon, T.F., & Fischer, FE. (1968). An index of an individual’s agreement with
group-determined item difﬁculties. Educational and Psychologi_cal_
Measuremeng, 28, 105-113.

Drasgow, R, Levine, M.V., & Williams, EA. (1985). Appropriateness
measurement with polychotomous item response models and standardized
indices. British Joamal of Mghematical and Statistics Psychology, 3, 67-
86.

Drasgow, R, Levine, M.V., & McLaughlin, ME. (1991). Appropriateness
measurement for some multidimensional test batteries. Applie_d
Psychological Measuremeng, _1_5_, 171-191.

Drasgow, R, Levine, M.V., Williams, B., McLaughlin, M.E., & Candell, CL.
(1989). Modeling incorrect responses to multiple-choice items with

multilinear formula score theory. Applied Psychological Measurement, _l_3_,

285-299.

129

Dreger, R.G., & Aiken, LR. (1957). The identiﬁcation of number anxiety in a
college population. Journal of Educational Psychology, ﬂ, 344-351.

DuBois, PS. (1966). A test-dominated society: China 1115 BC. - 1905 A.D. In
A. Anastasi (Ed.), TeSting Problems in Perspective (pp. 29-36).
Washington, DC: American Council on Education.

Dudycha, A.L., & Carpenter, J .B. (1973). Effects of item format on item
discrimination and difﬁculty. Journal of Applied Psychology, 58, 116-121.

Edwards, AL. (1957). The SocijlﬁDesirability Vapiable in Personality
Assesement and Research. New York: Dryden.

Endler, N.S., & Hunt, J.M. (1966). Sources of behavioral variance as measured
by the S-R inventory of anxiousness. Psychological Bulletin, 6;, 336-346.

Fagley, N.S. (1987). Positional response bias in multiple-choice tests of learning:
its relationship to test wiseness and guessing strategies. Journal of
Educational Psychology, 19, 95-98.

Feldman, J .M. & Lynch, J .G. (1988). Self-generated validity and other effects of
measurement on belief, attitude, intention, and behavior. Journal of
Amalied Psychology, ﬂ, 421-435.

Forehand, G.A. (1962). Relationships among response sets and cognitive
behaviors. Educational and PsvchologipaLMeasprement, Q, 287-302.

Gaier, E.L., Lee, M.C., & McQuitty, LL. (1953). Response patterns in a test of
logical inference. Educational and Psychological Measurement, _1_3_, 550-

567.

130

Gehman, W.S. (1957). A study of ability to fake scores on the Strong Vocational
Interest Blank for men. Educational and Psychological Measurement, pl],
65-70.

Gibb, B.G. (1964). Test Wiseness as Secondary Cue Response. Unpublished
doctoral dissertation, Stanford University.

Goodenough, DR. (1976). The role of individual differences in ﬁeld dependence
as a factor in learning and memory. Psychological Bulletin. 8_3, 675-694.

Green, RF. (1951). Does a selection situation induce testees to bias their answers
on interest and temperament tests? Educational and Psychological
Measuremeng 1_l_, 503-515.

Hamisch, D.L., & Linn, R.L. (1981). Analysis of item response patterns:
Questionable test data and dissimilar curriculum practices. Journal of
MOE Measurement, _1_8_, 133-146.

Harper, F.B.W. (1974). The comparative validity of the Mandler-Sarason Test
Anxiety questionnaire and the Achievement Anxiety Test. Educational and
Psychological Measuremeng, 34, 961-966.

Hartshome, H., & May, M.A. (1928). Studies in Deceit. New York: Macmillan.

Henry, E.M., & Rotter, J .B. (1956). Situational inﬂuences on Rorschach
responses. JournJaLof Consaldngjswchology, 20, 457-462.

Herrmann, DJ. (1982). Know thy memory: The use of questionnaires to assess

and study memory. Psycholpggical Bulletin, 22, 434-452.

131

Hollenbeck, J. R., Ilgen, D.W., & Sego, D. (in press). Repeated measures
regression and mediational tests: Enhancing the power of leadership
research. Leadership Quarterly.

Hothersall, D. (1990). History of Psycholggy (2nd Ed.) New York: McGraw-
Hill.

Hough, L.M., Eaton, N.K. Dunnette, M.D., Kamp, J.D., & McCloy, RA. (1990).
Criterion-related validities of personality constructs and the effect of
response distortion on those validities. Journal of Applied Psychology, Z_5_,
581-595.

Hughes, H..,H & Trimble, W.E. (1965). The use of complex alternatives in
multiple choice items. Educational and PsychologicalkMeasuremeng, _2__5_,
117-126.

Humm, D.G., & Humm, K.A. (1944). Validity of the Humm-Wadsworth
temperament scale: with consideration of the effects of subject’s response-
bias. Journal of Psychology, L8, 55-64.

Humm, D.G., Storment, R.C., & Ioms, ME. (1939). Combination scores for the
Humm-Wadsworth temperament scale. Journal of Psychology, 1, 227-253.

Jackson, D.N. & Messick, S. (1958). Content and style in personality assessment
Psychological Bulletin, _5_5_, 245-252.

Jackson, D.N. & Messick, S. (1965). Acquiescence: the nonvanishing variance

component. American Psychologiit, 20, 498-502.

132

Kingston, A.J., George, C.E.. & Ewens, WP. (1956). Determining the
relationship between individual interest proﬁles and occupational forms.
Journal of Educational PsvchOIOgy, _4_7_, 310-316.

Klein, G.S., Barr, H.C., & Wolitzky, DC. (1967). Personality. Annual Review
of PsychOIOgy, _l_8_, 467-560.

Lawrence, P]. (1957). Some characteristics of incorrect responses to intelligence
test items. Australian Journal of Psvchology, 2, 1-11.

Levine, M., & Rubin, D. (1979). Measuring the appropriateness of multiple.
choice test scores. Journal of Educational Statisﬁti_c_s, 4, 269-290.

Lorge, I., & Diamond, L.K. (1954). The prediction of absolute item difﬁculty by
ranking and estimating techniques. Educational and Psycholoﬁal
Measurement, l4, 2-10.

Mandler, G. & Sarason, SB. (1952). A study of anxiety and learning. 1041M.
of Abnormal and Social Psychology, 41, 166-173.

Messick, S. (1966). The Psychology of Acguiescence: A Interpretation of
Research Evidence. Educational Testing Sevice: Princeton, NJ, April.

Metfessel, N.S., & Sax, G. (1957). Response set patterns in published Intructor’s
Manuals in education and psychology. California Journal of Educational
Research, 8, 195-197.

Metfessel, N.S., & Sax, G. (1958). Systematic biases in the keying of correct
responses on certain standardized tests. Educational and Psycholigcal

Measurement 18, 787-790.

 

133

Millman, J., Bishop, C.H., & Ebel, R. (1965). An analysis of test-wiseness.
Egational and Psychological Measuremeat, 25; 707-726.

Naveh-Benjamin, M., McKeachie, W.J., & Lin, Y.G. (1987). Two types of test
anxious students: support for an information processing model. Journal of
_E_d_rLcational Psychology, 22, 131-136.

Naveh-Benjamin, M. (1991). A comparison of training programs intended for
different types of test-anxious students: Further support for the information
processing model. Journal of Educational Psychology, 83;, 134-139.

Paulman, R.G., & Kennally, K.J. (1984). Test anxiety and ineffective test takers:
Different names, same construct? Lo_u:rnal of Educational Psvchology, 16,
279-288.

Peabody, D. (1966). Authoritarianism scales and response bias. Psychological
Bulletin, as, 11-23.

Peterson, R.A. (1961). A technique for the detection of blind checking in
questionnaire research. Educational and Psychological Measuremeng, _2_1_,
361-362.

Rapaport, G.M., & Berg, LA. (1955). Response sets in a multiple-choice test.
Educational and Psychological Measuremeng, _12, 58-62.

Rasch, G. (1960). Probabilistic Models for Some Intelligenceﬁand AW
_T_e_s_t_s_. Kobenhavn: Danmarks Paedagogiske Institut. (Reprinted by
University of Chicago Press, 1980).

Reise, SP (1990). A comparison of item and person-ﬁt methods of assessing

model-data ﬁt in IRT. Applied Psychological Measurement, 14, 127-137.

134

Rorer, LG. (1965). The great response style myth. Psychological Bulletin, 63,
129-156.

Rosenberg, N., Izard, C.E., & Hollander, ER (1955). Middle category response:
reliability and relationship to personality and intelligence variables.
Educgional and Psychological Measurement, L5, 281-290.

Rosenzweig, S. (1934). A suggesdon for making verbal personality test more
valid. Psychological Review, _4_1_, 400-401.

Ruch, EL. (1942). A technique for detecting attempts to fake performance on a
self-inventory type of personality test. In Q. McNemar and M.A. Merrill,
§£ufdies in Personm. (pp. 229-234), New York: McGraw-Hill.

Rudner, L.M. (1983). Individual assessment accuracy. Journal of Educational
Mea§_urement, 29, 207-219.

Sarnacki, RE, (1979). An examination of testwiseness in the cognitive test
domain. Review of Educational Reseaich, a, 252-279. '

Sato, T. (1975). The Construction and Interpreta_tion of S-P Tables. Meiji
Toosho.

Satterly, D.J., (1976). Cognitive styles, spatial ability, and school achievement
Journal; of Educational Psychology, _6_8, 36-42.

Saupe, IL. (1960). An empirical model for the corroboration of suspected
cheating on multiple-choice tests. Educational and Ps'Lchologic_al_

Measurement. 29, 475-489.

135

Schrnitt, N., Gooding, R.Z., Noe, N.A., & Kirsch, M. (1984). Metaanalyses of
validity Studies between 1964 and 1982 and the investigation of study
characteristics. Personnel Psychology, 31, 407-422.

Schuman, H., & Kalton, G. (1985). Survey methods. In G. Lindzey and E.
Aronson (Eds) Handbook of Social Psychology. Hillsdale, NJ: Erlbaum.

Tatsuoka, K.K. & Tatsuoka, M.M. (1980). Detection of abem response patterns
and their effect on dimensionality. Research Report 80-4-ONR Urbana, 11:
University of Illinois, Computer based Education Research Laboratory.

Tatsuoka, K.K. & Tatsuoka, M.M. (1982). Detection of aberrant response
patterns. Journal of Educational Statistics, '_7_, 215-231.

Tobias, S. (1979). Anxiety research in educational psychology. Journal of
Educational Psychology, ﬂ, 573-582.

vanderFlier, H. (1977). Environmental factors and deviant response patterns. In
Y.H. Poortinga (Ed.) Basic Problem_s in Cross Cultural Psychology.
Amsterdam: Swets & Seitlinger, B.V.

Wahsltrom, M., & Boersma, RI. (1968). The inﬂuence of test-wiseness upon
achievement. Erin—cational and Psychological Measurement, ga, 413-420.

Waterhouse, I.K. & Child, LL. (1953). Frustration and the quality of
performance: An experimental study. 103ml of Personality, 22, 298-311.

Watson, D, & Clark, LA. (1984). Negative affectivity: the disposition to

experience negative emotional states. Psychological Bulletin, _9_6_, 465-490.

136
Whitcomb, M.A., & Travers, R.M.W. (1957). A study of within-test learning

functions as a determinant of total score. Educationagnd Psychological
Measurement. _l_7_, 86-97.

Wiggins, J.S. (1962). Strategic, method, and stylistic variance in the MMPI.
Psychological Bulletin, 52, 224-242.

Witkin, HA. (1974). COgnitiye Styles. Essence and Origins: Field Dependence
and Field Indﬂendence. New York: International Universities Press.

Wright, B.D., & Panchapakesan, NA. (1969). A procedure for sample free item
analysis. @cational and Psychological Measgment, 22, 23-48.

Zimmerman, W.S. (1954). The inﬂuence of item complexity upon the factor
composition of a spatial visualization test. Educational and Psychological

Measprement, E, 106-119.

LIST OF APPENDICES

APPENDIX A
Personality measures

Conscientiousness
Please use the following scale to answer questions 1 - 4:

A - Strongly disagree
B - Disagree
C - Neutral
D - Agree
E - Strongly Agree
Please mark your answers on the bubble sheet that was provided.
I try to perform all of the tasks assigned to me conscientiously.
When I make a commitment, I can always be counted on to follow through.
I am a productive person who always gets the job done.
I strive for excellence in everything I do.
Test Anxiety
Please use the following scale to answer questions 5 - 15:
A - The statement does not describe my present condition
B - The condition is barely noticeable
C - The condition is moderate
D - The condition is strong
E - The condition is very strong; the statement describes my present condition
well.
I feel my heart beating fast.
I feel regretful.
I am so tense that my stomach is upset.
I am concerned about others in the class seeing the results of this test.

I have an uneasy, upset feeling.

I feel that others will be disappointed in me.

I am nervous.

I feel I may not do as well on this test as I could.

I feel panicky.

I do not feel very conﬁdent about my performance on this test.

Math Anxiety

When my stats professor (or any math professor) asks questions to ﬁnd out how much
we know about a particular mathematical concept or approach, I worry that I will do
poorly in the class.

When my stats professor is showing the class how to do a particular problem, I worry that
the other students in the class might understand the problem better than I do.

When I am in stats (or any math class), I usually feel relaxed and at ease.
When I am taking a math test, I usually feel relaxed and at ease.

Taking math tests scares me.

I dread having to do math.

The thought of taking a more advanced stats class (e.g., Psych 302, 304, or required stats
courses in graduate school) scares me.

In general, I worry about how well I am doing in school.

If I miss a given day of stats class, I worry that I will be behind the other students when
I come back.

In general, I worry about how well I am doing in math.

Compared to other subjects, I worry a great deal about how well I am doing in math.
Carelessness Item

In a one-way ANOVA, if the degrees of freedom between groups is equal to the number
of groups minus 1, and there are 4 groups, what would the degrees of freedom between
groups be?

Which of the following does ANOVA stand for?
a. ANalysis Of VAriance
b. Standard Deviation
c. Correlation
d. Repeated Measures Designs

The two types of errors that one can make in the sort of hypothesis testing exempliﬁed
by the above scenario are
a. Type I and Type II
b. Type III and Type IV
c. Type V and Type VI
d. Type VII and Type VIII

Which of the following are within the range of possible correlations?
a. -.50
b. .40
c. .50
d. all of the above

(This question does not apply to the scenario that the previous questions applied to or any
other particular scenario) In general, which of the following are within the range of
possible z-scores?
a. 1.0
b. 2.0
c. -1.0
d. all of the above

Another word for the average is
a. mean
b. variance
c. sample
d. population

APPENDIX B
Statistical knowledge items
For questions 1 - 2, use the following scenario.

Suppose I am interested in examining the effect of the number of times one watches
"Jeopardy" and the number of times one watches "Wheel of Fortune" on the number of
Trivial Pursuit questions that one can answer. In an attempt to do this, I assign 30 people
to one of two levels of Jeopardy-watching and one of three levels of Wheel of Fortune-
watching. The data look like this.

                                     JEOPARDY WATCHING
                                     NONE      10 TIMES
WHEEL OF FORTUNE WATCHING
    NONE                               58         45
                                       75         51
                                       68         75
                                       37         72
                                       27         69
                          Total       265        312

    10 TIMES                           61         55
                                       41         58
                                       55         42
                                       50         68
                                       47         32
                          Total       254        255

    20 TIMES                           75         31
                                       65         31
                                       50         46
                                       35         32
                                       30         40
                          Total       255        180

(Diff) 1.) If each subject had received every level of one of the variables in the "Trivial
Pursuit" experiment, I could not use
a. a three-way ANOVA
b. a Chi-squared test
c. a one-way ANOVA
d. all of the above

(Mod) 2.) If I had simply wanted to compare the number of people who watch Jeopardy
to the number of people who watch Wheel of Fortune, I would have to use which of the
following?
a. a three-way ANOVA
b. a Chi-squared test
c. a one-way ANOVA
d. any of the above

For questions 3-8, use the following scenario.

Suppose I am interested in the effects of the number of Jagermeister shots that one does
on the number of questions that one can answer about analysis of variance. In an attempt
to assess these effects, I assign 15 people to one of three groups so that there are 5 people
in each group. GROUP 1 gets no Jager, GROUP 2 gets 3 shots of Jager, and GROUP
3 gets 8 shots of Jager. The data look like this:

GROUP 1    GROUP 2    GROUP 3
   3          2          6
   7          7          8
   3          8          8
   1          2          7
   0          4          8

ΣX² = 482

(Diff) 3.) A statement that does not represent the null hypothesis for this test is
a. r = 2 = 3
b. r = 2
c. r = 2 = 3
d. both a and b

(Diff) 4.) The degrees of freedom Between, Within, and Total for this test are not
a. 3, 11, 14
b. 2, 13, 15
c. 2, 12, 14
d. both a and b

(Mod) 5.) What are the Sums of Squares Between, Within, and Total?
a. 58.8, 61.2, 120
b. 53.73, -63.2, 116.93
c. 53.73, 63.2, 116.93
d. none of the above

(Mod) 6.) If the Sums of Squares Between and Within are 53.73 and 63.2 respectively
(they aren't necessarily), what are the Mean Squares Between and Within?
a. 17.91, 5.74
b. 26.86, 5.27
c. 3.58, 4.21
d. none of the above

(Mod) 7.) Which of the following is true of the F-ratio?

a. F“, = 5.10
b. F112 = 5.10
c. F,”12 = .25

d. none of the above

(Mod) 8.) With which of the following levels of signiﬁcance would you reject the null?
a. .05
b. .01
c. .005
d. both a and c
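The arithmetic behind questions 3 - 8 can be sketched in a few lines of code. The block below is an illustration only and is not part of the original test materials: it applies the usual computational formulas for a one-way ANOVA to the three groups of scores given in the scenario. The function name `one_way_anova` and the dictionary keys are assumptions introduced for this sketch.

```python
# Illustrative sketch (not part of the original test): one-way ANOVA by hand
# for the three groups of five scores given in the scenario above.
groups = [
    [3, 7, 3, 1, 0],  # GROUP 1 (no Jager)
    [2, 7, 8, 2, 4],  # GROUP 2 (3 shots)
    [6, 8, 8, 7, 8],  # GROUP 3 (8 shots)
]


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares, and F."""
    scores = [x for g in groups for x in g]
    n_total = len(scores)
    # Correction term: (sum of all scores)^2 / N
    correction = sum(scores) ** 2 / n_total
    ss_total = sum(x * x for x in scores) - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df_between": df_between,
        "df_within": df_within,
        "df_total": n_total - 1,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_ratio": ms_between / ms_within,
    }


result = one_way_anova(groups)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

The sketch stops at the F-ratio. Deciding whether to reject the null at a given significance level still requires comparing that ratio with a critical value from an F table, which is not reproduced here.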

(Easy) 9.) A Mean Square is a form of which of the following?
a. standard error
b. variance
c. standard deviation
d. covariance

(Easy) 10.) Which of the following represents a factorial design?
a. Each level of one variable is paired with one level of every other
variable
b. Each variable is paired with every other variable
c. One variable is paired with every level of one other variable
d. Each level of every variable is paired with each level of every other
variable

For questions 11-12, use the following scenario

Suppose I am interested in assessing the effects of job satisfaction (IV1) and salary (IV2)
on job performance (DV). So, I collect data on these three variables for 100 people and
correlate the variables. The correlation between satisfaction and performance is .23, the
correlation between salary and performance is .45, and the correlation between salary and
satisfaction is .42.

(Mod) 11.) Which of the following is the type of regression analysis that I would use to
assess the effects of job satisfaction and salary on job performance?
a. multiple regression
b. repeated measures regression
c. simple regression
d. any of them depending on the nature of the variables

(Mod) 12.) Which of the following is the regression equation?
a. Y = .05X1 + .43X2 + a
b. Y = .07X1 + .61X2
c. Zy = .05Z1 + .43Z2
d. none of the above
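For readers who want to check the weights in question 12, the block below is a minimal sketch, not part of the original test, of the standard two-predictor formulas for standardized regression weights computed from the three correlations given in the scenario. The function and variable names are assumptions introduced for this illustration.

```python
# Illustrative sketch (not part of the original test): standardized regression
# weights for two predictors, computed from the three pairwise correlations.
def standardized_betas(r_y1, r_y2, r_12):
    """r_y1, r_y2: predictor-criterion correlations; r_12: predictor intercorrelation."""
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    return beta1, beta2


# Correlations from the scenario: satisfaction-performance = .23,
# salary-performance = .45, salary-satisfaction = .42.
beta_satisfaction, beta_salary = standardized_betas(0.23, 0.45, 0.42)
print(f"beta (satisfaction) = {beta_satisfaction:.2f}")
print(f"beta (salary)       = {beta_salary:.2f}")
```

Because only correlations are given, the sketch yields weights for the standardized (z-score) form of the equation. Unstandardized weights and an intercept would additionally require the means and standard deviations of the three variables, which the scenario does not provide.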

(Easy) 13.) In general, the F-ratio will be large if which of the following is true?
a. If the variance within the groups is larger than the variance between the
groups
b. If the variance within the groups is equal to the variance between the
groups
c. If the variance within the groups is less than the variance between the
groups
d. If there is no variance between the groups

(Easy) 14.) Which of the following best represents the null hypothesis for a one-way
ANOVA with three groups?

a- " 2:0 r' 3:0 2' 3:0
= 0

l

l-
1
1

b.
c.
d.

2
2
3

(Easy) 15.) Which of the following is true of ANOVA?
a. It is relatively insensitive to violations of the normality assumption but
not the homogeneity assumption
b. It is relatively insensitive to violations of the homogeneity assumption
but not the normality assumption
c. It is relatively insensitive to violations of both of its assumptions
d. It is greatly affected by violations of either of its assumptions

(Easy) 16.) Which of the following has to be true in order for the Central Limit Theorem
to be applicable?
a. the samples come from the same population
b. the samples come from different populations
c. the sample means are equal
d. the population variances are different

(Easy) 17.) If I wanted to examine the effects in an ANOVA further, which of the
following should I use?
a. a second ANOVA
b. an acid test
c. urinalysis
d. multiple comparison procedures

For questions 18 - 19, use the following scenario.

Suppose I am interested in knowing the difference in ACT scores between men and
women. In an attempt to investigate this, I draw a sample of 16 men and a sample of 16
women. The mean of the ACT scores for the men is 18.6 and the mean for the women
is 20.4. The variances of the individual scores are 6.1 and 8.9 respectively. Suppose
further that the standard error for the differences between means is .97.

(Mod) 18.) Calculate a t-score for the difference between these two means. (When using
the t-score formula, be sure to put the means in the order that they were presented)
a. 1.85
b. -1.85
c. -1.8
d. none of the above

(Diff) 19.) I would fail to reject the null with
a. a one-tailed test that examines the lower 5% of the distribution
b. a two-tailed test with a signiﬁcance level of .05.
c. a two-tailed test with a signiﬁcance level of .10.
d. both a and b

Use the following scenario to answer questions 20 - 21

Suppose I am interested in knowing the difference in ACT scores between men and
women. In an attempt to investigate this, I draw ﬁve samples of men with 12 in each
sample and 5 samples of women with 12 in each sample. For each pair of samples, I
record the mean ACT score for men, the mean ACT score for women, and the difference
between the means so that my data look like this

 

 

 

ACT (MEN)        ACT (WOMEN)      DIFFERENCE
X̄11 = 18         X̄21 = 24         X̄11 - X̄21 = -6
X̄12 = 22         X̄22 = 22         X̄12 - X̄22 = 0
X̄13 = 17         X̄23 = 23         X̄13 - X̄23 = -6
X̄14 = 20         X̄24 = 26         X̄14 - X̄24 = -6
X̄15 = 16         X̄25 = 27         X̄15 - X̄25 = -11

X̄_X̄1 = 18.6      X̄_X̄2 = 24.4
s²_X̄1 = 5.8      s²_X̄2 = 4.3

(Mod) 20.) If we draw another sample of 15 men and 15 women, and their mean ACT
scores are 17 and 22 respectively, and the standard error of the differences between means
is 3.18, what is the t-score for the difference between these two means?
a. 1.73
b. -1.57
c. .49
d. none of the above

(Mod) 21.) What would the effect on the power of my t-test be if I doubled my sample
size?
a. My power would increase
b. My power would decrease
c. My power would be unaffected
d. Either a or b depending on the situation

For questions 22 - 24, use the following scenario.

Suppose I am interested in knowing the difference in armpit hair between men and
women. In an attempt to investigate this, I draw a sample of 21 men and a sample of 21
women. The mean for pithair for men is 4.6 and the mean for women is 6.6. The
variances of the individual scores within these samples are 7.1 and 7.7 respectively.

(Diff) 22.) The standard error of differences is not
a. .70
b. .84
c. 14.8
d. This applies to both a and c

(Mod) 23.) Calculate the t-score for the difference between these two means.
a. -2.38
b. -2.0
c. 2.0

d. none of the above

(Mod) 24.) With which of the following would I fail to reject the null?
a. a one-tailed test that examines the upper 5%
b. a one-tailed test that examines the lower 1%
c. a two-tailed test with a signiﬁcance level of .05
d. both a and b

For questions 25 - 26, use the following scenario

Suppose I want to estimate the average number of hours per day that Dan Quayle spends
playing with his Legos. In order to do this, I take a sample of 12 days and record the
number of hours that he spends playing with his Legos on each of those days. They are
as follows: 4, 3.5, 6, 7, 3.5, 4.5, 6.5, 8, 1.5, 2.5, 6, 5. The standard deviation of this set
of scores is 1.93.

(Mod) 25.) What is the 95% conﬁdence interval around the mean? (Hint: You must use
the value for a two-tailed test)
a. .58 < μ < 9.08
b. 3.6 < μ < 6.06
c. 3.09 < μ < 6.57
d. none of the above

(Mod) 26.) What is the 99% conﬁdence interval around the mean? (Hint: You must use
the value for a two-tailed test)
a. .58 < μ < 9.08
b. 3.6 < μ < 6.06
c. 3.09 < μ < 6.57
d. none of the above
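The interval calculations asked for in questions 25 - 26 can be illustrated with a short sketch. It is not part of the original test. It computes the sample mean and the (n - 1) standard deviation from the twelve daily values in the scenario, then forms mean ± t × SE. The critical values in `T_CRIT` are two-tailed values for 11 degrees of freedom taken from a standard t table, and the names `T_CRIT` and `confidence_interval` are assumptions introduced here.

```python
# Illustrative sketch (not part of the original test): confidence intervals
# around a sample mean using two-tailed critical t values.
import math

hours = [4, 3.5, 6, 7, 3.5, 4.5, 6.5, 8, 1.5, 2.5, 6, 5]

n = len(hours)
mean = sum(hours) / n
# Sample standard deviation with n - 1 in the denominator
sd = math.sqrt(sum((x - mean) ** 2 for x in hours) / (n - 1))
se = sd / math.sqrt(n)  # standard error of the mean

# Two-tailed critical t values for df = n - 1 = 11, from a standard t table
# (assumed here rather than computed).
T_CRIT = {0.95: 2.201, 0.99: 3.106}


def confidence_interval(level):
    """Return (lower, upper) bounds of the interval at the given confidence level."""
    margin = T_CRIT[level] * se
    return mean - margin, mean + margin


for level in (0.95, 0.99):
    lower, upper = confidence_interval(level)
    print(f"{int(level * 100)}% CI: {lower:.2f} < mu < {upper:.2f}")
```

Small differences in the second decimal place relative to the printed answer options can arise from rounding the standard deviation to 1.93 before computing the standard error.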

(Easy) 27.) What is a conﬁdence interval?
a. It is a range of scores within which we expect the population mean to
fall.
b. It is the population mean
c. It is a range of scores within which we expect the sample mean to fall.
d. It is our two best estimates of the population mean.

(Diff) 28.) Suppose I was interested in knowing the relationship between goal difﬁculty
and goal commitment (a fairly common topic in my ﬁeld). In an attempt to investigate
this relationship, I draw a sample of 15 people, give them a goal for a task, and get the
correlations between goal difﬁculty and commitment. The correlation from these 15
people is -.48. The t-score for this correlation would not be
a. -1.73
b. 1.73
c. -1.97
d. b and c

For questions 29 - 33, use the following scenario

Suppose I know that the mean of the population of grade point averages at MSU is 2.35.
Suppose further that I get a sample of GPA’S from an unknown source, and the scores
are: 3.3, 3.6, 2.2, 2.4, 3.1, 1.6, 1.4, 2.9, 3.9, 2.5, 3.4, 3.3. The mean of these scores is 2.8
and the standard deviation is .79.

(Diff) 29.) The t-score for this mean is not
a. .45
b. .57
c. -1.97
d. all of the above

(Mod) 30.) What would the degrees of freedom for this t-test be?
a. 12
b. 11
c. 132
d. either a or b depending on the situation

(Diff) 31.) We would fail to reject the null with
a. a one-tailed test which examines the lower 5% of the distribution
b. a two-tailed test with a level of signiﬁcance of .05
c. a two-tailed test with a level of signiﬁcance of .01
d. all of the above

(Mod) 32.) Which of the following would happen if the sample size were doubled?
a. The t-score that I calculate would be larger and the score that I use from
the Table would be smaller
b. The t-score that I calculate would be smaller and the score that I use
from the Table would be larger.
c. The t-score would be larger and the score from the Table would be
unaffected
d. Neither the calculated t nor the score from the Table would be affected

(Diff) 33.) If the sample variance were doubled, it is not the case that
a. the t-score that I calculate would be larger and the score that I use from
the Table would be smaller
b. the t-score that I calculate would be smaller and the score that I use
from the Table would be larger.
c. the t-score would be smaller and the score from the Table would be
unaffected
d. Both a and c apply

(diff) 34.) Suppose I was interested in knowing the relationship between goal difﬁculty
and goal commitment (a fairly common topic in my ﬁeld). In an attempt to investigate
this relationship, I draw a sample of 15 people, give them a goal for a task, and get the
correlation between goal difﬁculty and commitment. The correlation from these 15
people is -.48. The degrees of freedom for the t-test for correlations is not

a.) 13

b.) N-1

c.) N-2

d.) This applies to none of the above

(Easy) 35.) Which of the following is true of an unstandardized regression weight (i.e.,
b in a regression equation with a single predictor)?

a. It is the amount of change in the dependent variable associated with a

unit change in the independent variable

b. It is the amount of change in the independent variable associated with

a unit change in the dependent variable

c. It is the correlation between the independent and dependent variables

d. It is the covariance between the independent and dependent variables


(Diff) 36.) Suppose I know the number of times each Michigan resident has been
swindled by Gov. Engler (So, I have access to this population of scores). I then take
many different samples of 15 people each and calculate the mean for each sample. If the
mean of the means were 7.8 and the standard deviation of individual scores were 2.2, the
population mean and the standard error of the mean would not be

a. population mean = 2.79, standard error = .57
b. population mean = 7.8, standard error = 2.2
c. either a or b
d. all of the above
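
The quantities asked about in item 36 can be sketched as below, assuming the standard result that the mean of the sampling distribution equals the population mean and that the standard error is sigma divided by the square root of n. The variable names are illustrative.

```python
import math

sigma = 2.2          # standard deviation of individual scores
n = 15               # size of each sample
mean_of_means = 7.8  # mean of the many sample means

# The mean of the sampling distribution of the mean equals the population mean.
population_mean = mean_of_means
# Standard error of the mean: sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)

print(population_mean, round(standard_error, 2))
```
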

(Diff) 37.) I am interested in predicting graduate school GPA from GRE scores. In an
attempt to do this, I take a sample of 10 graduate students, record their GRE scores and
their GPA’s. The data look like this:

 

Subject #   GRE (X)   Grad School GPA (Y)
1           730       2.9
2           1120      3.1
3           1310      3.9
4           810       3.2
5           960       3.0
6           1250      3.5
7           1180      3.3
8           1410      3.7
9           840       3.4
10          660       3.1

If the correlation between these two variables is .75, it could not be said that GRE scores
account for

a. 68% of the variance in GPA

b. 95% of the variance in GPA

c. 57% of the variance in GPA

d. both a and b
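
The link between the correlation and the proportion of variance accounted for in item 37 can be sketched as follows. This assumes the ordinary Pearson product-moment correlation; the helper `pearson_r` is a hypothetical name written here for illustration.

```python
import math

# Data from the table in item 37
gre = [730, 1120, 1310, 810, 960, 1250, 1180, 1410, 840, 660]
gpa = [2.9, 3.1, 3.9, 3.2, 3.0, 3.5, 3.3, 3.7, 3.4, 3.1]

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(gre, gpa)
r_squared = r ** 2  # proportion of variance in GPA accounted for by GRE

print(round(r, 2), round(r_squared, 2))
```

Squaring the correlation gives the proportion of variance accounted for, which is why a correlation of .75 corresponds to a little over half of the variance.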

 


(Diff) 38.) I am interested in examining the relationship between GRE scores and GPA
in graduate school. In an attempt to do this, I take a sample of 10 graduate students,
record their GRE scores and their GPA’s. The data look like this: (by the way, GRE
scores can range from 400 - 1600)

 

subject # GRE (X) Grad School GPA (Y)
1 1230 3.4
2 1120 3.1
3 1310 3.4
4 1210 3.9
5 1360 3.8
6 1250 3.5
7 1180 3.3
8 1410 3.6
9 1380 3.4
10 1100 3.6

If the covariance between these two variables is 7.44, this (without any knowledge of the
variances) would not tell you

a. anything

b. that there is a strong, positive relationship

c. that there is a weak, positive relationship

d. This applies to none of the above

For questions 39 - 40, use the following scenario.

Suppose I am interested in assessing the relationship between schizophrenia (measured
with a 10 point scale) and job performance (measured on a 10 point scale). To do this,
I collect data on both of these variables for 30 people.

(Mod) 39.) If the scatterplot for the data looked like the following, what kind of
relationship would it be?

a. imperfect negative
b. imperfect positive
c. perfect positive

d. none of the above

(Diff) 40.) If the scatterplot looked like the following, it would not be a(n)

a. imperfect negative relationship
b. imperfect positive relationship
c. positive relationship

d. Both b and c apply

(Easy) 41.) Which of the following correlation coefficients represents the strongest
relationship?
a. .68
b. .22
c. -.46
d. -.82

(Easy) 42.) Which of the following correlation coefficients represents the weakest
relationship?
a. .68
b. .22
c. -.46
d. -.82

(Easy) 43.) Which of the following covariances represents the strongest relationship?

a. 4.28

b. 17.71

c. -24.94

d. can’t tell

(Easy) 44.) What is the probability of rolling a 1 then a 2 then a 3 then a 4 in four rolls
of a die?
a. .005
b. .51
c. .00077
d. .67

(Easy) 45.) What is the probability of rolling a 1 or a 2 or a 3 or a 4 in a single roll of
a die?
a. .005
b. .51
c. .00077

d. .67

(Easy) 46.) What is the probability of rolling a 1 then a 2 or a three then a four in two
rolls of a die?
a. .11
b. .055
c. .67
d. .00077
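
The probabilities in items 44 - 46 follow from the multiplication rule for independent rolls and the addition rule for mutually exclusive outcomes. A minimal sketch, assuming a fair six-sided die (the variable names are illustrative):

```python
from fractions import Fraction

p_face = Fraction(1, 6)  # probability of any one face on a fair die

# Item 44: a 1, then a 2, then a 3, then a 4 (independent rolls multiply)
p_sequence_of_four = p_face ** 4

# Item 45: a 1 or a 2 or a 3 or a 4 on one roll (mutually exclusive outcomes add)
p_any_of_four = 4 * p_face

# Item 46: (1 then 2) or (3 then 4) in two rolls (multiply within, add across)
p_either_pair = p_face ** 2 + p_face ** 2

print(float(p_sequence_of_four), float(p_any_of_four), float(p_either_pair))
```

Using exact fractions avoids rounding until the final conversion to decimals.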


(Diff) 47.) Suppose I know that the mean of the population of grade point averages at
MSU is 2.35. Suppose further that I get a sample of GPAs from an unknown source,
and the scores are: 3.3, 3.6, 2.2, 2.4, 3.1, 1.6, 1.4, 2.9, 3.9, 2.5, 3.4, 3.3. The mean of
these scores is 2.8 and the standard deviation of these individual scores is .79. If I knew
that the null hypothesis were actually false (This would never actually happen), and I used
a one-tailed test that examined the lower 5% of the distribution, I would not commit

a. a Type 1 error

b. a Type 2 error

c. an error

d. either a or c

For questions 48 - 50, use the following scenario

Suppose I know that the mean number of "beauty surgeries" that Phyllis Schlaﬂy has in
a day is 3.1 (a population mean), and the standard deviation of these individual scores
(sigma) is 1.25. I then receive data on the number of surgeries that an unknown nazi
fraulein has each day for ten days. The mean for these ten days is 3.7.

(Diff) 48.) The z-score for this mean is not
a. 4.8
b. 1.54
c. .6
d. both a and c

(Mod) 49.) What is the null hypothesis for this test?
a. The number of beauty surgeries that Phyllis Schlaﬂy has is from the
sample for the unknown nazi
b. The sample is from the population of Phyllis Schlaﬂy surgeries
c. The sample has a mean that is larger than the population mean
d. none of the above

(Diff) 50.) If I knew that the null hypothesis were actually true (This would never actually
happen), and I used a one-tailed test that examined the upper 5% of the distribution, I
would not commit

a. a Type 1 error

b. a Type 2 error

c. an error

d. all of the above

For questions 51 - 52, use the following scenario.

Suppose I measured the height (in inches) of MSU football players and found that they
were normally distributed with a mean of 75 and a standard deviation of five.

(Diff) 51.) The percentage of MSU football players that can be expected to be between
70 and 80 inches tall is not

a. 68%

b. 95%

c. roughly two-thirds

d. This applies to none of the above

(Mod) 52.) Approximately what percentage of MSU football players can be expected to
be between 65 and 85 inches tall?

a. 68%

b. 95%

c. one-third

d. either a or b

(Mod) 53.) What would the Z-score be for an MSU football player who was 83 inches
tall?

a. 1.6

b. 16

c. .16

d. none of the above

(Diff) 54.) Referring to the Z-score that you just calculated, the percentage of the heights
that you would expect not to fall above this score is

a. 5%

b. 95%

c. either a or b

d. none of the above
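
The normal-curve areas and the Z-score in items 51 - 54 can be checked with a short sketch. It assumes heights are exactly normal with mean 75 and standard deviation 5, as stated in the scenario, and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

heights = NormalDist(mu=75, sigma=5)

# Proportion within one SD of the mean (70 to 80 inches)
within_one_sd = heights.cdf(80) - heights.cdf(70)

# Proportion within two SDs of the mean (65 to 85 inches)
within_two_sd = heights.cdf(85) - heights.cdf(65)

# Z-score for a player who is 83 inches tall
z_83 = (83 - 75) / 5

# Proportion of heights that do not fall above 83 inches
at_or_below_83 = heights.cdf(83)

print(round(within_one_sd, 3), round(within_two_sd, 3), z_83, round(at_or_below_83, 3))
```
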

(Easy) 55.) Which of the following is the term used to describe the extent to which the
scores in a set of scores congregate in the tails of the distribution?

a. skew

b. positive skew

c. potato stew

d. kurtosis


(Easy) 56.) Which of the following is the term used to describe the extent to which a
distribution is asymmetrical?

a. skew

b. positive skew

c. potato stew

d. kurtosis

(Easy) 57.) Which of the following is a property of the normal distribution?
a. It is skewed
b. It is unimodal
c. It is leptokurtic
d. It is platykurtic

(Easy) 58.) The variance is basically which of these?
a. the average squared deviation from the mean
b. the sum of the squared deviations from the mean
c. the sum of the absolute deviations from the mean
d. the average absolute deviation from the mean

(Easy) 59.) Which of the following is an advantage of the mean as a measure of central
tendency?

a. It is greatly affected by extreme scores

b. It can be manipulated algebraically

c. It is not greatly affected by extreme scores

d. It is difﬁcult to calculate

(Easy) 60.) Which of the following is a disadvantage of the mean as a measure of central
tendency?

a. It is affected by extreme scores

b. It can be manipulated algebraically

c. It is not greatly affected by extreme scores

d. It is difﬁcult to calculate

(Easy) 61.) Consider the following set of numbers:
4, 15, 7, 5, 1, 5, 6, 8

Using these numbers, calculate the percentile for a score of 8.
a. .125
b. 8.75
c. .7
d. .875
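
Item 61 can be worked as below. This sketch assumes the percentile is defined as the proportion of scores at or below the score in question; other textbook definitions (for example, strictly below) give a different value. The function name `proportion_at_or_below` is hypothetical.

```python
scores = [4, 15, 7, 5, 1, 5, 6, 8]

def proportion_at_or_below(values, score):
    """Proportion of values less than or equal to the given score."""
    return sum(1 for v in values if v <= score) / len(values)

percentile_of_8 = proportion_at_or_below(scores, 8)
print(percentile_of_8)
```
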


For questions 62 - 63, use the following scenario.

Suppose I wish to assess people’s height and political affiliation. To do this, I take 100
people and record their height in inches. I also assign them a 1 if they are democrat, 2
if republican, and 3 if they are other.

(Diff) 62.) Height in inches is not a
a. nominal scale
b. ordinal scale
c. interval scale
d. all of the above

(Mod) 63.) What type of scale would political afﬁliation be?
a. nominal
b. ordinal
c. interval
d. none of the above

(Easy) 64.) Which of the following is clearly an interval scale?
a. Height in inches
b. Height in centimeters
c. Temperature (in Celsius)
d. Gender

(Easy) 65.) Which of the following is clearly a ratio scale?
a. Weight in Kilograms
b. Class standing
c. Grade Point Average
d. Temperature (in Celsius)

For questions 66 - 69, use the following scenario.

For some strange and terrible reason, I am interested in knowing the average number of
white collar crimes committed per day by Sen. Bob Dole over the past 2000 days. In an
attempt to estimate this value, I randomly choose twenty days from these 2000, count the
number of white collar crimes he committed on each of those 20 days, and get the
average of those twenty numbers.

(Diff) 66.) My sample size is not
a. twenty
b. the average for the twenty days
c. ﬁfty Million
d. either b or c


(Diff) 67.) The parameter that I am trying to estimate is not
a. the average number of white collar crimes per day committed by Dole
over the past 2000 days.
b. the average number of white collar crimes committed per day by Dole
over the 20 days that I measured.
c. the total number of white collar crimes committed by Dole divided by
2000
d. This applies to none of the above

(Mod) 68.) What is the statistic that I have used?
a. The average number of white collar crimes per day committed by Dole
over the past 2000 days.
b. The average number of white collar crimes committed per day by Dole
over the 20 days that I measured.
c. 2000
d. all of the above

(Mod) 69.) What is the population of interest?

a. The white collar crimes committed per day by Dole over the past 2000 days
b. The white collar crimes committed per day by Dole over the past 20 days

c. The average number of white collar crimes per day committed by Dole
over the past 2000 days.
d. Either a or c.

Use the following scenario for questions 70 - 74.

For some strange and terrible reason, I am interested in knowing the average height of
the fifty million people who voted for Ross Perot (the second coming of Thurston
Howell). In an attempt to estimate this value, I find twenty people who voted for Perot,
measure their heights, and calculate their average.

(Mod) 70.) What is my population of interest?
a. The cast of Gilligan’s Island
b. The twenty people whose heights I measured
c. All voters
d. none of the above

(Mod) 71.) What is my sample size?
a. Twenty
b. Not enough information provided
c. Fifty Million
d. none of the above


(Diff) 72.) The group that is not the sample that I am using is
a. the ﬁfty million who voted for Perot
b. the twenty people whose heights I measured
c. both a and b
d. none of the above

(Diff) 73.) The numerical value that is not the parameter that I am trying to estimate is
a. The average height of the ﬁfty million people who plan to vote for Ross
Perot
b. The average height of the twenty people that I measured.
c. Fifty Million
d. both b and c

(Mod) 74.) What is the statistic that I have used?
a. The average height of the ﬁfty million people who plan to vote for Ross
Perot
b. The average height of the twenty people that I measured.
c. Fifty Million
d. either a or b

(Easy) 75.) Which of the following is the term for numerical values used to make
generalizations about a large set of data from a subset of that set?

a. descriptive statistics

b. dependent statistics

c. inferential statistics

d. parametric statistics

APPENDIX C

Descriptives for statistical knowledge items

Variable   Mean   Std Dev   Minimum   Maximum   N     Label

ITEMA1     .73    .44       0         1         165
ITEMA2     .58    .49       0         1         165
ITEMA3     .92    .27       0         1         165
ITEMA4     .64    .48       0         1         165
ITEMA5     .38    .49       0         1         165
ITEMA6     .65    .48       0         1         165
ITEMA7     .75    .44       0         1         165
ITEMA8     .33    .47       0         1         165
ITEMA9     .67    .47       0         1         165
ITEMA10    .25    .44       0         1         165
ITEMA11    .28    .45       0         1         165
ITEMA12    .75    .43       0         1         165
ITEMA13    .71    .46       0         1         165
ITEMA14    .23    .42       0         1         165
ITEMA16    .63    .48       0         1         165
ITEMA17    .67    .47       0         1         165
ITEMA18    .32    .47       0         1         165
ITEMA19    .54    .50       0         1         165
ITEMA20    .33    .47       0         1         165
ITEMA21    .44    .50       0         1         165
ITEMA22    .84    .37       0         1         165
ITEMA23    .46    .50       0         1         165
ITEMA24    .54    .50       0         1         165
ITEMA25    .67    .47       0         1         165
ITEMA26    .27    .45       0         1         165
ITEMA28    .78    .41       0         1         165
ITEMA29    .27    .44       0         1         165
ITEMA30    .15    .35       0         1         165
ITEMA31    .27    .44       0         1         165
ITEMA32    .61    .49       0         1         165
ITEMA33    .75    .43       0         1         165
ITEMA34    .45    .50       0         1         165
ITEMA35    .68    .47       0         1         165
ITEMA36    .68    .47       0         1         165
ITEMA37    .60    .49       0         1         165
ITEMA38    .58    .49       0         1         165
ITEMA39    .32    .47       0         1         165
ITEMA40    .17    .38       0         1         165
ITEMA41    .36    .48       0         1         165
ITEMA42    .28    .45       0         1         165
ITEMA44    .19    .40       0         1         165
ITEMA45    .68    .47       0         1         165
ITEMA46    .88    .33       0         1         165
ITEMA47    .43    .50       0         1         165
ITEMA48    .44    .50       0         1         165
ITEMA49    .35    .48       0         1         165
ITEMA50    .65    .48       0         1         165
ITEMA51    .50    .50       0         1         165
ITEMA52    .55    .50       0         1         165
ITEMA53    .54    .50       0         1         165
ITEMA54    .63    .48       0         1         165
ITEMA55    .70    .46       0         1         165
ITEMA56    .49    .50       0         1         165
ITEMA57    .39    .49       0         1         165
ITEMA59    .43    .50       0         1         165
ITEMA60    .64    .48       0         1         165
ITEMA61    .47    .50       0         1         165


ITEMAI3

1'1'EM1

1.006
.1”
. 1797‘
. 1994‘
.1566‘
.0807
.1825‘
.0409
.1051
.663
.1610‘
.2558“
. 1871‘
.1020
.2479“
.1344
.1213
.1027
.0701
. 1713‘
.1037
.1173
.0477
.1551‘
.615
.0796
.0847
.0544
.0227
..0019
4296
.0550
. 1722‘
.2694“
-.0168
.21 12“
.0256
.0170
. 1925‘

.2288“


.49 0 1
.46 0 1
.50 0 1
.44 0 1
.47 0 1
.50 0 1
.32 0 1
.50 0 1
.50 0 1
.48 0 1
.49 0 1
.50 0 1
.50 0 1
.50 0 1
.50 0 1
.48 0 1
.46 0 1
.32 0 1
- - Correlation Coefficients - -
ITEM2 ITEM3 ITEM4
.1000 .1797‘ . 1994‘
1.006 . 1625‘ .632
. 1625‘ 1.000) .2466“
.0232 .2466“ 1.006
.0996 -.0982 .0922
-.0474 .1187 .602
.0969 .1906‘ . 1946‘
.0938 .0122 .1513
. 1681‘ .3234“ .1172
.0441 .0160 -.07 89
.0887 .1317 .1328
.3086“ .1442 .610
.651 .1099 .1261
.0844 .1065 .0843
.0889 .690 .2041“
.1157 657 .0903
.1096 - 0397 .0343
.1286 .1811‘ .2114“
-.1157 - 0357 .0439
.623 - 0148 .1062
.1232 .1139 .0743
-.0547 .0446 .0414
.ﬂ65“ 1359 .0850
.0782 .0795 .0267
.0226 .0276 .0952
.147 l 2268“ .682
.0667 .0746 .1709‘
.168 -.0)70 .0260
.667 .1255 -.0285
. 1825‘ .1365 . 1998‘
.1380 .0401 .618
.0830 .2218“ -.690
.0746 .0397 .3164“
.0861 .0921 .368“
.0602 «692 .600
.1282 -.0199 . 1254
.0726 .0531 -.0567
.1214 .0123 .0732
.0172 .0774 .0382
.0995 .1347 .1142
-. 1435 -.0841 -.0116

-. 1202

1'1'EM5

.1566‘

-.0982

.0922

1'1'EM6

.0807
-.0474
. 1 187

.1771‘

.275700

.646
-.0439

.1289

. 1855‘

.0742

.1020

.0814

.2617“

.1176
-.0183
.1299
.1181
.0454
.1292
.0692
.1227
-.0252
-.(X)83
.0181
-.633
.1641‘

.325.—

.657
.0405
.0969
.1159
.1166
.1530‘
-. 1356

um: new
.0409 .1051
.0933 .1681‘
.0122 .3234“
.1513 .1172
.0989 ..0722
.1264 .4168“
.0518 .1855‘
1.0000 .1286
.1286 1.0000
.1262 .0014
.0560 .1456
.1918- .1668‘
-.0367 .0652
.0786 0134
.1596- 1883-
.0736 .1568“
.1341- .0925
.0144 .2106“
.0916 ..0090
.0635 .0668
.0990 .0755
.0651 ..0292
.1004 .1847-
.1096 .0274
.0659 .0211
.1133 .1006
.0175 .15-77-
.0420 -.0420
.0175 .0701
.0731 .1075
.0723 .1967‘
.0896 .0920
.1202 .0453
.1395 .1335
..0105 .1160
.0933 .0371
.0273 .1395
..0400 .1433
.0186 .1161
.2130" .0682
-.1788° -2132»

mm 1137111
.663 1610‘
.0441 0887
.0160 1317

-.07 89 .1328

-.612 .0200
. 1904‘ .1674‘

..0099 .1771‘
.1262 .660
.0814 .1456

1.0000 «620

a- 1.0000
.0784 294'
.667 .1115
.1100 3179'
.0728 .0842

-.0965 .692

-.0146 .644
.676 0593
.0372 .0560
.0469 .1070

-.08(X) .0193
. 1020 . 1034
.655 .1949‘
.0000 1242
.0795 - 0166
.655 .1321
.666 .0835
.0746 .0118
.652 .624
.0369 .1343

-.0181 .0447
.0813 .696
.1338 .0803
.1568‘ .0726
-.0909 .1214
-. 1251 -. 1031
.0229 .0437

0323 .1870‘
-.0296 .0720
.61 1 110‘
.0301 -.0999

FTENLA45
TTEDLA46
FTENLA47
TTELLA48

1T10812

1182419

FTE3421

YYELAZ4
fTE0425
1150426
1110828

.0548
.0700 -
.1366
«0147

YTEBLAl TTEBLAZ

.0421
.1301 -
.0311
«0202
.1301
.1628‘
.1180
.0713
.0374
.1089
.0570
.0421
.0330
.0280
«0604
.0393
.0629
.0805 -
.0458 -
.1280 -
.0640 .
.0208
.1830‘
.1998‘

.1273
.0513
.0171
.1862‘

.1352
.1095
.1157
.0013
.2026“
.2416“
.1481
.0952
.0046
.1412
.0743
.2020“
.1247
.0944
.0398
.1000
.0596
.0501
.0406
.0358
.0380
.1398
.0018
.1425

.0812 .0916
.1776‘ .0197

.1390

.0747

.0346 .1250
.0696 «0024

.0029

.2558“
.3086"
.1442
.0610
.0697
.0542
.2757.—
.1918‘

.0251
.1099
.1261
«1368
«0444
.0546
«0367
.0652

.0364

FTEL‘LB TTE0814
.1871‘

«0085
«1086
.0724
.0340

TTENLKB

«0203
«0740
«0207
«0828
.0908
.0556
.0069
.1072
.0516
.0724
.0128
.0932
.1422
.1053
.0967
.1289
«0047

.1700‘

.0350
.0314
«1214
.0090
.0053
.0897

2037--

.1323
.0551
.0705
«1009
.0350

.1020

.0844
.1065
.0843
«0083
.1249
«0439
.0786
.0134
.1100
.2699“
.1147

.2236“
L0000
.1505

.0134
.1170
.1012
.0173
«0169
.0863
.0721
.1300
.1425
.1499

.0450

.2479“
.0889
.0090
.2041“
.2053“
.1301
.1289
.1596‘
.1883‘
.0728
.0842
.1697‘
.0623
.1505
L0000


.0736
.1825‘

10575

- Ab-

«0369

.0288
.0240
.0550
.0023
«0414
«1091
.0326
.1375
.0422
.1226
«0214
.0505
.0627
.0602
«0413
.0285
.0838
.1582‘
.0359
.0733
.1535‘
.0736
.0378
.1425
«0046
.1263
.0940
«0072
«0498
.0359

.1344
.1157
.0357
.0903
.1411
.0909
.1855‘
.0736

«0965
.0592
.1967‘
.1220
.0134

.1202

.1562‘
.1709‘
.1454
.0485
.1070
.1644‘
.0211
.1319

.1585‘
.0964
.0587
.0647

ITENLA4 YTENLKS

.1627‘
.0732
. 204
.0958
«0613
.0757
.1482
.1393
«0109
.0587
.0402
.0518
.0981
«0161
«0829
.0717
.2031“
.1174
«1231
.2018“
.1487
.1276
.0077
.0362
.0081
«0017
«0172
.0416
.0602
«0839

.2568“

.6699“
.6699“ L0000
.0160
.1487
.1328
.1675‘
.1024
.0024
.0731
.0178
.0461
«0398

.2624“

.1213
.1096
«0397
.0343
.0291
.1176
.0742
.1841‘
.0925
«0146
.0644
.0351
.0120
.1170
.0160
.1202
L0000
.1149
«0649
.1014
.2341“
.0413
.2190“
.0734
.0159
.1748‘

.0189
.0817
«0379
.0569

.0010
.0257
.0171
«0401
«0321
.2093“
«0259
.0760
.0901
«0122
.0072
.0664
.0852
.0857
«1035
.0519
.1382
.1457
.0225
«1182
«0348
«0547
.0943
.0065
«0637
.1124
.0584
«0619
‘«0756
«0574

.1344
.0388
.1425
.0443

FTENhké 1TEBLA7

.0514
.0652
.0870
.1445
.1020
.0136
.0770
.1564‘
.1010
.1144
.1946’
.1283
.0971
.0465
«0598
.1825‘
.1427
.0527
«0507
.0177
.0936
.1577‘
.0924
.1771‘
.0301
.1785‘
.1980‘
.0442
«0385
.0800

.1027
.1286
.1811‘
.2114“
«1115
.0958
.1020
.0744
.2106“
.0376
.0593
.2565“
.0239
.1012
.1487
.2624“
.1149
L0000
.0744
.0776
.1829‘
.0489
.0974
.2235“
«0347

.0701
«1157
«0357

.0439
«0078

.0721

.0814

.0916

L0000
.4541“ L0000
.0292
«1522
«0033
.1918‘
.1529‘
.1890‘ .0870

.0372
.0612
.0721
.0548

TTENLAS

.0275
«0005
«0042
«1242

.0485

.0258

.1424

.1419

.0721
«0583

.1245

.2278“

.0352

.1424
«1430

.0701

.0561

.1514
«0898
«0728

.0728
«0010

£2211“

.1329

.0721

.1346
«0997

.0909

.0179

.0316

.1713‘
.0523
«0148
.1062
.0743
.0224
.2617“
.0635
.0668
.0469
.1070
m2. 0
.0793
«0169
.1675‘
.1709‘
.1014
.0776
.4541“

.2240"
.0450
.1021
.1815‘
.1472
.1097

.0734
«1007
«0199

.0752

.0742
«0814

.0397

«1097

.0799
.0234
.0376
.0728
«0465
«0172
«1010
.0542
«0210
«0167
.0175
«0161
.0319
«0252
«0828
.0309
.0071
«1291
«0101
«0136
«0350

.0542
.0725
«1423
«1319
.0083
.0071

.1578‘

.1382
.0238
.0875
.1537‘

TTENLAO 102001

.0518
.0897
.0773
.0987
.1136
.0282
.1083
.0383
.0796
.0875
.1328
J770‘
.1683‘
.1674‘
.0881
.0693
.1599‘
.0818
«0126
«1270
.0459
«0278
.1754‘
.0763
.0056
«0415
.0535
.0315
.0312
«0549

ITEM16 ITEM17 ITEM18 ITEM19 ITEM20 ITEM21 ITEM22 ITEM23

.1037 .1173
.1232 «0547

.1139

0446

.0743 .0414
.0049 «0140
«0113 .0321
.1176 «0183

.0990
.0755 -
«0800
.0193
.1627‘
.1135
.0863
.1024
.1454
£41..
.1829‘
.0292
.2240“
L0000
«0843
.0843
.2085“
.0134
.2027“

.0551
.0292
.1020
.1034
.1093
.0029
.0721
.0024
.0485
.0413
.0489

«1522

.0450

«0843
L0000
«0730

.0860
.0074
.0760

 

1'1'EM81
ITEMSZ

l'I'EMl

1TEM3
1TEM4

1'1'EM6

1TEM8
1TEM9
ITEM 10
ITEM 11
11'EM12


.2199“ «660 .1259 .1779‘ .1577‘ .648 «0752
.0781 .0372 «624 .611 .0313 .1212 .619
«638 «0362 .682 .0927 .046 «0333 .0898
.628 .652 .614 .0603 «616 .2013“ .1128
.611“ .0640 «0186 «646 .0473 .651 .632
.0179 «0853 .1945‘ .688 .620 .1801‘ .622
.2052“ .0166 .2530“ .1722‘ .0734 .1119 .1715‘
.2439“ .625 .2471“ .0750 «0283 .1314 . 1583‘
.1603‘ «699 «623 «06 15 .0105 .618 .2134“
.1949‘ .0251 .0260 .1398 .1943‘ . 1622‘ .1040
.1184 «1687‘ .627 .0871 .0839 .662 «613
.1105 «0659 .1745‘ .1790‘ .2121“ .1731‘ .634“
. 1656‘ .624 .0424 .2046“ .1431 .166 .2328“
.2075“ «688 .1651‘ .661 «0177 .673“ .0713
«617 .0779 .0229 «1959‘ « 186‘ « 1405 «688
.1451 «0120 .680 .2529“ . 1841‘ . 1952‘ .1715‘
.0872 «635 .1149 .0233 .676 .666 .666
. 1032 .0446 . 1351 .1584‘ . 1627‘ .675 . 1646‘
.1734‘ .0870 .1214 .1514 .2052“ .1973‘ .1132
ITEMA12 ITEMA13 ITEMA14 ITEMA16 ITEMA17 ITEM18 ITEM19
.0121 .644 .146 .0116 «0815 .644 «673
.0760 .695 .0409 .0147 .617 .1259 .1091
.1297 .673 .1406 .2432“ .2626“ .1386 .0542
.0173 .632 .602 .667 . 1242 .0201 «0999
.683“ .66 .136 .1487 . 1847‘ .146 .1950‘
.645 «0483 .614 .677 .610 .1235 «624
.1174 .618 .0405 .618 .2251“ .1346 .1445
.1438 .0150 .0963 .1242 .690 . 1553‘ .0562
«644 .621 .698 .665 «0192 . 1892‘ «1259
.1315 .685 .1351 .1838‘ . 1888‘ «0998 «673
.1484 .0151 .645 .614 .176‘ .1153 .1356
.66 «695 .1520 «638 .61 1 .11 10 .0845
.1342 .601 .647 .0725 .1083 .1264 .617
.3015“ .610 .0405 .0793 .638 .0778 .1977‘
.688 «617 .687 «641 «681 «634 .1444
.1290 «643 .669 .160 . 1928‘ .186‘ .1301
.0929 «611 .1232 .2912“ .2775“ .0755 .636
.1227 .086 .1497 «652 .0813 .698 .0879
« 161 « 1452 «620 «0777 «1530‘ .0771 «0856
.615 «607 «0742 .1111 .1245 «0858 «66
«615 «0194 «0122 .1903‘ .1598‘ «076 .0450
.645 «0483 «0880 .2457“ .1883‘ «016 .0227
.2489“ «610 .0975 .639 .689 .1049 . 1617‘
.1656‘ .629 .0432 .1284 .0744 .634 .0977
.1315 .1524 .1351 .617 . 1627‘ .0837 .0664
.3055“ «642 .612 .638 .1243 . 1492 . 1836‘
.0177 .629 .620 «0702 «696 .642 .681
.638 «0117 «639 .683 . 1264 .1280 .676
.0740 «0422 .0152 .652 .016 . 1677‘ «0257
«0122 .620 « 1522 «684 «0721 «642 «0476
ITEM24 ITEM25 ITEM26 ITEM28 ITEM29 ITEM30 ITEM31
.0477 . 1551‘ .615 .0796 .0847 .644 .627
.2765“ .0782 .626 .1471 .667 .168 .667
.1359 .0795 .676 .2268“ .0746 «670 .1255
.0850 .667 .652 .682 .1709‘ .660 «685
.0140 .0442 .1430 «0446 «106 .648 «0151
.1213 «0811 «0702 .0791 .1499 .0467 .658
.1299 .1181 .0454 .1292 .692 .167 «652
.164 .1096 .659 .1183 .0175 .0420 .0175
. 1847‘ .674 .611 .166 . 1577‘ «0420 .0701
.655 .66 .0795 .655 .666 .0746 .652
. 1949‘ . 1242 «0166 .1321 .0835 .0118 .624
.262“ .2182“ .687 .2735“ .2199“ .07 81 «638

.105 1
.653
.1051
«645
.0424
«620
.2032“
. 1673‘
.016
.0152
«661
«656
«622
.0749
.0499
.672
.616
.0460
.688

1'1'EM20

«0807

.607
.616
.616
.066
«612
«1120
.0644
« 1394
«623

. 1998‘
«601
.0756
«683
.0781
.1075
.669
.1343
.628

.0774
. 1569‘
.0497
.633
.652
.658
.6 89“
.686“
.1447
.1761‘
.607
.680
.620
.1216
«1534‘
.1604‘
.647
.0745
«0702

ITEMZl
«647
«0177
.1 169
.66
.262“
.0410
«0433
«673
.046
.0745
.608
.633
.0479
.1439
«688
.632
.0971
.605
.654
.071 1
.667
.0156
.666
.641

«696
.1380
.0401
.618

«0172
.647
.0181
.0723
.1967‘

«0181
.0447
.611“

.674
«0498
.0445
.613
.0868
«668
. 1869‘
.1231
.667
«697
.685
.690
. 1591‘
.677
«1145
«638
«0137
.666
.642

ITEM6
.612
.661

«0137

«695
.6 14

«1012
.07 11

«644
.613

«0457
. 1424
.654
.6 18
.652
.651

«W4
.123 1
. 1664‘
.656
.626

«698
.164

«687
.0472
.666
.0788
. 1616‘
.086
«692
.669
“M4
.650
.630
.618“
«690
.121 1
.0745
«633
.696
.0920
.613
.696
.0179

. 1027

.626
«623

.1367

. 1937‘

.1088
«0153

.649

.0439
.636

:m9--
.664

«674
. 1569‘

.3164“

.1641‘
.126
.0458
.1338

22052--

1119813
FTE5414
11E5416
rr25417
1113818
11E5119
1113820
FTE3421
1119822
FFEBI23
11E3824

ITE5426
1110828
1113829
FTE3430

FTE3133

1113838

1110841
rr25442
1112844
1110845
1112846

11E3848

YTEA€76

11Ebf78
1110879
FTEhd81
ffEhdSZ

.0T74
.1300
.0731
.1070
.2190“
.0974
«0033
.1021
.0843
«0730
L0000
.0430
.1564‘
.2184“
.1448
.2088“
.0348
.2126“
.0877
.0866
.2236“
.2106“
.0894
.1533‘
.0772
.15 86‘
.1313
.1522
«1003
.2236“
.1039
.0909
.1377

1112824
«0582
.0073
.1029
.2424“
.1950‘
.0983
.0913
«0168
.1478
.0664
.1608‘
.0601
.1573‘
.0381
.0713
.1851‘
«1034
.0636
.0667
«0693
.0450
.0479
.0362
.0489
.0418
.0374
«0392
.0576
.0273
.1048

.1415 .0626
. 1425 . 1499
.0178 .0461
.1644‘ .0211
.0734 .0159
.2235“ «0347
.1918‘ .1529‘
.1815‘ .1472
.2085“ .0134
.0860 .0074
.0430 .1564‘
L0000 .0289
.0289 LOOOO
.1868‘ .0599
.0775 «0308
.0365 «0211
«0678 .0000
«0088 .0127
.0694 .1002
.1291 .1242
.0092 .0715
.0461 .0932
.0262 «1389
.1303 .0777
.0092 .0240
«0913 .0132
.0179 «1162
.0190 .0959
.0542 «1283
.0642 .1007
.1707‘ .1023
.1212 «0100
.0863 «0249
rrEmazs 11E1426
«0180 «0518
.1795‘ «0337
.1200 «0718
.0345 «0497
.1977‘ .1291
«0089 .0461
.1032 .0704
.0514 «0025
«0614 «1038
.1731‘ «0100
.1871‘ .0669
.1203 «1091
.0817 «0157
.1313 .0406
.0258 «1165
.0388 «1231
«0092 .0053
.0944 «0396
«0269 «0930
.0086 «1683‘
.1200 .0866
.0444 «0103
.2212?‘ «0587
.2149“ «0472
.1212 .0999
.1890‘ .1364
.0601 «0149
.0270 «0156
«0093 «1077
«1880‘ «1356

.1140
.0450
«0398
.1319
.1748‘
.1890‘
.0870
.1097
.2027“
.0760
.2184“
.1868‘
.0599
L0000
.1195
.1347
«0796
.1818‘
.1716‘
«0188
.0137
.0523
«0120
.0579
.1057
.1215
.1186
.1383
«0749
.202360
«0163
.2220“
.0865

1113828
.0816
.1336
.0619
.1137
.1890‘
.2034“
.2026“
«0683
.0956
.0738
.1192
.1118
.1223
.2990“
.0005
.1128
.0838
.0224
«0393
«0721
.0427
.0818
.1052
.1643‘
.2813“
.1530‘
.0069
.1060
«0348
«0853


«0060
.1259
.1779‘
.1577‘
.0548
'«0752
.1051
.0774
.0074
.1027
.1448
.0775
«0308
.1195
LOOOO
«0155
«0847
.1144
.0930
.0826
.2094“
.2321“
«2350“
.1778‘
.1514
.0925
.0648
.0142
«0185
.0626
.1820‘
.1402
.1251

rrEmmz9
«1856‘
.0421
.0786
«0349
.1723‘
«0208
«0280
.0658
.0748
.1402
.0285
.1502
.1080
.0320
«1043
.1777‘
«0039
.1939‘
«0458
«0183
«0091
.0927
.1000
.0477
.1679‘
.1245
.0805
«0058
«0398
«0887

.0372
«0624
.0311
.0313
.1212
.0019
.0053
.1569‘
«0498
.0326
.2088“
.0365
«0211
.1347
«0155
LOOOO
.0622
.0109
.0383
«0314
.0629
.0579
.1614‘
.1058
.1642‘

.1275
.0132

1119830
.0923
.0157
.1006
.1301
.1398
.1379
.1177
.0419
.1599‘
.0234
.0617
.1309
.0224
.1177
.0225
.0155
.0209
«0863
.0666
«0469
.2533“
.0311
.0716
.1016
.0234
.1103
«0513
«0105
.1394
.0127

«0362
.0282
.0927
.0409

«0333
.0898
.1051
.0497
.0445

«0623
.0348

«0678
.0000

«0796

«0847
.0622

1110831
.0727
«1014
«1133
.0202
«0752
.0360
.0320
.0658
.1028
«1089
.0570
.0403
.0516
«1180
.0055
.0847
«1219
.0293
«0458
.0914
«0091
«0776
«0698
.0752
«0258
«0952
«0293
«0346
«0398
«0887

.0329

1110832
.0650
«0129
.0297
.0574
.25254o
.2664“
.0271
.2094“
.0818
.0889
«0846
.0465
.1788‘
.0815
.0313
.0544
«0045
.0836
.0923
«0754
.0008
«0686
.1272
.0619
.1140
.1030
«0089
.0552
«1517
.0144

.0640
«0186
«0046

.0473

.0351

.0032

.0424

.0252

.0868

.1937‘

.0877

.0694

.1002

.1716‘

.0930

.0383
«0655

.2906“
L0000

.1306
«0351
«0278

.1603‘

.1380

.0580

.1105

.1364
«0411
«1081

.1150

.0872
«0101
«0525

1113833
.0121
.0173
.1017
.0455
.1721‘
.1407
.1788‘
.0316
.0905
«0101
.0318
«0525
.0866
.1174
.0388
«0296
«0882
.0385
«1001
.0595
.0527
.0245
.1620‘
.0249
.1598‘
.1368
«0104
«0247
«0786
«0122

«0853
.1945‘
.0688
.0920
.1801‘
.0622
«0920
.0558
«0568
.1088
.0866
.1291
.1242
«0188
.0826
«0314
.1376
.2271“
.1306
L0000
«0237
«0357
.0745
.0583
.0881
.1710‘
.0046
.1520
.0140
.1327
.0407
.0179
«0045

TTELC&4
.0672
«0162
.0797
«0089
.1110
.1948‘
.0339
.1749‘
.0611
.0670
.1334
.0732
.2291“
.0605
«0842
.1651‘
.0429
«0022
«0624
«0155
.0886

' -0825

.0548
«0377
«0067

.0976

.0266

.0023

.0072
«0624

.0166
.2530“
.1722‘
.0734
.1119
.1715‘
.2032“
.2389“
.1869‘
«0153
.2236“
.0092
.0715
.0137
.2094“
.0629
.0333
«0149
«0351
«0237
LOOOO
.8744“
.1007
.2588“
.0755
«0002
.1341
.204130
.0092
.0549
.1820‘
.2571“
.1163

1119835
«0101
.0644
.0950
.0582
.0934
«0966
.0074
.0264
.1296
.0211
.1545‘
.1231
.0079
.0358
.1314
.0548
.1200
.3540“
«0365
.0338
«0079
.1185
.0827
.1670‘
.1260
.0069
.1137
«0189
.0582
.0042

11E341
11E342
111083
111084
111085
11E346
11E517
11E548
111389
1110810
11E3411
1110812
11Ehdl3
11E3414
1113816
11E5417
11E3818
11Eh419
11E3420
1110821
11Eh£22
11Eh£Z3
1113424

1110836

“2694..
.0861
.0921
.3008“
.0415
.1930‘
.2325“
.1395
.1385
.1568‘
.0726
.2439“
.0825
.2471“
.0750
«0283
.1314
.1583‘
.1673‘
.2286“
.1231
.0249
.2106“

«0744
.1017
.0302

.0798
«0601
.0445
.0399
.1464
.0362
.1109
.1377
.0244
.0159
.1197

1113437 11Eh438
«0168 2112“
.0602 1282
«0092 -0199
.0000 .1254
«0562 «0018
.1093 «0733
.0057 .0405
«0105 .0938
.1160 .0371
«0909 «1251
.1214 «1031
.1603‘ 1949‘
«0599 .0251
«0823 0260
«0615 1398
.0105 .1943‘
.0318 .1622‘
.2134“ 1040
«0105 0152
.1447 .1761‘
.0067 «0097
.0099 .0439
.0894 1533‘
.0262 .1303
«1389 .0777
«0120 .0579
«2350“ 1778‘
.1614‘ .1058
.0727 «0445
.0102 «0697
.1603‘ .1380
.0745 .0583
.1007 .2588“
.0586 .2448“
L0000 .0602
.0602 LOOOO
«0053 «1125
.1714‘ .0887
.1704‘ «0084
«0877 .0995
.0250 «0503
.1007 .1799‘
.0379 .2498“
.0350 .0668
.0797 . 1615‘
1110837 1110838
.0311 «1479
«1088 .0449
.1287 .1894‘
.1592‘ .1249
.0645 .1779‘
«0103 .0889
.0650 .0675
«0396 «0277
1266 .1052
.0350 .0171
.1800‘ .0998
.0198 .1034
«0827 .0644
«0162 .2019“
.1784‘ .0398

«0561

.0885
.0536
.0772

.0240
.1057
.1514
.1642‘
.2104“
.0313
.0580
.0881
.0755
.0671
«0053
«1125
L0000
.1104
.1471
.0343
«1018
.1314

.0164
.1312

1110839
.0744

«0041
.0871
«0013
«0480
.1268

.0405
.0164
«0296
«0331
.1264
.0983
«0413


1110840 1110841 1113442

.0170
.1214
.0123
.0732
.0160
.1247
.1159
«0400
.1433
.0323
.1870‘
.1105
«0659
.1745‘
.1790‘
.2121“
.1731‘
.2234“
«0056
.0580
.0690
.1005
.1586‘
«0913
.0132
.1215

«0002

«0061
.1714‘
.0887
.1104

LOOOO
.1343
.0366
.0233
.2073“
.0195
.0962
.0199

1110840

.0053
.1299
.0618
.0506
.0291
.0452
.1525
.0405
.1642‘
.0310
.0732
.0626
.1504
.0465
.0247

.1925‘
.0172
.0774
.0382
.0217
.1697‘
.1166
.0186
.1161
«0296
.0720
.1656‘
.0324
.0424
.2046“
.1431
.1096
.2328“
«0622
.0320
.1591‘
.2239“
.1313
.0179
«1162
.1186
.0648
.0150
.0076
.0749
.1364
.0046
.1341
.1523
.1704‘
«0084
.1471
.1343
L0000
.0054
«0141
.0799
«0329
.1178
.0738

1110841

.1393
.0461
.0334
«0391
.1059
.0737
.1528
.1527
.0455
.2199“
.0645
.1386
.0880
.0421
«0732

.2288“
.0995
.1347
.1142
.1480
.0349
.1530‘
.2180“
.0682
.0011
.2065“
.2075“
«0688
.1651‘
.0661
«0177
.2273“
.0713
.0749
.1216
.0977
.0364
.1522
.0190
.0959
.1383
.0142
.0443
.0749
.1166
«0411
.1520
.20410.
.2258“

.0995

.0978
.2115“

«0249
.1252
.0383

«0012
.2398“
.1232

.0863
.2440“
.1841‘
.1751‘
«0597

11E1844

«1202
«1435
«0841
«0116
«1590‘
«1272
«1356
«1788‘
«2132“
.0301
«0999
«0017
.0779
.0229
«1959‘
«1806‘
«1405
«0388
.0499
«1534‘
«1145

.0562
«1525
«0336

.0108
«0695
«2594“
«1509
«0217
«1759‘

.0381

«0901
«0785
«0838

.0882

1113845

.0548
.1273
«0085
.0736
.1585‘
.0189
.1344
.0372
.0734
.0742
.1382
.1451
«0120
.0680
.2529“
.1841‘
.1952‘
.1715‘
.0372
.1604‘
«0938
«0674
.2236“

.2508“
.1104
.1976‘
.0109
.1494
.0524
.1030
.1260
.1545‘

.1252
.1494
.1054

1110846

.0700
«0513
«1086
.1825‘
.0964
.0817
.0388
.0612
«1007
«0814
.0238
.0872
«0335
.1149
.0233
.0576
.0566
.0666
.0216
.0647
«0137
.1569‘
.1039
.1707‘
.1023
«0163
.1820‘
.1006
«0700
.0092
.0872
.0407
.1820‘
.1877‘
.0379
.2498“
«0279
.0195
«0329
.1521
«0527
.1820‘
L0000
.0227
.1065
11Ehd46

«1155
.1544‘
.0765
.0385
.1411
.0618
.2057“
.0675
.0334
.0602
.0667
.1613‘
.0553
.0837

«0203

1113847

.1366
.0171
.0724
.1226
.0587
«0379
.1425
.0721
«0199
.0823
.0875
.1032
.0446
.1351
.1584‘
.1627‘
.0575
.1646‘
.0460
.0745
.0866
.0564
.0909
.1212
«0100
.2220“
.1402
.1275
.0572
.0638
«0101
.0179
_257100
.2734“
.0350
.0668
.0164
.0962
.1178
.0210
«0857
.1522
.0227
L0000
.0884
1158647

«2040“
.0758
.2763“
.1930‘
.0664
.1331
.1362
.1015
.2262“
.2089“
.1480
.3157“
.0831
.1094
.0107

 

1113819

1113421

1110824

1118445
1113446
1113847
1118448

.0924
.0733
.3422“
«0414
«0316
.0055
.0210
.1740‘
.1819‘
.1153
.0715
.2062“
«0284
.0215
«0414

1118848

.0863
«0249
.0865
.1251
.0132
«0129
.0329
«0525
«0045
.1163
.0790
.0797
.1615‘
.1312
.0199
.0738
.0867
«0357
.2730“
.1065
.0884
LOOOO

«0168
.0320
«0396
.0233
.0000
«0248
«0359
.0562
.1092
«1399
.0546
.0149
«0052
.1346
.0620

ITEM49

.0421
.1352
«0203
.0288
.1627‘
.0010
.0514
.0275
«0817
«1097
.0518
.0121
.0244
.1400
.0116
«0005
.0644
«0073
«0807
«0847
.0512
.0327
«0582
«0180
«0518
.0816
«1856‘
.0923
.0727
.0650
.0121
.0672
«0101
‘«0744
.0311
«1479

.0562

.0443
«1155
«2040“
«0680

«0111
.0067
.1467
.0364
.0134
.2079“
.1398
.1287
.0686
.0668
.1675‘
.0747

«0559
.1312
.0364

ITEM50

.1301
«1095
«0740

.0240

.0732

.0758
.1702‘

.0551
.1513
«0288
«0404
«1772‘
.0728
.0331
«0663
«0249
.0955
«0453
«0234
.0284
«0499
«0404

ITEM51

.0311
.1157
«0207
.0550
.1204
.0171
.0870
«0042
.0817
.0799
.0773
.1297
.0573
.1406
.2432?‘
.2626“
.1386
.0542
.0216
.1169
«0137
.0430
.1029
.1200
«0718
.0619
.0786
.1006
«1133
.0297
.1017
.0797
.0950
.0302
.1287
.1894‘
«0041
.0618
.0334
«0441
«0336
.2508“
.0765
.2763“
.1532‘


.0901 .1068
.0634 .1523
«0838 «0950
«0619 «1503
«0460 «0353
.1106 .1365
.0452 .1261
.0841 .0566
«0614 .0716
«0342 .0667
.0669 .0642
.1162 .0950
.0451 .0962
.0532 .0584
.0392 .0082
ITEM52 ITEM53
«0202 .1301
.0013 .2026“
«0828 .0908
.0023 «0414
.0958 «0613
«0401 «0321
.1445 .1020
«1242 .0485
.0203 .0551
.0234 .0376
.0987 .1136
.0173 .2283“
.0932 .0506
.0302 .1300
.0667 .1487
.1242 .1847‘
.0201 .1409
«699 .1950‘
.0316 .0226
.0809 .2002“
«0695 .0514
.0754 .0977
.2424“ .1950‘
.0345 .1977‘
«0497 .1291
.1137 .1890‘
«0349 .1723‘
.1301 .1398
.0202 «0752
.0574 .2625“
.0455 .1721‘
«0089 .1110
.0582 .0934
.0440 .0798
.1592‘ .0645
.1249 .1779‘
.0871 «0013
.0506 .0291
«0391 .1059
«0249 .1252
.0108 «0695
.1104 .1976‘
.0385 .1411
.1930‘ .0664
«0064 «0092

.0162
.1391
.0942
.0247
.1018
«0212
«0730
.1847‘
.1711‘
.0210
.1866‘
.0671
.1910‘
.1098
«1015

ITEM54

.1628‘
.2416“
.0556
«1091
.0757
.2093.-
.0136
.0258
.0545
.0728
.0282
.0245
«0483
.0014
.0377
.0010
.1235
«0024
«0812
.0410
«1012
.1536‘
.0983
«0089
.0461
.2034“
«0208
.1379
.0360
.2664“
.1407
.1948‘
«0966
«0601
«0103
.0889
«0480
.0452
.0737
.0383

«2.594.-

.0109
.0618
.1331
.1008

«0855
«1292
.0405
«0329
«0149
«0465
«1642‘
«1575‘
«1457
«1167
«0942
«1019
«0985
.0435
.0632

ITEM55

.1180
.1481
.0069
.0326
.1482
«0259
.0770
.1424
«0293
«0465
.1083
.1174
.0218
.0405
.0518
.2251“
.1346
.1445
«1120
«0433
.0711
.1482
.0913
.1032
.0704
.2026“
«0280
.1177
.0320
.0271
.1788‘
.0339

«0012

«1509
.1494
.2057“
.1362
.1784‘

.2602“
.2597“
.0682
«0771
«0181
.1739‘
.0647
.1095
.1149
.0473
.1110
.0617
.1449
.1147
«0365

ITEM56

.0713
.0952
.1072
.1375
.1393
.0760
.1564‘
.1419
.0648
«0172
.0383
.1438
.0150
.0963
.1242
.0390
.1553‘
.0562
.0644
«0573
«0244
.1627‘
«0168

.1527
.2398“
«0217
.0524
.0675
.1015
.1504

.1120 1919‘
.0279 .1153
.1273 .0979
«0405 «0067
«0259 0594
.2117“ .1856‘
.1772‘ .2852“
.2103“ .1941‘
.0824 .2038“
.0227 .2583“
.2109“ .1260
.0957 .1227
.0746 .0379
.0024 .1461
«0987 «0834
ITEM57 ITEM59
.0374 .1089
.0046 .1412
.0516 .0724
.0422 .1226
«0109 .0587
.0901 «0122
.1010 .1144
.0721 ««0583
.0072 ««0199
«1010 .0542
.0796 .0875
«0244 1315
.0521 0985
.0598 .1351
.0265 .1838‘
«0192 .1888‘
1892‘ «0998
-1259 «0073
«1394 «0323
.0409 .0745
.0213 «0457
.0264 .1301
.1478 .0664
«0614 1731‘
«1038 «0100
0956 .0738
0748 .1402
.1599‘ .0234
.1028 -1089
.0818 .0889
.0905 «0101
.0611 .0670
.1296 .0211
.1464 0362
.1266 0350
.1052 0171
.0405 .0164
.1642‘ .0310
.0455 .2199“
.1232 «0603
«1759‘ .0381
.1030 .1260
.0334 .0602
.2262“ .2089“
.1559‘ .0391

ITEM49

ITEM10
ITEM11
ITEM12
ITEM13
ITEM14
ITEM16
ITEM17
ITEM18
ITEM19

ITEM32

ITEM48

«0680
.1702‘
.1532‘

«0092
.1008
.1784‘
.1504
.1559‘
.0391

.0717
.0380
.1249
«0124
.0681
.2891“
.1942‘
«1684‘
«0392

.1766‘
.1620‘
.1316
.1377
«0228
.0012
.0714
.1561‘

ITEM60

.0570
.0743
.0128
«0214

.1946‘
.1245
«0708
«0210
.1328
.1484
.0151
.0245
.0214
.1709‘
.1153
.1356
.1245
.0808
.1424
«0345
.1608‘
.1871‘

.1192
.0285
.0617
.0570
«0846

ITEM49

1.0000
«0694
«0045
.0003
«0837
«0147
.0896
.0896
«0220
«0246
.0288
«0017
«0327
«0216
«0106
.1282
.1169

«2091“

«1463
«0477
.0731
«0673
«0054
«0182
«1271
« 1764‘
«0704
.0257
.0117
«0270

ITEM61

.0421

.2020“

.0932
.0505
.0518
.0664
.1283

.2278“

.0052
«0167
.1770‘
.0600
«0695
.1520
«0638
.0311
.1110
.0845
«0570
.0833
.0854
.0374
.0601
.1203
«1091
.1118
.1502
.1309
.0403
.0465

ITEM50

«0694
1.0000
«0209
.0252
.0073
.1461
.0771
.0882
.1519
.0246
.1032
.0526
.0852
.0216
«0911
«0708
.0744
.0567
«0923
.0477
.1047
.1461

.3199“

.1455
.1784‘
.1764‘
.0704
.0544
«0670
«0525

ITEM62

.0330
.1247
.1422
.0627
.0981
.0852
.0971
.0352
.0285
.0175
.1683‘
.1342
.0201
.0347
.0125
.1083
.1264
.0917
«0817
.0479
.0518
.1840‘
.1573‘
.0817
«0157
.1223
.1080
.0224
.0516
.1788‘

.1679‘
.1549‘
.1403
.1049
.1150
.0791
.0848
.1543‘
«0591

ITEM64

.0280
.0944
.1053
.0602
«0161
.0857
.0465
.1424
.1120
«0161
.1674‘
.3015“
.0510
.0405
.0793
.0838
.0778
.1977‘
.0858
.1439
.0352
.0684
.0381
.1313
.0406
.2990“
.0320
.1177
«1180
.0815

«0617

«0041
«0381
«0534
.1444
.0898
«0988

«0469
.0713
.0258
«1165
.0005
«1043
.0225
.0055
.0313

ITEM53

«0837
.0073
.1758‘
.1691‘
1.0000
.0479
.1445
.0075
.0234
.2523.-
.0850
.1089
.0917
.1179
.0956
.0752
«0249
.0879
«0856
«1423
.0693
.0227
.1115
.0733
.0909
.2323“
.0094
«0191
«0521
«0095

ITEM66

.0393
.1000
.1289
.0285
.0717
.0519
.1825‘
.0701
.0175
«0252

.0155
.0847
.0544

ITEM54

«0147
.1461
.1427
.0415
.0479
1.0000
.2441“
.1493
.1806‘
.1077
«0830
.0369
.0627
.0243
«1047
«0360
«0061
«0554
.0403
«0396
.0647
.1677‘
.1576‘
.1536‘
.1584‘
.1141
«0200
.0547
«0414
«0777

ITEM60

.0629
.0596
«0047
.0838
.2031“
.1382
.1427
.0561

.2912"
.2775“
.0755
.0536
.0561
.0971
.1231
.0712
«1034
«0092
.0053
.0838
«0039
.0209
«1219
«0045

.0879
.2548“
.0605
.1664‘
«0392
.0636
.0944
«0396
.0224
.1939‘
«0863
.0293
.0836

«0930
«0393
«0458
.0666
«0458
.0923

.2018“
«1182
.0177

«0564
«1291
«1270

.0315

«0742
.1111
.1245

«0858

«0206

.0711
.0626

«0693
«1683‘
«0721
«0183

.0914
«0754

.2533“

ITEM33

ITEM35
ITEM36

ITEM38
ITEM39
ITEM40
ITEM41
ITEM42
ITEM44
ITEM45
ITEM46
ITEM47
ITEM48

ITEM49
ITEM50

ITEM79

111382

ITEM11
ITEM12
ITEM13
ITEM14

.0318
.1334
.1545‘
.1109
.1800‘
.0998
«0296
.0732
.0645
.0863

.1545‘

.1480

ITEM60

.0288
.1032
.0802
.0023
.0045
«0830
.1429
.0367
.0938
«0301
1.0000
.1515
.0627
.0602
«0918
.2279“
.0838
.1330
«0825
«0275
.0779
.0997
.0899
.0919
.0717
_2273oo
«0573
.0988
.0598
.0754

ITEM73

.0208
.1398
.0090
.0736
.1276
«0547
.1577‘
«0010
.0277
«0136
«0278
.0245
«0483
«0880

«0525
.0732
.1231
.1377
.0198
.1034

«0331
.062
.1386
.2440“

«0901
.0971
.1613‘
.3157“
.0717

ITEM61

«0017
.0526
.1037

«0114
.0850
.0369
.1560‘
.2236“
.1160
.2666"
.1515

1.0000
.2342“
.1028

«0584
.1245

«0192
.1913‘

«1091
.0081
.0891
.0621
.3495“
.2567“
.1685‘
.1201
.1978‘
.1124
.0705

«1091

ITEM74

.1830‘
.0018
.0053
.0378
.0077
.0943
.0924
.2211“
.0456
«0350
.1754‘
.2489“
«0010
.0975

.0866
.2291“
.0079
.0244
«0827
.0644
.1264
.1504
.0880
.1841‘
«0785
.1252
.0553
.0831
.0380

.0917
.1043
«0752
«0853
.0790

ITEM75

.1998‘
.1425
.0897
.1425
.0362
.0065
.1771‘
.1329
.0485
.1578‘
.0763
.1656‘
.0029
.0432

.1174
.0605
.0358
.0159
«0162
.2019“
.0983
.0465
.0421
.1751‘
«0838
.1494
.0837
.1094
.1249

ITEM64

«0216
.0216
.1764‘
«0260
.0917
.0243
«0450
«0781
.0897
«0245
.0602
.1028
.1300
1.0000
«1551‘
.0880
.1301
«0122
.0267
«0201
.1793‘
.0243
.0983
.1216
.0826
.1099
.0653
.0538
.1111
«0980

ITEM76

.0812
.0916
.2087“
«0046
.0081
«0637
.0301
.0721
.0583
.0542
.0056
.1315
.1524
.1351


.0388
«0842
.1314
.1197
.1784‘
.0398
«0413
.0247
«0732
«0597
.0882
.1054
«0203
.0107
«0124

ITEM65

«0106
«0911
.0428
.1460
.1179
«1047
.0309
«1042
«1174
.0352
«0918
«0584
«0360

.1776‘
.0197
.1323
.1263
«0017
.1124
.1785‘
.1346
.1502
.0725
«0415
.3055“
«0642
.0212

«0296
.1651‘
.0548
.0924
«0168
«0111
.0551
.0901
.1068
.0162
«0855
.2602“
.1120
.1919‘
.0681

ITEM66

.1282
«0708
.1133
.0625
.0956
«0360
.1480

ITEM78

.1390
.0747
.0551
.0940
«0172
.0584
.1980‘
«0997
«0038
«1423
.0535
.0177
.0529
.0520

«0882
.0429
.1200
.0733
.0320
.0067
.1513
.0634
.1523
.1391

«1292
.2597“
.0279
.1153
.2891“

ITEM67

.1169
.0744
.1346
«0347
.0752
«0061
.1016
.1442
.0396
.0626
.0838
«0192
.0616
.1301
«0371
.1219
1.0000
.1071
«0414
.0728
.1622‘
.1291
«0145
.0772
«0164
«0070
.1018
.1636‘
.1351
«0005

«0117
«0039

.0385
«0022
.3540“
.34 O.
«0396
.1467
«0288
«0838
«0950
.0942
.0405
.0682
.1273
.0979
.1942‘

ITEM69

«2091“
.0567
.1393

«0498

«0756
«0385
.0179
«1303
.0083
.0312
.0740
«0422
.0152

«1001
«0624
«0365
«0414
.0233
.0364
«0404
«0619
«1503
.0247
«0329
«0771
«0405
«0067
«1684‘

.0029

.0350
.0359
«0839
«0574

.0316
«0121

«0549
«0122

.0220
«1522

.0595
«0155
.0338
«0316

.0134
«1772‘

«0353
.1018
«0149
«0181
«0259
«0392
ITEM71

«0477

«0617

«0275

.0527

«0079
.0055

«0248
.2079“

.1106
.1365
«0212

.1739‘

.2117“

.1856‘
«0053

.0731
.1047
.0911
.1190
«1423

22324”
.1389
«0128
.1366

.0891
.0543
.1793‘
‘«0442
.1188
.1622?
.0074
«0081
1.0000
.1401
.1017

ITEM16

ITEM78

ITEM81
ITEM82

.2457“
.1883‘
.0109
.0227
.0525
.0156
.1024
.0024
.0479
.0444
«0103
.0818
.0927
.0311
«0776
«0686
.0245
«0825
.1185
.0210
«0359
.1398
.0331
.0452
.1261
«0730
«1642‘
.0647
.1772‘
.2852“
.1766‘

.0539
.0989
.1049
. 1617‘
.0344
.0266
«0387
.1646‘
.0362
.2212“
«0587
. 1052
.10W
.0716
«0698
.1272
.1620‘
.0548
.0827
.1740‘
.0562
.1287
«W63
.0841
.W66
.1847‘
«1575‘
.1W5
.2103“
.1941‘
.1620‘

ITEM74

«W54
.3199“
. 1549‘

.1115
.1576‘
.1804‘
.1611‘
.1901‘
.1941‘
.0899
.3495“
.0824
.W83
.0328
.0415
«0145
.1331
«1121
.W35
.1017
.0539

.3153“
.1688‘
.2274“
.1174
.1426
«W58
«1121

.1234
.0744
.0934
.0977
.1070
.0941
.0472
.0242
.0489
.2149“
«0472
.1643‘
.0477
.1016
.0752
.0619
.0249
«0377
.1670‘
.1819‘
.1092
.0686
«0249
«0614
.0716
.1711‘
«1457
.1149
.0824
.2038“
.1316

ITEM75

«0182
.1455
.1403
.0754
.0733
.1536‘
«0115
.0654
.1757‘
.1546‘
.0919
.2567“
.1088
.1216
«0226
.0348
.0772
.1068
.0856
.1423
.0037
.1536‘
.3153“
1.0000
.3020“
.2064“
.1123
.0958
.0786
«0286

.0317
.1627‘
.0837
.0664
.0199
«0489
.0866
.1301
.0418
.1212
.W99
.2813“
.1679‘
.W34
«W58
.1140
.1598‘
«W67
.1260
.1153
«1399
.W68
.W55
«W42
.W67
.W10
«1167
.0473
.W27
.2583“
.1377

ITEM76

«1271
. 1784‘
.1049
.0453
.W
.15 84‘
. 1094
.W80
.1761‘
.1594‘
.07 17
.1685‘
«0178
.0826
«0138
«0295
«0164
.2204“
«0834
.0349
«W94
. 1331
. 1688‘
.3020“
1WW
. 1505
.W47
.2696“
.0662
«1218


.063 8 «0702 .W83
.1243 «W96 . 1264
.1492 .0942 . 1280
. 1836‘ .0581 .0576
.0828 .0555 «0178
. 1862‘ .2086“ «0224
.07 88 . 1616‘ .0802
.2064“ .0149 «0321
.0374 «0392 .1576
. 1890‘ .1501 .0270
.1364 «0149 «0156
. 1530‘ .W69 .1060
.1245 .0805 «W58
.1103 «0513 «0105
«W52 «0293 «0346
. 1030 «W89 .0552
.1368 «0104 «W47
.W76 .0266 .W23
.W69 .1137 «0189
.0715 .2062“ «W84
.0546 .0149 «W52
.1675‘ .0747 «0559
«0453 «W34 .W84
.0669 .1162 .0451
.W42 .W50 .W62
.1866‘ .W71 .1910‘
«W42 « 1019 «W85
.1110 .1517 .1449
.2109“ .W57 .0746
.1260 .1227 .1379
«0228 .W12 .0714
ITEM77 ITEM78 ITEM79
«1764‘ «0704 .W57
.1764‘ .0704 .644
.1150 .0791 .0848
«1107 «WZ7 .0144
.2323“ .W94 «0191
.1141 «WW .0547
.1W9 «0143 .W59
. 1895‘ .1266 .0515
«0166 .0715 .1446
.0769 .W82 .1379
.2273“ «0573 .W88
.1201 .1978‘ .1124
.Wl7 .1043 «0752
.1099 .0653 .0538
«1363 «1299 .0524
«0147 .0841 .09z2
«W70 .1018 .1636‘
.1492 .0686 .(XTM
«0812 .0723 .0174
«W81 .W74 .W72
. 1296 .0169 .W93
.0134 .0303 .0811
.2274“ .1174 .1426
.2064“ .1123 .W58
. 1505 .W47 .2696“
1WW . 1670‘ .W20
. 1670‘ 1WW .0691
.W20 .1591 1WW
«0969 .W 16 .1034
«1192 «W37 «W25

.W52 «W 84
.0 [W «(7121
. 1677‘ «W42
«W57 «0476
.0179 « 13W
«0483 «0494
«W92 .05 69
«1595‘ .W95
.W7 3 .1048
«W93 «1880‘
« 1077 « 1356
«0348 «085 3
«0398 «0887
.1394 .0127
«0398 «0887
« 1517 .0144
«07 86 «0122
.W72 «(£24
.05 82 .W42
.W15 «0414
. 1346 .1520
.1312 .0364
«0499 «0404
.0532 .W92
.0584 .W82
. 1098 «1015
.0435 .0632
.1147 «0365
.W24 «W87
.1461 «0834
.1561‘ «W20
ITEM81 ITEM82
.0117 «W70
«W70 «(52.5
. 1543‘ «0591
.W43 «W46
«0521 «W95
«0414 «0777
.111 1 .W67
.1175 «0884
.241“ «W66
«0404 « 1218
.0598 .07 54
.0705 -. 1W1
«0853 .0790
. 1 1 1 1 «W80
«0432 «07 54
.0398 .0458
.1351 «W05
«W16 «0723
«0313 .1078
.0592 .0460
.W92 «0460
«0141 .W10
«W58 « 1121
.07 86 «W86
.0662 -. 1218
«W69 «1 192
.W16 «W37
.1034. «022.5
1.0000 .1340
.1340 1.0000

APPENDIX D

Item parameter estimates for statistical knowledge items

ITEM | INTERCEPT |  SLOPE  | THRESHOLD | DISPERSION | ASYMPTOTE | CHISQ   DF
     |   S.E.    |  S.E.   |   S.E.    |    S.E.    |   S.E.    | (PROB)

0001 |   0.495   |  0.822  |  -0.602   |   1.216    |   0.192   |  2.9   4.0
     |   0.169*  |  0.179* |   0.255*  |   0.265*   |   0.084*  | (0.5795)

0002 |  -0.099   |  0.703  |   0.141   |   1.423    |   0.201   |  5.7   5.0
     |   0.201*  |  0.191* |   0.273*  |   0.387*   |   0.084*  | (0.3373)

0003 |   1.575   |  0.862  |  -1.827   |   1.160    |   0.205   |  1.9   3.0
     |   0.255*  |  0.255* |   0.442*  |   0.342*   |   0.091*  | (0.6008)

0004 |   0.063   |  0.674  |  -0.093   |   1.484    |   0.217   |  0.9   5.0
     |   0.194*  |  0.173* |   0.298*  |   0.381*   |   0.089*  | (0.9691)

0005 |  -1.132   |  0.639  |   1.772   |   1.566    |   0.246   |  5.4   6.0
     |   0.440*  |  0.223* |   0.534*  |   0.546*   |   0.069*  | (0.4947)

0006 |   0.054   |  0.551  |  -0.098   |   1.816    |   0.253   |  6.9   5.0
     |   0.208*  |  0.148* |   0.386*  |   0.488*   |   0.099*  | (0.2241)

0007 |   0.536   |  0.925  |  -0.580   |   1.082    |   0.208   |  2.2   4.0
     |   0.179*  |  0.238* |   0.245*  |   0.278*   |   0.089*  | (0.7047)

0008 |  -1.125   |  0.821  |   1.370   |   1.217    |   0.179   |  3.7   6.0
     |   0.386*  |  0.272* |   0.328*  |   0.403*   |   0.059*  | (0.7199)

0009 |   0.167   |  0.696  |  -0.240   |   1.436    |   0.231   |  2.1   5.0
     |   0.194*  |  0.184* |   0.302*  |   0.380*   |   0.094*  | (0.8397)

0010 |  -2.358   |  0.904  |   2.609   |   1.106    |   0.222   |  6.8   6.0
     |   0.946*  |  0.403* |   0.826*  |   0.494*   |   0.042*  | (0.3371)

0011 |  -1.558   |  1.170  |   1.331   |   0.854    |   0.164   |  3.0   5.0
     |   0.552*  |  0.458* |   0.280*  |   0.334*   |   0.050*  | (0.7076)

0012 |   0.749   |  1.297  |  -0.577   |   0.771    |   0.162   |  5.5   3.0
     |   0.184*  |  0.320* |   0.175*  |   0.190*   |   0.074*  | (0.1370)

0013 |   0.309   |  0.439  |  -0.705   |   2.278    |   0.218   |  8.3   5.0
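
The column names in this table appear to follow the usual conventions of three-parameter logistic calibration output, in which intercept = -(slope x threshold) and dispersion = 1/slope. The following is a minimal Python sketch, assuming that model and using only the values printed for item 0001. It checks both identities and evaluates the implied probability of a correct response. The function names and the scaling constant `scale` (set to 1.0 here; some programs use 1.7) are illustrative assumptions and are not taken from the original analysis.

```python
import math

# Values printed for item 0001 in the table above.
ITEM_0001 = {
    "intercept": 0.495,
    "slope": 0.822,
    "threshold": -0.602,
    "dispersion": 1.216,
    "asymptote": 0.192,
}


def check_identities(item, tol=0.002):
    """Return True if the printed values satisfy, within rounding error,
    intercept = -(slope * threshold) and dispersion = 1 / slope."""
    intercept_ok = abs(item["intercept"] + item["slope"] * item["threshold"]) <= tol
    dispersion_ok = abs(item["dispersion"] - 1.0 / item["slope"]) <= tol
    return intercept_ok and dispersion_ok


def p_correct(theta, item, scale=1.0):
    """Three-parameter logistic probability of a correct response.

    `scale` is an assumed scaling constant (hypothetical default of 1.0).
    """
    z = scale * item["slope"] * (theta - item["threshold"])
    c = item["asymptote"]
    return c + (1.0 - c) / (1.0 + math.exp(-z))


if __name__ == "__main__":
    print(check_identities(ITEM_0001))
    print(round(p_correct(ITEM_0001["threshold"], ITEM_0001), 3))
```

When theta equals the threshold, the sketch returns the asymptote plus half of the remaining probability, which is about 0.596 for item 0001.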


0014I -1.801 I

0017I -1.298 I

0018I -0164 I

0019I -1.167 |

0020I -0533 I

0022I -1.264 I

0023| -0.416 I

0026| -1.305 |

00271 -2.198 I

0028I -2.661 I


0608* I 0094* I (01407)

I I

0.888 I 0.148 I 11.1 5.0
0339* I 0044* I (0.0499)

I I

1.178 I 0.206 I 3.5 5.0
0305* I 0084* I (0.6296)

| I

1.026 I 0.183 I 3.9 5.0
0257* I 0079* I (0.5655)

I I

0.900 I 0.176 | 3.8 6.0
0301* I 0053* I (0.7016)

I I

1.373 I 0.164 I 7.9 5.0
0347* I 0072* I (0.1592)

I I

1.731 I 0.206 I 11.7 7.0
0592* I 0066* I (0.1107)

I I

1.494 I 0.168 I 13.1 6.0
0364* I 0069* I (0.0406)

I I

1.552 | 0.206 | 3.8 4.0
0381* I 0091* I (0.4408)

I I

0.954 I 0.324 I 5.5 6.0
0376* I 0064* I (0.4798)

I |

0.974 I 0.239 I 4.0 5.0
0322* I 0080* I (0.5496)

I I

1.377 I 0.207 I 2.4 5.0
0301* I 0087* I (0.7898)

I I

1.086 I 0.196 I 1.9 4.0
0258* I 0086* I (0.7651)

I I

1.103 I 0.138 I 4.0 5.0
0339* I 0049* I (0.5458)

I I

1.122 | 0.114 I 4.7 5.0
0463* I 0036* I (0.4601)

I I

1.096 I 0.243 I 2.5 6.0

0029I -0148 I

0031I -1.317 I

0032I -0.065 I

0033I -0023 I

0034I -0101 I

0035I -0086 I

0036I -1.610 I

0037I -2.654 I

0038I -1.053 I

0039I -1.296 I

0042| -0.804 I

0043I -0741 I


0510* I 0040* I (0.8745)

I I

1.522 I 0.269 I 8.8 5.0
0445* I 0096* I (0.1177)

I I

1.760 I 0.211 I 6.5 5.0
0444* I 0092* I (0.2604)

I I

0.884 I 0.318 | 8.1 6.0
0349* I 0062* I (0.2335)

I I

0.593 | 0.322 I 7.8 4.0
0213* I 0082* I (0.0993)

I I

0.536 I 0.319 | 6.5 4.0
0203* I 0080* I (0.1617)

I I

2.304 I 0.239 I 11.4 6.0
0640* I 0096* I (0.0762)

I I

1.390 | 0.195 I 4.1 5.0
0338* | 0082* I (0.5309)

I I

1.252 I 0.235 I 2.1 7.0
0490* I 0055* I (0.9545)

I |

0.634 I 0.124 I 2.8 5.0
0308* I 0032* I (0.7331)

I |

1.127 I 0.191 I 4.2 6.0
0383* I 0064* I (06494)

I I -.
0.992 I 0146 I 3.8 5.0
0324* I 0052* I (0.5785)

I |

1.187 I 0.213 I 1.6 5.0
0310* I 0088* I (0.8979)

I I

1.289 I 0.218 I 1.2 2.0
0388* I 0095* I (0.5556)

I I

0.836 I 0.192 I 11.6 5.0
0284* l 0063* I (0.0397)

I I

1.141 I 0.219 I 2.8 6.0

I

I
0044I

I

I
0045I

I

I
0046!

I

I
0047I

I

I
0048I

I

I
0049I

I

I
0050I

I

I
0051I

I

I
0052i

|

I
0053I

I

I
0054I

I

I
0055I

I

I
0056I

I

|
0057I

I

I
0058I

0332* I
I
0.004 I
0220* I
I
-0523 I
0295* |
I
-0677 I
0381* |
I
-0243 I
0220* I
I
-0025 |
0213* I
I
0.314 I
0175* I
I
-0694 I
0350* I
I
-1.261 I
0506* I
|
-0815 I
0348* I
I
0.027 I
0208* I
I
-0856 |
0397* I
I
-1.629 I
0726* I
I
0.231 I
0205* I
I
-2.576 I
1383* I
I
0.431 I

0293* I
I
0.639 I
0192* I
I
0.957 I
0314* I
I
0.785 I
0288* I
I
0.807 I
0222* |
I
0.561 I
0158* I
I
0.674 I
0174* I
I
0.851 I
0298* I
I
1.033 I
0391* I
|
0.861 I
0277* I
I
0.719 I
0194* I
I
1.235 I
0431* I
I
1.525 I
0684* I
I
0.757 I
0204* I
I
0.952 I
0455* I
I
0.607 I

0274*
I
-0007
0345*
I
0.547
0232*
I
0.862
0371*
I
0.301
0239*
I
0044
0376*
I
-0466
0303*
I
0.815
0302*
I
1.220
0296*
I
0.946

0283* I

I
0.038
0293*

I

0.693

0194* I

I
1.068

0217* I

I
-0306

0302* I

I
2.707

1247* I

I
-0710


0382* I 0074* I (0.8326)

I I

1.565 I 0.264 I 16.4 5.0
0470* I 0099* I (0.0060)

I I

1.045 I 0.228 I 1.4 5.0
0343* I 0077* I (0.9224)

I I

1.273 I 0.341 I 8.2 6.0
0467* I 0084* l (0.2224)

I I

1.239 I 0.193 I 2.7 5.0
0341* I 0078* I (0.7429)

I I ‘
1.782 I 0.249 I 5.3 5.0
0503* I 0097* I (0.3752)

I I

1.483 I 0.213 I 7.3 5.0
0383* I 0091* I (0.1982)

I I

1.175 I 0.267 I 3.9 6.0
0412* I 0079* | (0.6881)

I I

0.968 I 0.254 I 3.1 6.0
0366* I 0061* I (0.7985)

I I

1.161 I 0.224 I 4.0 6.0
0374* I 0071* I (0.6741)

I I

1.391 I 0.233 I 1.6 5.0
0375* I 0092* I (0.9063)

I I

0.810 I 0.244 I 3.7 5.0
0283* I 0066* I (0.6004)

I I

0.656 I 0.257 I 4.2 6.0
0294* I 0053* I (0.6566)

I I

1.320 I 0.258 I 4.2 5.0
0356* I 0099* I (0.5180)

I I

1.051 I 0.428 I 4.6 7.0
0503* I 0045* I (0.7128)

I I

1.646 I 0.208 I 7.2 5.0

 

I

I
0059I

I

I
0060I

I

I
0061I

I

I
0062I

I

I
0063I

I

I
0064I

|

|
0065I

I

I
0066I

I

I
0067I

I

I
0068I

I

I
0069I

I

I
0070|

I

0167* I
I
0.203 I
0190* I
I
0.280 I
0216* I
I
-0636 I
0352* I
I
-0.573 I
0299* I
I
-0087 I
0237* I
I
0.010 I
0210* I
I
-0620 I
0297* I
I
-0742 I
0321* I
|
-0179 I
0188* l
I
-0719 I
0349* I
I
-1.379 I
0520* I
I
-2.140 I
0876* I

0143* I
I
0646 I
0172* I
I
0.655 I
0159* I
I
0.442 I
0139* I
I
0.605 I
0188* I
|
0.682 I
0205* I
I
0944 I
0270* |
I
0.992 I
0319* I
I
0.873 I
0282* I
I
0.723 I
0171* I
I
0.668 I
0224* I
I
0.818 |
0301* I
I
0.842 I
0365* |

-0314
0326* I
I
0.428
0290* I
|
1.441
0696* I
|
0.947
0404* I
|
0.127
0335* I
I
-0011
0224* I
|
0.625
0217* I
I
0.850
0261* I
I
0.247
0238* I
I
1.076
0408* I
I
1.685
0435* I
I
2.541
0818* I


0388* I 0090* I (0.2040)
I I

1.548 I 0.235 I 4.5 5.0
0412* I 0095* I (0.4766)

I I

1.528l 0190I 1.1 5.0
0372* I 0079* I (0.9512)

I I

2.264 I O.312| 8.3 6.0
0.12.7 *I 0093*I(02137)

I I

1.654I 0.246 I 8.2 6.0
0515* I 0085* I (0.2250)

I I

1.466 I 0.272 I 5.0 5.0
0441*1 0.098*I(04138)

I I

1.060I 0.218 I 7.4 5.0
0303* I 0085* I (0.1902)

I I

1.008I 0.199I 5.2 5.0
0324* I 0071* I (0.3929)

I I

1.145I 0.205 I 1.8 6.0
0369* I 0071* I (0.9358)

I I

1.382I 0.161 I 3.4 5.0
0327* I 0071* I (0.6395)

I I

1.497 I 0.274 I 10.2 6.0
0501* I 0082* I (0.1170)

I I .
1.222 I 0.238 I 6.7 6.0
0450* I 0060* I (0.3469)

I I

1.187I 0260I 5.8 7.0
0515* I 0047* I (0.5613)