This is to certify that the dissertation entitled On the Meaning and Measurement of Test Appropriateness presented by Jose Manuel Cortina has been accepted towards fulfillment of the requirements for Ph.D. degree in Psychology
Neal Schmitt, Major professor
Date 7/12/94

ON THE MEANING AND MEASUREMENT OF TEST APPROPRIATENESS
By Jose Manuel Cortina
A DISSERTATION Submitted to Michigan State University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY
Department of Psychology
1994

ABSTRACT
Recent research has shown that a test can be psychometrically valid and yet be inappropriate for certain individuals such that the test scores for these individuals cannot be interpreted as accurately indicating their standing on the construct of interest. This body of research, however, has been largely statistical in nature, with a focus on indices of appropriateness such as the lz index developed by Drasgow, Levine, and their colleagues. The purpose of the present paper was to examine appropriateness as a construct, and develop and partially test a model of its determinants based on literature from educational, social, personality, and quantitative psychology. Specifically, the effects of item characteristics, math anxiety, test anxiety, carelessness, and conscientiousness on statistical knowledge test scores and the lz index of test appropriateness were examined in a sample of 165 undergraduate statistics students.
The results showed that item characteristics, math anxiety, carelessness, and the item characteristic by conscientiousness interaction were significantly related to knowledge test scores while none of the hypothesized predictors of lz were significantly related to it. Implications for appropriateness and testing are discussed.

ACKNOWLEDGEMENTS
I’ve been waiting a long time to write my acknowledgements, if for no other reason than because it would suggest that I have something to acknowledge. And I do. It appears that I have finally managed, over the shrill objections of the administration, to annex a series of degrees up to and including Doctor of Philosophy from Michigan State University. How and when this happened, I have no idea, but there are many people who deserve recognition: accessories after the fact, if you will, and I think you will. Thanks Kim for things too numerable to mention, but specifically for reminding me that the writing of my Results section was not an insurmountable task. You were right. Thanks Ron for Sega and Scotch and other friendship-related esoterica. Thanks Stephen and Dale for not letting trifles like work get in the way of beer and golf. Thanks Mick for reminding me not to take life too terribly seriously. After all, there’s a 50-50 chance that I won’t survive the day anyway. Thanks Sher for friendship unconditional upon the level of my idiocy. Thanks Stan and Jean (Jan and Stean?) for Biscotti and access to D’s closet. Thanks Tim and Mo for French wine, artichoke dip, and kindness. Thanks John for playing up my golf skills in the letter of rec. Thanks Mike, Steve, Dan, and Kevin for sitting through all of those defense meetings without laughing, at least not in my presence. Thanks Mary for not getting too upset when I wallpapered your desk. Thanks Suzy, Greta, and Cheryl for always making life easier. Also, special appearances by Jeff B., Sandy L., JoAnn S., Whit, El, Jen, Dennis, J.T., Smitty, Rob, Barker, Dan
W., Hatrack, Sandy T., Kara B., Rick D., Gordon W., and a player to be named later. Thanks to my family for love and the absence of worry. And thanks most of all to Neal. There aren’t many people in this field or any other who could have put up with me through a thesis, comps, a dissertation, 818, Personnel Selection, 818 again, and everything in between (e.g., undocumented book and journal theft, unannounced interruptions, incessant questions, an overloaded computer account, typos, etc.). You are not only the finest mentor I have ever seen or heard about, you are, as far as I can tell, the best possible mentor, especially for a person with my particular eccentricities. My career goal, though unattainable, is to be the Academic that you are.
JMC 6/94
P.S. Special thanks to Ronald Wilson Reagan, who never let me down.

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
    Test Appropriateness as it has been Studied
    Foundations of Test Appropriateness
    Sources of Type P Inappropriateness
        Acquiescence/Denial
        Need for Approval
        Extreme Response Set/Central Tendency
        Test Anxiety
        Cognitive Controls
        Response Bias/Test Wiseness
        Carelessness/Motivation
        Omissiveness
        Section Summary
    Sources of Type I Inappropriateness
        Test Anxiety by Item Characteristics
        Motivation by Item Characteristics
        Omissiveness by Item Characteristics
        Field Articulation by Item Characteristics
        Response Bias by Susceptibility to Bias
        Test Wiseness by Susceptibility to Wiseness
        Topic Irrelevant Ability by Topic Irrelevant Item Content
        Need for Approval by Item Characteristics
        Section Summary
    Measures of Type P Inappropriateness
        Acquiescence/Denial and Extreme Response Set/Bias
        Need for Approval
        Test Anxiety
        Cognitive Controls
        Response Bias/Test Wiseness
        Carelessness
        Section Summary
    Measures of Type I Inappropriateness
        Non-IRT based Indices of Inappropriateness (Unstandardized)
        IRT-based Indices of Inappropriateness (Unstandardized)
        Standardized, Non-IRT based Indices of Type I Inappropriateness
        Standardized, IRT-based Indices of Type I Inappropriateness
    Overall Summary
    The Present Study
METHOD
    Sample
    Design
    Measures
        Conscientiousness
        Math Anxiety
        Test Anxiety
        Carelessness
        Statistics Knowledge Test
    Procedure
    Data Analysis
RESULTS
    Tests Measuring Respondent Characteristics
    Statistical Knowledge Test
    lz
    Tests of Hypotheses
    Difficulty-Based Item Characteristics and Knowledge Test Scores
    Conscientiousness and Knowledge Test Scores
    Math Anxiety and Knowledge Test Scores
    Test Anxiety and Knowledge Test Scores
    Difficulty-based Item Characteristics and lz
    Conscientiousness and lz
    Carelessness and lz
    Math Anxiety and lz
    Test Anxiety and lz
DISCUSSION
        Hypothesis 1: Difficulty-based Item Characteristics and Test Scores
        Hypothesis 2: Conscientiousness and Test Scores
        Hypothesis 3: Carelessness and Test Scores
        Hypothesis 4: Math Anxiety and Test Scores
        Hypothesis 5: Test Anxiety and Test Scores
        Hypothesis 6: Conscientiousness by Item Characteristic Interaction and Test Scores
        Hypothesis 7: Carelessness by Item Characteristic Interaction and Test Scores
        Hypothesis 8: Math Anxiety by Item Characteristic Interaction and Test Scores
        Hypothesis 9: Test Anxiety by Item Characteristic Interaction and Test Scores
        Hypothesis 10: Conscientiousness by Item Characteristic Interaction and lz
        Hypothesis 11: Carelessness by Item Characteristic Interaction and lz
        Hypothesis 12: Math Anxiety by Item Characteristic Interaction and lz
        Hypothesis 13: Test Anxiety by Item Characteristic Interaction and lz
    Implications and Conclusions
    Limitations
LIST OF REFERENCES
APPENDIX A: Personality Measures
APPENDIX B: Statistical Knowledge Items
APPENDIX C: Descriptives for Statistical Knowledge Items
APPENDIX D: Item Parameter Estimates for Statistical Knowledge Items

List of Tables
Table 1   Sources of Both Type P and Type I Inappropriateness for Both Maximum and Typical Tests
Table 2   Descriptive Statistics for All Tests and lz’s
Table 3   Factor Analysis of Knowledge Test Items
Table 4   Variance of Dependent Variables Attributable to Between- and Within-Subjects Effects
Table 5   Intercorrelations Among All Terms Used in Regression Analyses
Table 6   Regression of Percentage Correct on Knowledge Test onto Conscientiousness and Item Characteristics
Table 7   Regression of percentage correct on knowledge test onto carelessness and item characteristics
Table 8   Regression of percentage correct on knowledge test onto math anxiety and item characteristics
Table 9   Regression of percentage correct on knowledge test onto test anxiety and item characteristics
Table 10  Regression of lz onto conscientiousness and item characteristics
Table 11  Regression of lz onto carelessness and item characteristics
Table 12  Regression of lz onto math anxiety and item characteristics
Table 13  Regression of lz onto test anxiety and item characteristics
Table 14  Summary of Hypotheses and Support

List of Figures
Figure 1   A General Model of the Determinants of Item Responses as They Relate to Test Appropriateness
Figure 2   A General Model of the Determinants of Item Responses with Interaction Effects
Figure 3   Proposed interaction between item characteristics and test anxiety
Figure 4   Proposed interaction between item characteristics and motivation
Figure 5   Proposed interaction between response bias and susceptibility of items to bias
Figure 6   Proposed interaction between test wiseness and susceptibility of items to test wiseness
Figure 7   Proposed interaction between topic irrelevant ability and topic irrelevant content
Figure 8   Proposed interaction between need for approval and opportunity to display need for approval
Figure 9   A detailed model of the determinants of item responses as they relate to appropriateness
Figure 10  A model of the determinants of lz
Figure 11  Plot of the effect on test scores of the conscientiousness by item characteristics interaction
Figure 12  Plot of the effect on lz of the interaction between carelessness and item characteristics

Introduction

The topic of this paper is test appropriateness.
In general terms, a test is appropriate for a given individual to the extent that it measures the construct or constructs that it is supposed to measure and nothing else. Although there is a wealth of research on the determinants of test scores (e.g., test anxiety, response biases, motivation, item wording, etc.), there is a relative paucity of research on the determinants of test appropriateness. The goal of this paper is to develop and partially test a model of test appropriateness based on literature from I/O psychology, educational psychology, education, and quantitative psychology. Although some of the issues that are discussed could be applicable to tests of any kind, I focus only on multiple choice tests. Nevertheless, I make an attempt to include a wide range of test content, both maximum performance measures (i.e., tests composed of items with possible responses that are either absolutely correct or absolutely incorrect such as mathematics knowledge, reading ability, paragraph comprehension, spatial relations, etc.) and typical performance measures (i.e., tests composed of items with responses that are not necessarily right or wrong, such as personality inventories, interest inventories, etc.; it should be noted, however, that typical performance tests can have right and wrong answers in a sense when they are used for selection purposes).

At the outset, one note of clarification is in order. Although discussions of appropriateness are perhaps best directed at the individual item (since this is where our attempts to measure constructs with tests begin), the terms "test inappropriateness" and "item inappropriateness" are often used interchangeably in this paper. The reason for this is simply that a test is nothing more than a set of items. To the extent that those items are inappropriate, the test composed of them is obviously inappropriate.
I begin with an overview of appropriateness as it has been studied, and follow with a review of the relevant literature, the purpose of which is to develop a model of test appropriateness.

Test appropriateness as it has been studied

A test is inappropriate to the extent that it measures constructs other than the construct of interest. A test may be inappropriate, however, only for certain respondents. For example, consider a psychometrically sound paper and pencil test of English comprehension. For most respondents, this test will yield scores that accurately reflect the English comprehension of the respondents. In other words, the test would be appropriate for these respondents. Now consider the performance of a visually impaired individual on this paper and pencil test. This respondent would almost certainly score very poorly on this test, not because of a lack of English comprehension, but because this respondent cannot cope with the format of the test. For this reason, this test would be inappropriate as a measure of English comprehension for this individual.

Research on appropriateness has focussed primarily on the development of techniques that identify respondents for whom a given test is inappropriate. Specifically, these techniques involve the identification of response patterns that are aberrant and, therefore, suggest inappropriateness. One way of describing the logic of these indices is in terms of Guttman vectors. A Guttman vector is simply a vector of zeroes and ones in which all of the ones precede all of the zeroes. If a respondent were to respond to items in a way that matched perfectly with the difficulties of the items (and if there were no possibility of guessing), then the responses of that respondent, when ordered in terms of item difficulty, would form a perfect Guttman vector. The idea is that the respondent answers all items at or below a certain difficulty level correctly. At some point, however,
the difficulty becomes too great for that respondent, and all items above that level of difficulty are answered incorrectly. If this were the case, then this respondent should receive a perfect score on an appropriateness index. If, however, some of the items measured constructs other than the construct of interest, then the responses of a given individual might depart from a Guttman vector, and the responses of this person would then be "flagged" by an index of inappropriateness.

Consider again the example of a visually impaired individual taking a test of English comprehension, except that now there are some paper and pencil items and some items that are asked and answered in an interview format. For those respondents without impaired vision, we would expect little difference between the written questions and the interview questions. As a result, we could order all of the dichotomously scored item responses for these respondents in terms of their difficulty values and expect them to form something resembling a Guttman vector such as this

1111111100000000

The visually impaired individual, however, would almost certainly do much better on the interview items regardless of their group-determined difficulty values. This individual, therefore, would have an "aberrant" pattern of responses such as the following

001000110010110

Almost all of the 1’s for this individual could be expected to represent interview items. Since these interview items as a group should have levels of difficulty similar to those of the paper and pencil items, there should be items of both types at all levels of group-determined difficulty. For the visually impaired individual, however, the most prominent source of "difficulty" is whether or not the items must be seen to be answered. In other words, there is a source of item difficulty for the visually impaired respondent that does not apply to the group on which the item difficulties were determined.
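The logic of Guttman vectors described above can be sketched in a few lines of Python. This is only an illustrative sketch, not one of the published appropriateness indices: the function names (`order_by_difficulty`, `is_guttman_vector`, `guttman_errors`) are hypothetical, and the simple count of Guttman errors (pairs in which an easier item is missed while a harder item is answered correctly) is assumed here as a crude stand-in for a formal index. The two response strings are the example patterns given above, treated as already ordered from the easiest item to the hardest.

```python
from typing import List, Sequence


def order_by_difficulty(responses: Sequence[int], difficulties: Sequence[float]) -> List[int]:
    """Return the 0/1 responses sorted from the easiest item to the hardest.

    `difficulties` is assumed to be on a scale where larger means harder.
    """
    paired = sorted(zip(difficulties, responses), key=lambda pair: pair[0])
    return [response for _, response in paired]


def is_guttman_vector(ordered: Sequence[int]) -> bool:
    """True when all of the ones precede all of the zeroes."""
    return list(ordered) == sorted(ordered, reverse=True)


def guttman_errors(ordered: Sequence[int]) -> int:
    """Count pairs in which an easier item is wrong and a harder item is right."""
    errors = 0
    zeros_seen = 0
    for response in ordered:
        if response == 0:
            zeros_seen += 1
        else:
            # Each earlier (easier) miss paired with this hit is one error.
            errors += zeros_seen
    return errors


# The two example patterns from the text, already ordered by difficulty.
conforming = [int(ch) for ch in "1111111100000000"]
aberrant = [int(ch) for ch in "001000110010110"]

print(is_guttman_vector(conforming), guttman_errors(conforming))
print(is_guttman_vector(aberrant), guttman_errors(aberrant))
```

A pattern with zero Guttman errors conforms perfectly to the group-determined difficulties; the more errors a pattern contains, the more reason there is to suspect that something other than the construct of interest is driving the responses.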
While this is a useful example for explication, it is an exaggeration. More realistic examples would be mathematical word problems (or word problems of any kind) given to people who cannot read well, items with "culture-loaded" content given to someone unfamiliar with the culture, and any knowledge, ability, or personality test given to someone with extreme test anxiety.

Many indices have been developed that identify aberrant response patterns, such as Sato’s Caution Index (Sato, 1975), the Dependability Index (Kane & Brennan, 1980), and the lz index (Drasgow, Levine, and Williams, 1985). The lz index, however, has received the most recent attention and is, therefore, the appropriateness measure to be used in the present study. Although the specifics of the index are described later in the paper, the general purpose of lz is to assess the extent to which the responses of a given respondent conform to the three-parameter Item Response Theory (IRT) model. This is analogous to the Guttman-based indices such as those of Sato (1975) and Kane & Brennan (1980), except that the lz index assesses the congruence between the responses of an individual and the item parameters and ability estimates from IRT.

As I mentioned earlier, most of the work on these indices has been statistical in nature. This work has established the fact that these indices are reasonably effective in detecting departures from expected response models. By contrast, very little work has been done on the determinants of such departures. Many possible sources of departure have been suggested, such as cheating, response coding errors, and fatigue. But little empirical work has been done to establish these factors as sources of departure from expected response models. In other words, the construct validity of these indices has not been firmly established. As a result, we know that these indices detect something, but we have no clear idea about what this something is.
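Although the specifics of the lz index are left for later in the paper, its general logic can be sketched as follows. The sketch assumes the form of the index as it is commonly stated in the appropriateness measurement literature: the log-likelihood of the observed response pattern, standardized by its expectation and variance under the three-parameter logistic model. The function names, the scaling constant D = 1.7, and the item parameters in the example are illustrative assumptions rather than values from the present study.

```python
import math
from typing import List, Sequence


def p_correct_3pl(theta: float, a: float, b: float, c: float, d: float = 1.7) -> float:
    """Probability of a correct response under the three-parameter logistic model.

    a = discrimination, b = difficulty, c = lower asymptote (guessing),
    d = scaling constant (1.7 is assumed here).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


def lz_index(responses: Sequence[int], probs: Sequence[float]) -> float:
    """Standardized log-likelihood of a 0/1 response pattern.

    Assumes every probability is strictly between 0 and 1 and that at
    least one differs from 0.5 (otherwise the variance term is zero).
    """
    log_lik = expected = variance = 0.0
    for u, p in zip(responses, probs):
        q = 1.0 - p
        log_lik += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical five-item test, items ordered from easiest to hardest.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta = 0.0  # assumed ability estimate for the respondent
probs: List[float] = [p_correct_3pl(theta, a=1.0, b=b, c=0.2) for b in difficulties]

conforming = [1, 1, 1, 0, 0]        # easy items right, hard items wrong
reversed_pattern = [0, 0, 0, 1, 1]  # easy items wrong, hard items right

print(lz_index(conforming, probs))
print(lz_index(reversed_pattern, probs))
```

Under these assumptions, large negative values indicate a response pattern that is unlikely given the item parameters and the ability estimate, which is the sense in which the index "flags" a respondent.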
The present paper attempts to address this issue of the construct validity of measures of inappropriateness. The first step is to treat inappropriateness as a construct by exploring its meaning and its implications. The second step is to discuss factors that might be expected to lead to inappropriate responses and build these factors into a model of inappropriateness. The third step is to begin testing the model to see if the determinants that are included in the model actually do have an impact on measures of inappropriateness. To this end, I begin with a discussion of the foundations of mental testing in general and test appropriateness in particular, and how early concerns over appropriateness led to modifications in the conceptual model used to describe item responses. I then explain how specific individual and item characteristics might combine to affect both the level of test scores and the appropriateness of test scores for certain people. Finally, I describe a test of parts of this model.

Foundations of test appropriateness

Although the history of mental testing in general can be traced back thousands of years to the ancient Chinese and Greeks (DuBois, 1966; Anastasi, 1988), the roots of contemporary testing can be found in the early nineteenth century. The work of Galton, Cattell, Binet, Terman, Goddard, and others is well documented and need not be reiterated here (see Hothersall, 1990 or Boring, 1950 for thorough reviews). One theme that runs through the work of all of these early testing experts is an assumption that any given mental test measures the same constructs (although the term "construct" wasn’t used) for all people. In other words, item responses are determined only by individual differences on the trait of interest and, where appropriate, item difficulties. The possibility of test inappropriateness for certain individuals was not considered.
One of the more striking examples of this assumption at work comes from the testing of immigrants at Ellis Island in 1914. At Ellis Island, immigrants were asked in their own language several trivia questions developed by Goddard and his staff. The trivia questions consisted, for the most part, of bits of Americana such as "Who is Christy Matthewson?" and "What is Crisco?" and were designed to assess intelligence. With the benefit of hindsight, it is obvious that these questions, while perhaps valid as measures of intelligence for an American sample, were utterly inappropriate for immigrants from Italy, Hungary, Russia, etc. In other words, these items assessed different constructs for different respondents depending on whether the respondents were American or not. This contamination, however, was not identified by Goddard. Because he assumed that the item responses were caused by the level of intelligence of the respondent and the difficulties of the items and nothing more, he concluded that over 80% of immigrants to the United States were "feebleminded" (Hothersall, 1990).

Yerkes was among the first to identify individual differences other than the construct of interest that affect item responses. In the preliminary testing of the Army Alpha test of intelligence or "native wit" in 1917, he recognized that many of the respondents were not sufficiently literate to follow the instructions for the test (Hothersall, 1990). In other words, Yerkes recognized that individual differences other than native wit, namely reading skills, were determining item responses. For this reason, the Army Alpha test was inappropriate as a measure of intelligence for the illiterate. It was in response to this issue that the Army Beta was developed.

Cady (1923), Allport (1928), and Rosenzweig (1934) were among the first to identify item characteristics other than difficulty (or the personality-test equivalent of item difficulty, item popularity) that affect item responses.
These authors suggested that item characteristics such as social desirability would also have an effect on item responses, at least for some respondents.

What I have presented above are the components of a very general model of mental test item responses. Item responses are determined by the respondent’s level on the construct of interest, the degree to which various extraneous constructs affect the reaction of the respondent to items, the difficulty of the construct-relevant content of the item, and other construct-irrelevant factors that influence item responses. This model is presented in Figure 1.

[Figure 1. A General Model of the Determinants of Item Responses as They Relate to Test Appropriateness]

[Figure 3. Proposed interaction between item characteristics and test anxiety]

As can be seen, it is proposed that item difficulty in the form of various item construction characteristics has a slight, negative impact on item responses for low Test Anxious respondents and a considerably larger negative impact on the responses of high Test Anxious respondents. Although only one interaction is presented in Figure 3, it is intended to represent all of the item construction principles listed in Proposition 1. All of these interactions, with the possible exception of that associated with content difficulty, could apply to both maximum and typical tests. For example, respondents high in test anxiety are less likely to provide responses that accurately reflect their true standing on the construct of interest on items that are high in ambiguity than are respondents low in test anxiety, while this difference does not exist (or is less profound) for items low in ambiguity.
If this proposition is supported, then a test which contains items that vary with respect to any of the above mentioned characteristics is inappropriate (i.e., Type I) as a measure of the construct of interest to the extent that respondents vary with respect to test anxiety.

Motivation by item characteristics. It was mentioned earlier that a lack of motivation to provide responses that reflect one’s true level on the construct of interest can lead to carelessness, which implies inappropriateness. Careless responding has often been identified as a contaminant of test scores in applied psychology. For example, one of the criticisms of concurrent validation designs is that incumbents, because they have nothing to gain by performing well on selection tests administered to them in a validation context, may respond carelessly to some items (Schmitt, Noe, Gooding, & Kirsch, 1984). It seems that they would be most likely to respond carelessly to those items that would require more effort, i.e., the more difficult items. In other words, we would expect a motivation by item difficulty interaction. It has been shown that a variety of test takers are able to accurately estimate item difficulties, with correlations between true and estimated difficulty ranging from .56 to .77 (Diamond & Lorge, 1954). Likewise, any factor that contributes to item difficulty would be expected to interact with motivation to affect item responses. This suggests the following proposition:

Proposition 2 - Motivation interacts with item content difficulty, ambiguity of item meaning, positive/negative stem wording, open/closed stem format, response option complexity, stem complexity, and topic irrelevant item content to affect item responses such that respondents who are low in motivation reflect more carelessness on items that possess characteristics such as item content difficulty, negative wording, and complex response options than they do on items that do not possess these characteristics (e.g., items with simple content, positive wording, and simple response options). Respondents high in motivation reflect little carelessness in either case.

The nature of this interaction is depicted in Figure 4.

[Figure 4. Proposed interaction between item characteristics and motivation]

The form of the interaction presented in Figure 4 is similar to that of the interaction presented in Figure 3. For highly motivated test takers, item construction-based difficulty is expected to have a slight negative impact on item responses. For low motivation test takers, this negative effect should be more pronounced. Again, the form of this interaction holds for all of the principles listed in Proposition 2. All of these interactions, with the possible exception of content difficulty, could apply to both maximum and typical tests. For example, respondents low in motivation are less likely to provide responses that accurately reflect their true standing on the construct of interest on items that are negatively worded than are respondents high in motivation, while this difference does not exist (or is less profound) for items that are positively worded. Motivation is not expected to interact with item position because, while it might be expected that the difficulty associated with items early in a test would affect low motivation respondents more severely than it would respondents high in motivation, fatigue effects over time would be expected to show similar effects for the two motivation groups, so that the position effects would be washed out.

Omissiveness by item characteristics. It was mentioned earlier that respondents vary with respect to their reactions to items when they are unsure of the answer.
Some respondents will guess at any item even if they have no idea of the correct response (what Cronbach (1946) called the gambler’s mentality), while others will tend to leave such items blank. Omissiveness may not be common among the generation of school-goers that grew up with standardized tests, since these respondents would generally know to guess at every item unless told to do otherwise. The older generation, however, along with the less-educated, are less likely to possess such test wiseness. These respondents may reason that they should leave items for which they have no response blank since they should, in fact, get the item wrong. Since it is quite possible to guess correctly on multiple choice tests, differences in omissiveness will lead to differences in test/item scores. Furthermore, if respondents are more likely to guess or fail to guess at the more difficult items, any factor that contributes to item difficulty should contribute to the omissiveness by item interaction. This suggests the following proposition:

Proposition 3 - Omissiveness interacts with ambiguity of item meaning, difficulty of item content, positive/negative stem wording, open/closed stem format, response option complexity, stem complexity, topic irrelevant item content, and item position to affect item responses such that the responses of respondents high in omissiveness are more adversely affected by these item characteristics than are the responses of respondents low in omissiveness. In typical test terms, respondents high in omissiveness are less likely to provide responses that accurately reflect their true standing on the construct of interest on items that possess characteristics such as stem complexity and topic irrelevant content than are respondents low in omissiveness, while this difference does not exist (or is less profound) for items that do not possess these characteristics.
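The arithmetic behind the omissiveness argument can be illustrated with a small sketch. It rests on a deliberately simple assumption that is not part of the proposition itself: a respondent either knows the answer to an item or, not knowing it, guesses at random among the options unless the item is omitted. The function name and all numerical values are hypothetical.

```python
def expected_item_score(p_know: float, omit_rate: float, n_options: int) -> float:
    """Expected score on one multiple choice item.

    p_know    : probability that the respondent knows the answer
    omit_rate : probability of leaving the item blank when the answer is not known
    n_options : number of response options (random guessing assumed)
    """
    p_lucky_guess = (1.0 - omit_rate) / n_options
    return p_know + (1.0 - p_know) * p_lucky_guess


def omissiveness_penalty(p_know: float, n_options: int) -> float:
    """Score lost by a respondent who always omits relative to one who always guesses."""
    always_guess = expected_item_score(p_know, omit_rate=0.0, n_options=n_options)
    always_omit = expected_item_score(p_know, omit_rate=1.0, n_options=n_options)
    return always_guess - always_omit


# Hypothetical easy and hard four-option items.
easy_penalty = omissiveness_penalty(p_know=0.9, n_options=4)
hard_penalty = omissiveness_penalty(p_know=0.3, n_options=4)

print(round(easy_penalty, 3), round(hard_penalty, 3))
```

Under this assumed model the cost of omitting grows as the chance of knowing the answer shrinks, which is one way of seeing why any characteristic that makes an item harder would be expected to interact with omissiveness.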
The nature of these interactions should be similar to that presented in Figure 3 and will therefore not be depicted graphically.

Field articulation by item characteristics. It was said earlier that cognitive controls are involuntary ways of approaching and interpreting complex situations or stimuli, and that one of these controls, field articulation, is the extent to which a person is able to pick out certain relevant aspects of a complex stimulus or situation to the exclusion of other, superfluous aspects. If respondents vary with respect to field articulation, then field articulation by item characteristic interactions would be possible for those item characteristics that serve to distinguish among respondents with different levels of field articulation. In particular, those item characteristics that contribute to the complexity of the item/stimulus should interact with field articulation to affect item responses. For example, it might be expected that respondents low in field articulation (i.e., field dependent respondents) would have more difficulty with word problems (i.e., items embedded in a context) than would respondents high in field articulation. This suggests the following proposition:

Proposition 4 - Field articulation interacts with item stem complexity, topic irrelevant item content, and response option complexity to affect item responses such that the responses of respondents low in field articulation are more adversely affected by these item characteristics than are the responses of those persons high in field articulation. In typical test terms, respondents low in field articulation are less likely to provide responses that accurately reflect their true standing on the construct of interest on items that possess these characteristics than are respondents high in field articulation, while this difference does not exist (or is less profound) for items that do not possess these characteristics (e.g., items with simple stems and no topic irrelevant content).

The nature of these interactions is also expected to be similar to that presented in Figure 3 and will therefore not be depicted graphically here.

Response bias by susceptibility to bias. It was said earlier that response biases are response tendencies (such as central tendency, extreme response bias, etc.) of a respondent unsure of correct answers based on whim or habit, and that if correct response options were evenly distributed across the response positions, then the response bias would affect item responses only in a random way, and there would be no effect on test scores. Research has shown, however, that correct responses often are not evenly distributed (Metfessel & Sax, 1957, 1958). Therefore, a given response bias, although whimsical, can lead to higher or lower test scores if the items in the test are susceptible to such bias. This suggests the following proposition:

Proposition 5 - Response bias interacts with susceptibility of a test to response bias to affect item responses such that respondents who possess a particular response bias receive test scores that are higher than those of respondents who do not possess the bias if the test is loaded with items whose correct options are in positions that are likely to be chosen by the respondent with the bias. If the test is loaded with items whose correct options are in positions that are not likely to be chosen by the respondent with a particular bias (e.g., many items with correct answers in the extreme options given to a respondent with a central tendency bias), then that respondent receives a lower test score than the respondent without the bias.
In typical test terms, a respondent who possesses a particular response set is less likely than a respondent who does not possess the set to provide responses that accurately reflect her/his true standing on the construct of interest on a test that is loaded with items whose "correct" options for that person (i.e., the options that best reflect the respondent’s true standing on the construct of interest) are in positions that are unlikely to be chosen by the respondent with the particular bias, while this difference does not exist (or is less profound) on a test that is not loaded with such items.

The nature of this interaction is presented in Figure 5.

[Figure 5. Proposed interaction between response bias and susceptibility of test items to bias: item responses plotted against susceptibility to bias for respondents with no bias, respondents whose bias is with the test, and respondents whose bias is against the test.]

Figure 5 suggests first that, for respondents with no response biases, susceptibility of test items to bias has no effect on item responses. For those test takers whose response bias is contradictory to the susceptibility of the items (e.g., a respondent with a predilection for extreme option positions who takes a test with a preponderance of correct options in the middle positions), susceptibility should have a negative effect on item responses, while the opposite should occur for respondents whose bias is in line with the bias of the test.

Test wiseness by susceptibility to wiseness. It was said earlier that if the response tendencies of a respondent unsure of correct answers are based on the characteristics of the test or testing situation, then they reflect test wiseness. Any characteristic of items that tends to elicit this test wiseness in those respondents who possess some degree of test wiseness should interact with wiseness to affect item responses.
For example, consider the following item from a test of metric system knowledge (assume that respondents are instructed to select the single "best" answer to each question):

Which of the following is the unit of measurement closest to a unit in the English system?
a. one-thousandth of a liter
b. one milliliter
c. one centiliter
d. one decaliter

Now consider the possible answers of two respondents of equal ability, one of whom is high in test wiseness, the other low, with neither possessing the content knowledge necessary to answer this question. The respondent low in test wiseness has no recourse but to guess blindly, with a probability of a correct response equal to .25. The respondent high in test wiseness, however, can use the fact that options a and b are equivalent to discard both of them as possible correct responses. Thus, this respondent has only two options to guess from, with a probability of correct response equal to .50. It can be said that this item is susceptible to test wiseness because one of the more commonly identified aspects of test wiseness, deduction, can be used to increase one’s chances of responding correctly to the item. If response b had been "one liter" instead of "one milliliter", then the susceptibility to test wiseness of the item would be removed, and the probabilities of correct responses for the two respondents would be equal. This suggests the following proposition:

Proposition 6 - Test wiseness interacts with the susceptibility of items to test wiseness to affect item responses such that test wiseness leads to higher probabilities of correct responses on items that are susceptible to test wiseness, but is unrelated to the probability of correct response for those items that are not susceptible to test wiseness.
To the extent that some form of benefit accrues to respondents with particular profiles on typical tests (e.g., a personality test used as a selection instrument), this interaction is expected to hold for typical tests as well.

This interaction is presented in Figure 6.

[Figure 6. Proposed interaction between test wiseness and susceptibility of items to test wiseness in their effect on item responses.]

As can be seen in Figure 6, the proposed relationships among wiseness, susceptibility of items to wiseness, and item responses are identical to those among bias with the test, susceptibility to bias, and item responses. Specifically, test wiseness is beneficial to the "test wise" on items that are susceptible to wiseness and has no effect on items that are not.

Topic irrelevant ability by topic irrelevant item content. It was discussed above that a test is inappropriate as a measure of the construct of interest to the extent that it measures a construct other than the construct(s) of interest. Any item characteristic which affects the relationship between the topic irrelevant ability of a respondent (e.g., verbal ability on a math test) and item responses can be said to moderate that relationship. For example, there is no reason to suspect that verbal ability would have an effect on a respondent’s answer to a calculation problem in mathematics (i.e., a problem with no words, just numbers, such as 2 × 4). There is, however, reason to suspect that verbal ability would have an effect on a respondent’s answer to a verbally phrased math problem (e.g., "What is the product of two and four?"). This suggests the following proposition:

Proposition 7 - Topic irrelevant ability interacts with topic irrelevant item content to affect item responses such that the responses of respondents low in topic irrelevant ability are more adversely affected by topic irrelevant item content than are the responses of respondents high in topic irrelevant ability.
In typical test terms, respondents low in standing on the irrelevant construct are less likely to provide responses that accurately reflect their true standing on the construct of interest on items that are high in topic irrelevant content (i.e., content that matches the irrelevant construct) than are respondents high in the irrelevant construct, while this difference does not exist (or is less profound) for items low in topic irrelevant content.

This interaction is presented in Figure 7.

[Figure 7. Proposed interaction between topic irrelevant ability and topic irrelevant item content: item responses plotted separately for respondents low and high in topic irrelevant ability.]

Figure 7 shows that topic irrelevant content may have a slight negative impact on item responses for test takers high in topic irrelevant ability, but a large negative impact on responses of test takers low in topic irrelevant ability.

Need for approval by item characteristics. The final form of Type I inappropriateness to be discussed here involves need for approval. Specifically, certain types of items are more likely than others to elicit responses that reflect the need for approval of the respondent (this body of literature is discussed in more detail in the section titled "Measures of Type P inappropriateness"). While need for approval is usually studied as one potential determinant of responses to typical test items, the responses to maximum test items that are the result of cheating seem to be psychologically equivalent to responses on typical test items that are the result of need for approval (as it has been conceptualized). In the maximum domain, certain items on a test might be more susceptible to cheating than others. For example, items at the ends of columns on bubble sheets might be easier to identify for someone copying answers than would items surrounded on all sides by other items.
In the typical domain, items have been found to vary with respect to social desirability (Marlowe & Crowne, 1964). For both maximum and typical tests, items may vary in the extent to which they reflect need for approval. This suggests the following proposition:

Proposition 8 - Need for approval interacts with the characteristics of an item that might serve to reflect such a disposition to affect item responses. In the maximum domain, items which provide more of an opportunity to cheat result in inflated scores for respondents who are high in need for approval but not for respondents low in need for approval. Items which provide little or no opportunity to cheat reflect no such difference across respondents. In the typical domain, respondents high in need for approval are less likely to provide responses that accurately reflect their true standing on the construct of interest on items that are high in social desirability than are respondents low in need for approval, while this difference does not exist (or is less profound) for items low in social desirability.

This interaction is presented in Figure 8.

[Figure 8. Proposed interaction between need for approval and opportunity to display need for approval: item responses plotted separately for respondents low and high in need for approval.]

Figure 8 shows that opportunity to display need for approval, whether the opportunity be a clear view of a neighbor’s paper or an item high in social desirability on a typical test, has no effect on the responses of test takers low in need for approval but a large positive impact on the responses of test takers high in need for approval.

Section summary. In this section, I reviewed some of the forms that Type I inappropriateness can take and suggested specific propositions about the form of interactions between personal characteristics of respondents and item characteristics.
If these propositions are mapped onto the model presented on page 6 along with the sources of Type P inappropriateness discussed earlier in this paper, a new model (Figure 9) emerges.

[Figure 9. Revised model of the determinants of item responses, relating standing on the construct of interest, the personal characteristics of respondents, and item characteristics to item responses.]

In this model, all personal characteristics of respondents have direct effects on item responses, and all of these except the impact of the standing of the respondent on the construct of interest represent Type P inappropriateness. Also, many of the item characteristics moderate the effects of the personal characteristics, and these moderating effects represent Type I inappropriateness. The next section of this paper deals with the measurement of Type P and Type I inappropriateness.

Measures of Type P inappropriateness

The sources of Type P inappropriateness have direct, simple effects on test scores. They can be studied as main effects. As such, they can often be measured directly. I now describe the various ways that sources of Type P inappropriateness have been measured. Where relevant, I discuss differences between measures for maximum and typical tests.

Acquiescence/denial and extreme response set/bias. Since these two sources of inappropriateness have been measured in similar ways, they will be treated together.
The general method for assessing acquiescence/denial or extreme response set and their effects on test scores (both maximum and typical) has been to compare the number of response options in question that were endorsed (e.g., the number of "agree" responses for acquiescence, the number of extreme options for extreme response set) to that expected by chance (Humm & Humm, 1944; Jackson & Messick, 1958). The problem with this method is that deviation from a chance model may simply reflect the actual standing of the respondent on the construct of interest. The solution to this problem for acquiescence was to reverse the wording of some items such that a person could contradict him/herself by agreeing categorically to all items. Although no such solution has been devised for extreme response set, it is generally accepted that extreme response set, because it reflects an exaggeration of the true level of the respondent on the construct of interest instead of a complete distortion (as is the case with acquiescence), is not as serious a problem (Cronbach, 1950).

Need for approval. One of the earliest response sets to be identified was the tendency to "fake good" on personality tests (Ruch, 1942), interest batteries (Gehman, 1957), and even projective tests (Henry & Rotter, 1956). This tendency was often studied but little understood until the work of Crowne and Marlowe (1964). These authors linked the tendency to fake good to an involuntary need for approval that led certain respondents to display themselves in as favorable a light as possible. Their measure of need for approval (the Crowne-Marlowe Social Desirability Scale), which has been incorporated into many of the more prominent personality tests, such as the MMPI, involves True-False questions that reflect large amounts of social desirability but which should be answered in only one direction by virtually all respondents who are responding honestly.
As Crowne & Marlowe aptly describe such items, "First, they are "good", culturally sanctioned things to say about oneself, and second, they are probably untrue of most people" (p. 210). Also included are items which are undesirable but probably true. An example of the former type would be, "Before voting, I thoroughly investigate the qualifications of all the candidates." An example of the latter would be, "I sometimes feel resentful when I don’t get my way." Although the first item describes a most admirable quality in a person, it is assumed to be false for the vast majority of respondents who are responding honestly. Likewise, although the second item may not be admirable, it is probably true of most people. In this way, the Crowne-Marlowe Scale and others like it (e.g., MMPI F-Scale; Edwards, 1957; Hartshorne & May, 1928) seek to identify those people who are attempting to provide a profile of themselves that is "socially desirable" instead of accurate.

The question that remains is, What do we do once we have identified a person who may be responding in this fashion? One approach has been to retest them in an attempt to get better measures of the constructs of interest. This approach can be used for both typical and maximum tests. A second approach for typical tests has been to try to correct scale scores based on Social Desirability scores.

Test anxiety. Although test anxiety has existed as a concept in psychology for at least forty years, it has been measured almost exclusively with the Mandler & Sarason (1952) measure, which has withstood much scrutiny (see discussion above on Test Anxiety). Nevertheless, Morris, Davis, & Hutchings (1981) developed a measure of test anxiety which appears to improve upon the Mandler & Sarason (1952) measure by tapping both the cognitive and emotional aspects of test anxiety. When test anxiety is identified as having an impact on a given individual’s test score, several methods for decreasing the anxiety can be employed.
For example, test anxiety has been shown to decrease as a function of instructional method (Tobias, 1979), study habits training (Naveh-Benjamin et al., 1987), feedback (Campeau, 1968), and training in positive affective responses (Watson & Clark, 1984).

Cognitive controls. Because there are several different cognitive controls that have been identified in the literature, there are dozens of cognitive control measures. The present study focuses solely on field articulation. For this reason, only measures of field articulation will be considered. The two most commonly used measures of field articulation are the Embedded Figures Test and the Rod and Frame Test. Both tests involve the identification of a figure of some kind that is embedded in a larger visual context. Thus, the high field articulation individual can sort through the irrelevant, contextual features and identify the figure. The low field articulation individual (or field dependent individual) has difficulty separating figure from context.

One additional measure of field articulation was discussed in Broverman et al. (1968). These authors explained sex differences in field articulation with certain neurological differences that are, in turn, caused by hormonal differences between the sexes. Although these neurological differences could, perhaps, be used as measures of field articulation, there has been no such attempt reported in the literature.

Although there is a small amount of research which suggests that field articulation does respond to training (Klein, 1967), it is difficult to assess the extent to which such training would really be helpful in sorting these effects out of scores on tests designed to measure other constructs. One option would be to partial out scores on tests such as the Embedded Figures Test from scores on tests of interest. Another option would be to eliminate items that are likely to contain a field articulation component, such as word problems.
This second option, however, involves the interaction between field articulation and item characteristics, and will therefore be dealt with in the section on Type I inappropriateness.

Response bias/test wiseness. The distinction made earlier between response bias and test wiseness involves the rationale behind one’s response strategy. Although the results of such a rationale (or lack thereof) can sometimes be identified through analyses similar to those used to identify extreme response set (cf. Fagley, 1987; Lawrence, 1957; Gaier et al., 1953), the rationale has been identified only with measures that are external to the test of interest. One such measure has been that of Gibb (1964). This measure simply asks a respondent about the test strategies that he/she uses when taking a test, such as time-using strategies, error-avoidance strategies, guessing strategies, deductive reasoning strategies, estimation of instructor intent, and cue usage. Sarnacki (1979) used this measure to identify individual differences with respect to many of these strategies.

Carelessness. There are a variety of carelessness measures, but most of them have a similar form. Such measures contain items which, in one way or another, can be considered nonsense for the respondent. The nonsense content suggests that every respondent who is paying attention should respond in a particular way. For example, items from the MMPI K-scale or the Comrey Validity Check Scale might ask the question, "Have you ever been to the movies? Yes, Not sure, No" (Nonrandom Response Scale, Hough et al., 1990), to which, it is assumed, anyone who is paying attention should respond Yes. Such measures should "catch" any respondent who is responding carelessly regardless of the reason for the carelessness, be it lack of motivation, miscoding of responses, misunderstanding of the question, etc.
Carelessness, insofar as it is a function of the motivation of the test taker, can also be manipulated indirectly by manipulating motivation. The higher the motivation of the respondent, the less carelessness the respondent will exhibit.

Section summary. In this section, measures of the sources of Type P inappropriateness were briefly reviewed. These measures fall into two categories. The first category consists of those measures that are separate from the tests or items of interest. Examples are the Crowne-Marlowe Social Desirability Scale and the "Lie" scales of the MMPI. For the most part, the measures of need for approval, test anxiety, field articulation, test wiseness, and carelessness fall into this category. The second category consists of those measures that result from reanalysis of data from the tests or items of interest. For example, extreme response set is typically measured by comparing the position of one’s item responses to a chance model. There is no separate measure involved. For the most part, the measures of acquiescence/denial, extreme response set/central tendency, and response bias fall into this category.

Measures of Type I inappropriateness

Because Type I sources of inappropriateness are interactions, they are more subtle than Type P sources and, therefore, more difficult to detect. As a result, the measures of Type I inappropriateness must also be more subtle, or at least more complex. The measures must be able to detect changes in the effects of personal characteristics on item responses that are due to changes in item characteristics. Measures with these properties do exist; they simply have not been recognized as such. These measures can be divided into two groups: those based on Item Response Theory and those based directly upon the pattern of right and wrong answers (Harnisch & Linn, 1981).
These groups can be further divided into those for which some attempt to standardize has been made and those for which no such attempt has been made. More accurately, it can be said that both IRT-based and non-IRT based indices vary with respect to the extent to which they have been standardized relative to the total score (or theta) of the respondent. The meanings of these groupings are discussed in the sections that follow. For more thorough reviews, see Harnisch & Linn (1981), Rudner (1983), and Birenbaum (1985).

Non-IRT based indices of Type I inappropriateness (unstandardized). One example of an unstandardized, non-IRT based index of Type I inappropriateness is Sato’s Caution Index (1975). The formula for Sato’s index is

$$C_i = \frac{\sum_{j=1}^{n_{i.}} (1 - u_{ij})\, n_{.j} \;-\; \sum_{j=n_{i.}+1}^{J} u_{ij}\, n_{.j}}{\sum_{j=1}^{n_{i.}} n_{.j} \;-\; n_{i.} \sum_{j=1}^{J} n_{.j} / J}$$

where
$i = 1, 2, \ldots, I$ indexes the examinee,
$j = 1, 2, \ldots, J$ indexes the item (with items ordered from easiest to most difficult),
$u_{ij}$ = 1 if examinee i answers item j correctly and 0 if examinee i answers item j incorrectly,
$n_{i.}$ = total correct for the ith examinee, and
$n_{.j}$ = total number of correct responses to the jth item.

The name of the index comes from the idea that a large value indicates an unusual response pattern and, therefore, that caution should be used in interpreting the total score of this respondent (Harnisch & Linn, 1981).

There are other such non-IRT based indices of Type I inappropriateness (e.g., the agreement/disagreement indices and the dependability index of Kane & Brennan, 1980; van der Flier’s U (van der Flier, 1977) and its equivalent, the Nonconformity index of Tatsuoka & Tatsuoka (1980)), but they are generally highly correlated with one another (Harnisch & Linn, 1981; Rudner, 1983) and the rationale is similar for all of them. Essentially, these indices answer the question, To what extent do the item responses of a given respondent conform to the item difficulties (as calculated from the total sample of examinees)?
In terms of Type I inappropriateness, these indices answer the question, To what extent is there something about this respondent that renders the item difficulties invalid for this respondent? In other words, To what extent is there an interaction between the personal characteristics of the respondent and characteristics of the items? This question applies to both maximum and typical tests. The only difference is that the notion of item difficulty in maximum tests should be replaced in typical tests with some measure of the percentage of the sample endorsing a given response option.

Another way of describing the logic of these indices (as well as the IRT-based indices) is in terms of Guttman vectors. A Guttman vector is simply a vector of zeroes and ones in which all of the ones precede all of the zeroes. If a respondent were to respond to items in a way that matched perfectly with the difficulties of the items (and if there were no possibility of guessing), then the responses of that respondent, when ordered in terms of item difficulty, would form a perfect Guttman vector. The idea is that the respondent answers all items at or below a certain difficulty level correctly. At some point, however, the difficulty becomes too great for that respondent, and all items above that level of difficulty are answered incorrectly. If this were the case, then this respondent should receive a perfect score on the appropriateness index.

One reason for a departure from this perfect Guttman vector is that a respondent is guessing some items correctly. In a four-option multiple choice test, we would expect a respondent to guess correctly on 25% of the items that are too difficult for them. A second reason for a departure from a perfect Guttman vector is an interaction between personal and item characteristics. Consider the following. Item difficulties are calculated on an entire sample of scores.
If the responses of a given respondent do not conform to those difficulties (for reasons other than guessing), then there is some characteristic of that individual respondent that is giving that person an advantage over the group on some items and/or a disadvantage relative to the group on other items, with the result being that the person answers correctly some items that should be too difficult for that person while answering incorrectly some of the easier items. The result is a vector of item responses (ordered by difficulty) like the following:

11111111100001111010

On the one hand, it would appear that the items became too difficult for this respondent after the ninth item in this order. On the other hand, this respondent did very well on the last seven items, items that the sample on which the difficulties were based found to be most difficult. There are two possible explanations (other than guessing). The first is that the content of the items beyond the ninth item was too difficult for this respondent, but something about this respondent (e.g., possession of a cheat sheet, a quick view of a neighbor’s paper) gave him/her an advantage over the rest of the sample on the last seven items. The second explanation is that this respondent is actually of very high ability (or whatever construct is supposed to be measured with these items) but was at a disadvantage on the items in the middle of this row (perhaps because of coding alignment errors, misinterpretation of items, etc.). Either way, there is an interaction between the respondent and some characteristic of the items. The problem is that we have no way of knowing which is the correct explanation. In other words, we have no way of knowing the true standing of this respondent on the construct of interest, i.e., the test is inappropriate as a measure of the construct of interest.
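The departure from a Guttman vector described above can be made concrete with a short sketch. The Python below is illustrative only and was not part of the analyses reported in this paper. `guttman_errors` simply counts the pairs in which an easier item is missed and a harder item is passed; `caution_index` follows the commonly cited form of Sato’s index and assumes that items are ordered from easiest to hardest. The function names and the values in `item_totals` are hypothetical.

```python
def guttman_errors(responses):
    """Count (easier item wrong, harder item right) pairs.

    `responses` is a list of 0/1 item scores ordered from the easiest
    item to the most difficult item. A perfect Guttman vector gives 0.
    """
    errors = 0
    zeros_seen = 0
    for u in responses:
        if u == 0:
            zeros_seen += 1
        else:
            # every earlier (easier) miss forms one inversion with this hit
            errors += zeros_seen
    return errors


def caution_index(responses, item_totals):
    """Sato-style caution index (assumed form; 0 for a perfect Guttman vector).

    `item_totals[j]` is the number of examinees who answered item j
    correctly, with items ordered from easiest to hardest. The index is
    undefined when the examinee answers no items or all items correctly.
    """
    k = sum(responses)  # total score of this examinee
    n_items = len(responses)
    missed_easy = sum((1 - u) * n for u, n in zip(responses[:k], item_totals[:k]))
    passed_hard = sum(u * n for u, n in zip(responses[k:], item_totals[k:]))
    denom = sum(item_totals[:k]) - k * sum(item_totals) / n_items
    return (missed_easy - passed_hard) / denom


# Response vector from the text, ordered by item difficulty.
pattern = [int(ch) for ch in "11111111100001111010"]
# The same total score arranged as a perfect Guttman vector.
perfect = sorted(pattern, reverse=True)
# Hypothetical item totals: 100 examinees pass the easiest item, 5 the hardest.
item_totals = list(range(100, 0, -5))

print(guttman_errors(pattern), guttman_errors(perfect))
print(caution_index(pattern, item_totals), caution_index(perfect, item_totals))
```

Under these assumptions the perfect Guttman vector yields zero on both measures, while the vector from the text does not. Neither number says which of the two explanations above is correct; both only flag that the pattern is inconsistent with the item difficulties.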
It is important to note that, if the advantage or disadvantage of the respondent does not produce inconsistency (i.e., does not force a departure from a Guttman vector), then none of these indices (IRT-based or otherwise, standardized or otherwise) will detect it. However, if no inconsistency is produced, then there cannot be a person by item interaction. Instead, there would be a simple main effect for the personal characteristic, and the inappropriateness would be Type P inappropriateness and not Type I inappropriateness.

Sato’s Caution Index and others like it are designed to detect just this sort of interaction. The main problem with these indices is that they are highly related to total score. Specifically, respondents with very high or very low total scores are more likely to be identified as aberrant because there is more room for aberrance. For example, a respondent with a very low total score who happens to answer one or two of the more difficult items correctly will likely receive a large score on any index that is not well-standardized simply because those one or two item responses are so inconsistent with the total score of the respondent, whereas the same situation applied to a respondent with an average score will produce an index value that is not nearly as extreme. Since the goal of inappropriateness measurement is to measure inappropriateness independent of total score (or theta), this is seen as a disadvantage of poorly standardized measures.

IRT-based indices of Type I inappropriateness (unstandardized). There are many IRT-based indices of Type I inappropriateness, and they are usually based either on the one-parameter Rasch model (Rasch, 1960) or the three-parameter model (Hambleton & Cook, 1977). These indices address the question: To what extent do the responses of a given respondent conform to the Item Characteristic Curves of the items in a test?
In terms of Type I inappropriateness: To what extent is there something about this respondent that renders the Item Characteristic Curves invalid for this respondent? In other words, to what extent is there an interaction between the personal characteristics of the respondent and characteristics of the items? This question also applies to both maximum and typical tests. In the Rasch model approaches, the ICCs differ only with respect to the difficulty parameter, whereas in the three-parameter model approaches, the ICCs differ with respect to difficulty, discrimination, and the pseudo-guessing parameter.

An example of an index based on the Rasch model is the unweighted total fit mean square (U1) discussed in Wright and Panchapakesan (1969). The formula for U1 is

\[ U1_i = \frac{1}{N} \sum_{j=1}^{N} \frac{(u_{ij} - P_{ij})^2}{P_{ij}(1 - P_{ij})} \]

where i indexes the examinee, j indexes the N items, Pij is the probability of a correct response predicted by the Rasch model, and uij is the observed item response. This is essentially a measure of the average discrepancy between the observed responses of a given examinee and the responses predicted by the model. The larger the discrepancy is, the more caution should be used in interpreting the total score, i.e., the greater the degree of inappropriateness.

An example of an index of Type I inappropriateness based on the three-parameter model is the l0 index described by Levine and Rubin (1979). The formula for l0 is

\[ l_{0i} = \sum_{j=1}^{N} \left[ u_{ij} \ln P_{ij} + (1 - u_{ij}) \ln (1 - P_{ij}) \right] \]

where Pij is the probability of a correct response based on the three-parameter model and uij is the observed response. This is the log of the compound probability of the observed response pattern for a maximum likelihood estimate of ability (Rudner, 1983). The rationale for this index is similar to that of the U1 index: l0 is a measure of the discrepancy between the observed responses and the responses predicted by an IRT model, specifically, the three-parameter model. l0 is perhaps the most widely cited index of Type I inappropriateness.
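To make the two unstandardized indices concrete, the following Python sketch computes U1 and l0 for a single examinee from a vector of observed responses and a vector of model-predicted probabilities. The function names and the five probabilities are hypothetical values chosen for illustration; they are not taken from the data of the present study.

```python
import math


def u1_fit(u, p):
    """Unweighted total fit mean square for one examinee.

    u: observed 0/1 responses; p: model-predicted probabilities of a
    correct response (each strictly between 0 and 1).
    """
    n = len(u)
    return sum((ui - pi) ** 2 / (pi * (1 - pi)) for ui, pi in zip(u, p)) / n


def l0_index(u, p):
    """Log likelihood of the observed response pattern."""
    return sum(
        ui * math.log(pi) + (1 - ui) * math.log(1 - pi)
        for ui, pi in zip(u, p)
    )


# Hypothetical predicted probabilities, ordered from easiest to hardest item.
p = [0.9, 0.7, 0.5, 0.3, 0.1]
consistent = [1, 1, 1, 0, 0]  # passes the easy items, fails the hard ones
aberrant = [0, 0, 1, 1, 1]    # fails the easy items, passes the hard ones

print(u1_fit(consistent, p), u1_fit(aberrant, p))
print(l0_index(consistent, p), l0_index(aberrant, p))
```

In this toy case the aberrant pattern produces the larger U1 and the smaller (more negative) l0, which is the direction of discrepancy that both indices are meant to register.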
The problem with these two indices, as with the non-IRT based indices discussed above, is that they are poorly standardized, that is, they are highly related to total score (Rudner, 1983; Birenbaum, 1985). The solution is, of course, to attempt to develop indices that are well standardized and, therefore, relatively unrelated to total score. The next two sections are devoted to just such indices.

Standardized non-IRT based indices of Type I inappropriateness. One example of a standardized, non-IRT based index is the personal biserial of Donlon and Fischer (1968). Although this index has been shown to be related to total score, it is useful as an illustration of the meaning of standardization. The personal biserial coefficient is simply the biserial correlation between the dichotomously scored item responses of a given respondent and the difficulties of those items. The reason I claim that this index is standardized is that it is the biserial correlation. Since the biserial correlation is insensitive to the variances of the variables being correlated, the total score of a given respondent, which is based on the proportion of correct responses given by the respondent, should have little effect on the personal biserial (as opposed to the point biserial correlation between responses and difficulties). As was mentioned above, however, the personal biserial has been found to be highly related to total score, which means that it is not well standardized. A better example of a standardized, non-IRT based index is the Modified Caution Index (MCI; Harnisch & Linn, 1981):

\[ MCI_i = \frac{\sum_{j=1}^{n_{i.}} (1 - u_{ij})\, n_{.j} - \sum_{j=n_{i.}+1}^{J} u_{ij}\, n_{.j}}{\sum_{j=1}^{n_{i.}} n_{.j} - \sum_{j=J+1-n_{i.}}^{J} n_{.j}} \]
where the symbols are the same as those in Sato’s original index (Equation 1). This is simply Sato’s Caution Index (Sato, 1975) modified to yield a lower bound of 0 and an upper bound of 1, thus eliminating the extreme scores that can be obtained on the caution index for very high scoring examinees who miss a single very easy item or for very low scoring examinees who answer correctly a single very difficult item. The MCI has been found to have little or no relationship with total score (Harnisch & Linn, 1981; Rudner, 1983). It is, therefore, considered to be a well-standardized index.

Standardized IRT-based indices of Type I inappropriateness. These are by far the most commonly used and studied statistical indices of appropriateness and, therefore, the most common indices of Type I inappropriateness. Two examples of such indices are the standardized extended caution index of Tatsuoka and Tatsuoka (1982) and the standardized l0 index (lz) of Drasgow, Levine, and Williams (1985). Since these two have been found to be highly correlated (Birenbaum, 1985), and since lz has been applied to a wider variety of situations (e.g., Drasgow, Levine, & McLaughlin, 1991; Drasgow, Levine, McLaughlin, Williams, & Candell, 1989), the lz index will be the focus of the present paper. The formula for lz is

\[ l_z = \frac{l_0 - E(l_0)}{[\mathrm{Var}(l_0)]^{1/2}} \]

where

\[ E(l_0) = \sum_{i=1}^{N} \left\{ P_i(\hat{\theta}) \ln P_i(\hat{\theta}) + [1 - P_i(\hat{\theta})] \ln [1 - P_i(\hat{\theta})] \right\} \]

and

\[ \mathrm{Var}(l_0) = \sum_{i=1}^{N} P_i(\hat{\theta}) [1 - P_i(\hat{\theta})] \left\{ \ln \frac{P_i(\hat{\theta})}{1 - P_i(\hat{\theta})} \right\}^2 \]

In these expressions, i indexes the N items and the probabilities are evaluated at the estimated ability of the respondent. lz is approximately normally distributed with a mean of 0 and a standard deviation of 1. A negative lz indicates inconsistency of the pattern of responses. The literature on lz has focused exclusively on the negative form of the index, with an lz value of -1.65 (i.e., the score from the standard normal distribution corresponding to a level of significance of .05 for a one-tailed test) indicating an aberrant response pattern. A positive lz indicates hyperconsistency, or a pattern of responses that fits the IRT model so well that it is suspicious.
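The computation of lz can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the present study: the item parameters, the ability value, the function names, and the use of the logistic three-parameter form with the scaling constant 1.7 are all assumptions of the example.

```python
import math


def three_pl(theta, a, b, c, d=1.7):
    """Three-parameter logistic probability of a correct response.

    a = discrimination, b = difficulty, c = pseudo-guessing;
    d = 1.7 is the conventional scaling constant (assumed here).
    """
    return c + (1 - c) / (1 + math.exp(-d * a * (theta - b)))


def lz_index(u, p):
    """Standardized l0 for one respondent.

    u: observed 0/1 responses; p: model probabilities evaluated at the
    respondent's estimated ability (each strictly between 0 and 1).
    """
    l0 = sum(ui * math.log(pi) + (1 - ui) * math.log(1 - pi)
             for ui, pi in zip(u, p))
    expected = sum(pi * math.log(pi) + (1 - pi) * math.log(1 - pi)
                   for pi in p)
    variance = sum(pi * (1 - pi) * math.log(pi / (1 - pi)) ** 2
                   for pi in p)
    return (l0 - expected) / math.sqrt(variance)


# Ten hypothetical items with difficulties from -2.25 to 2.25.
difficulties = [-2.25 + 0.5 * k for k in range(10)]
theta_hat = 0.0  # hypothetical ability estimate
probs = [three_pl(theta_hat, a=1.0, b=b, c=0.2) for b in difficulties]

consistent = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # Guttman-like pattern
aberrant = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]    # reversed pattern

print(lz_index(consistent, probs))  # positive: fits the model closely
print(lz_index(aberrant, probs))    # strongly negative: flagged as aberrant
```

With these made-up values the reversed pattern falls far below the conventional cutoff of -1.65, whereas the Guttman-like pattern yields a positive value.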
Positive lz’s have received virtually no attention in the appropriateness literature, and their meaning is largely unclear. The lz index is a measure of goodness of fit of an IRT model to a particular response pattern. In other words, lz measures the extent to which a given response pattern is determined by factors other than ability (or the noncognitive equivalent) and the parameters of the three-parameter model.

The study of appropriateness measurement with lz has been almost completely statistical. There has been virtually no attempt to assess the construct validity of lz or any of the measures of Type I inappropriateness. It is known that lz detects departure from the IRT model, and there have been numerous suggestions as to the possible causes of such a departure (e.g., cheating, coding errors, test anxiety), but no attempt has been made to model these causes as determinants of lz. We know only that these indices measure the consistency of response patterns relative to item characteristics. As Reise (1990) points out, however, these indices tell us of response inconsistency, not why the responses are inconsistent. As a result, we have no clear idea of what lz and other similar indices are measuring.

I maintain that these indices are reflective of person by item interactions and that viewing inappropriateness in this way will lead to a greater understanding of the construct that one intends to measure as well as the nature of inappropriateness. I have offered various sources of Type I inappropriateness and rationale for their effects on item responses vis-a-vis appropriateness, and suggest further that it is precisely these sources that are captured by lz. In this way, I offer an assessment of the construct validity of lz. Specifically, I suggest the following extensions of my earlier propositions:

Propositions 1A - 8A - The lz index becomes more extreme as the interaction between characteristics of the respondent (e.g., test anxiety, test wiseness, etc.)
and characteristics of the items (e.g., ambiguity of meaning, complexity of response options, etc.) becomes more pronounced. Specifically, for all of the interactions involving respondent characteristics other than omissiveness, lz becomes more extreme in the negative direction as the interaction effect becomes stronger. For the interactions that do involve omissiveness, lz becomes more extreme in the positive direction as the interaction becomes stronger.

The interactions involving omissiveness are expected to produce positive lz values because they are expected to produce hyperconsistency. Those respondents high in omissiveness are expected to omit those items that are too difficult for them, which means that they have no chance of answering them correctly, which in turn would produce a near-perfect Guttman vector (i.e., a hyperconsistent response pattern). To the extent that this set of propositions is borne out, the construct validity of the lz index and other indices like it will be more firmly established.

Also relevant to the issue of construct validity is the fact that lz is not hypothesized to detect any of the sources of Type P inappropriateness. Since the sources of Type P inappropriateness alone do not lead to inconsistency of response patterns, they should not be detected by lz except insofar as they are related to their respective interactions. This suggests a final model of appropriateness as measured by lz.

[Figure: final model of appropriateness as measured by lz]

Overall summary

Sources of test inappropriateness and research on these sources were discussed.
Two types of sources were identified: those involving main effects for respondent characteristics on item responses (Type P) and those involving interactions between respondent characteristics and item characteristics (Type I). Measures of these sources were also discussed. Sources of inappropriateness were discussed in terms of both maximum and typical tests. Table 1 summarizes the sources of inappropriateness that were discussed.

[Table 1. Sources of inappropriateness (Type P and Type I) for maximum and typical tests]

The present study

The purpose of the present study was to test a part of this model.
Specifically, I examined the effects on knowledge test scores and test inappropriateness as measured by lz of test anxiety, math anxiety (which can be viewed as a specific application of test anxiety; see p. 15 for the reference to "number anxiety"), conscientiousness, carelessness, difficulty-based item characteristics such as positive/negative wording, open/closed stem format, and response option complexity, and the interaction between each of the four respondent characteristics mentioned above and difficulty-based item characteristics. The details of the present study are described below. The specific hypotheses tested in this study are as follows.

Hypothesis 1 - Difficulty-based item characteristics have a deleterious effect on knowledge test performance.

Hypothesis 2 - Respondents who are higher in conscientiousness have higher knowledge test scores than do respondents who are lower in conscientiousness.

Hypothesis 3 - Respondents who are higher in carelessness have lower knowledge test scores than do respondents who are lower in carelessness.

Hypothesis 4 - Respondents who are higher in math anxiety have lower knowledge test scores than do respondents who are lower in math anxiety.

Hypothesis 5 - Respondents who are higher in test anxiety have lower knowledge test scores than do respondents who are lower in test anxiety.

Hypothesis 6 - The effect of conscientiousness on knowledge test scores is moderated by the extent to which the knowledge test items contain difficulty-based item characteristics such that the knowledge test responses of those respondents higher in conscientiousness are less adversely affected by difficulty-based item characteristics than are those of respondents lower in conscientiousness.
Hypothesis 7 - The effect of carelessness on knowledge test scores is moderated by the extent to which the knowledge test items contain difficulty-based item characteristics such that the knowledge test responses of those respondents higher in carelessness are more adversely affected by difficulty-based item characteristics than are those of respondents lower in carelessness.

Hypothesis 8 - The effect of math anxiety on knowledge test scores is moderated by the extent to which the knowledge test items contain difficulty-based item characteristics such that the knowledge test responses of those respondents higher in math anxiety are more adversely affected by difficulty-based item characteristics than are those of respondents lower in math anxiety.

Hypothesis 9 - The effect of test anxiety on knowledge test scores is moderated by the extent to which the knowledge test items contain difficulty-based item characteristics such that the knowledge test responses of those respondents higher in test anxiety are more adversely affected by difficulty-based item characteristics than are those of respondents lower in test anxiety.

Hypothesis 10 - The effect of conscientiousness on lz values is moderated by the extent to which the knowledge test items contain difficulty-based item characteristics such that lz values will be negatively related to the presence of difficulty-based item characteristics in test items for those respondents lower in conscientiousness but relatively unrelated for those respondents higher in conscientiousness. In other words, difficulty-based item characteristics will lead to inappropriateness for those respondents low in conscientiousness but not for those respondents high in conscientiousness.
Hypothesis 11 - The effect of carelessness on lz values is moderated by the extent to which the knowledge test items contain difficulty-based item characteristics such that lz values will be negatively related to the extent to which difficulty-based item characteristics are present in test items for those respondents higher in carelessness but relatively unrelated for those respondents lower in carelessness. In other words, difficulty-based item characteristics will lead to inappropriateness for those respondents high in carelessness but not for those respondents low in carelessness.

Hypothesis 12 - The effect of math anxiety on lz values is moderated by the extent to which the knowledge test items contain difficulty-based item characteristics such that lz values will be negatively related to the extent to which difficulty-based item characteristics are present in test items for those respondents higher in math anxiety but relatively unrelated for those respondents lower in math anxiety. In other words, difficulty-based item characteristics will lead to inappropriateness for those respondents high in math anxiety but not for those respondents low in math anxiety.

Hypothesis 13 - The effect of test anxiety on lz values is moderated by the extent to which the knowledge test items contain difficulty-based item characteristics such that lz values will be negatively related to the extent to which difficulty-based item characteristics are present in test items for those respondents higher in test anxiety but relatively unrelated for those respondents lower in test anxiety. In other words, difficulty-based item characteristics will lead to inappropriateness for those respondents high in test anxiety but not for those respondents low in test anxiety.

Method

Sample

Subjects were 165 undergraduates from a large midwestern university. Sixty-seven percent of the subjects were women. No other demographic data were collected.
They were recruited from Introductory Statistics classes towards the end of the semester so that they had had an opportunity to learn most of the course material. Subjects were given extra credit for participation. This sample, while convenient, was also quite appropriate for the variables examined in this study. The general focus of this study was testing, with emphasis on the relationships among respondent characteristics, item characteristics, and construct validity of items. Since testing is a common part of most university educations, a sample of college students was ideal for the examination of factors that affect testing.

Design

The present study used a repeated measures regression design with four between-subjects factors, one within-subjects factor, and two dependent variables. The between-subjects predictor variables were Conscientiousness, Math Anxiety, Test Anxiety, and Carelessness. The within-subjects factor was Item Difficulty as determined by item construction principles. The dependent variables were scores on tests of statistical knowledge and the consistency of responses to items from those tests. Thus, eight repeated measures regression analyses were performed, one for each combination of between-subjects variable and dependent variable.

Measures

Conscientiousness. Conscientiousness was assessed with four items from the twelve-item conscientiousness scale of the NEO-PI personality inventory (Costa & McCrae, 1991). These four items were the items in the scale which related directly to dependability (as opposed to organizational skills and goal orientation; see Appendix A). The Costa & McCrae (1991) measure was used because it is one of the few questionnaire measures designed specifically to assess conscientiousness as defined by proponents of the Big Five theory of personality (e.g., Digman, 1990).
Internal consistency reliability for the four items was estimated to be .68, suggesting that uniquenesses for the four items were acceptable (Cortina, 1993).

Math Anxiety. Math Anxiety was assessed with the 11-item Math Anxiety Questionnaire developed by Wigfield & Meece (1988), which in turn was based on a measure originally developed by Richardson & Suinn (1972). This measure was used because it taps both the emotional and cognitive components of anxiety. Internal consistency reliability for these items was estimated to be .85, suggesting that uniquenesses for these items were also acceptable (Cortina, 1993).

Test Anxiety. Test Anxiety was assessed with the 10-item Test Anxiety Scale developed by Morris, Davis, & Hutchings (1981). This measure improves upon earlier test anxiety scales (e.g., Mandler & Sarason, 1952) which failed to tap both the emotional and cognitive components of anxiety. Internal consistency reliability for these items was estimated to be .83, suggesting that uniquenesses for these items were also acceptable (Cortina, 1993).

Carelessness. Carelessness was assessed with a six-item scale constructed by the author. These items were similar to the "nonsense" items included in many noncognitive tests in that they were designed to produce a particular response from any respondent who pays attention to (i.e., reads) the item. The unique aspect of the items that make up the carelessness scale used in the present study is that they are not easily recognized as items which tap carelessness. Typical carelessness items are absurdities which can be recognized by respondents who are merely scanning items. Such identification can have a deleterious influence on test-taking motivation. The items used in the present study were statistical knowledge items that were answered correctly by all respondents during all pretesting situations (details are described below).
Items such as

Another word for the average is
a) mean
b) variance
c) standard deviation
d) range

should be answered correctly by any Introductory Statistics student who reads the question, as was the case during pretesting. Any variability in responses to such items should be due only to carelessness.

Statistics knowledge test. All subjects were administered the 75-item test of statistical knowledge contained in Appendix B. The items on the statistical knowledge test were items typically found on exams for Introductory Statistics classes. Items with three levels of content-irrelevant difficulty were developed. Twenty-four items were open-stemmed, negatively worded, contained complex response options, and had complex stems (e.g., word problems), using the definition of stem complexity from Zimmerman (1954). These were the "Difficult" items. The following is an example of one of the "difficult" items:

Difficult item - Suppose I know the number of times each Michigan resident has been swindled by Gov. Engler (So, I have access to this population of scores). I then take many different samples of 15 people each and calculate the mean for each sample. If the mean of the means were 7.8 and the standard deviation of individual scores were 2.2, the population mean and the standard error of the mean would not be
a. mu = 2.79, sigma = .57
b. mu = 7.8, sigma = 2.2
c. either a or b
d. all of the above

Twenty-five Moderately difficult items were similar to the Difficult items except that they were positively worded and closed-stemmed. The following is an example of one of the "moderate" items:

Moderate item - For some strange and terrible reason, I am interested in knowing the average number of white collar crimes committed per day by Sen. Bob Dole over the past 2000 days. In an attempt to estimate this value, I randomly choose twenty days from these 2000, count the number of white collar crimes he committed on each of those 20 days, and get the average of those twenty numbers.
What is the statistic that I have used?
a. The average number of white collar crimes per day committed by Dole over the past 2000 days.
b. The average number of white collar crimes committed per day by Dole over the 20 days that I measured.
c. 2000
d. all of the above

Twenty-six Easy items were similar to the Moderately difficult items except that the stems were noncomplex and there were no complex response options. The following is an example of one of the "easy" items:

Easy item - Which of the following is an advantage of the mean as a measure of central tendency?
a. It is greatly affected by extreme scores
b. It can be manipulated algebraically
c. It is not greatly affected by extreme scores
d. It is difficult to calculate

Participants’ scores on the items within each level of difficulty were collapsed to form single variables for the regression analyses involving knowledge test scores described below.

Difficulty as defined in this paragraph refers to aspects of the item format that are thought to decrease the proportion of correct responses independently of the examinees’ knowledge of the domain being assessed. An attempt was made to equate items with respect to content difficulty (i.e., construct-relevant difficulty). The reason for this is that the item characteristics that are of the most concern to test constructors are those over which they have direct control. While most tests should and do vary with respect to content difficulty, other item characteristics such as option complexity and stem complexity can and should be controlled, especially if they foster aberrant response patterns.

Items for the statistical knowledge test were chosen from a pool of test items that had been administered to undergraduates as items in actual tests. Specifically, 75 items which contained none of the difficulty-inducing item characteristics mentioned earlier (e.g., complex response options, negative wording, etc.) were chosen and assigned randomly to one of the three groups.
Inspection of the item difficulty values (proportion answering the item incorrectly) calculated from these previous testing situations showed that the average item difficulties within the three groups were almost identical (.26 for items which were to be used for the "easy" test, .23 for items which were to be used for the "moderate" test, and .25 for the items which were to be used for the "difficult" test), suggesting that these three groups of items were virtually identical with respect to content-relevant difficulty. Differences in test scores across difficulty levels as defined above can then be attributed to the manipulations of item characteristics. The measure of statistical knowledge used was proportion of correct responses.

Response Consistency. Response consistency was assessed with the lz index of Drasgow et al. (1985). The lz index requires the calculation of the item parameters of the three-parameter IRT model. These parameters were calculated from the responses of subjects to the 75 test items.

Procedure

Subjects were first approached during their statistics classes and asked if they would be willing to participate in the experiment. Those who agreed were asked to sign up for a testing date as well. The 75-item statistics knowledge test and the four tests measuring respondent characteristics were administered to large groups of subjects at a time. The measures of conscientiousness, math anxiety, and test anxiety were administered first, followed by the knowledge test. Two forms of the test were created. The two forms differed only in that the order in which the items were presented was reversed. Within each form, item order was random. The purpose of generating two forms was to allow an examination of order effects. Neither knowledge test scores nor lz values differed across the two forms. The items measuring carelessness were embedded within the knowledge test.
Since all 75 items were administered to all subjects, all three levels of difficulty were experienced by all subjects. In an attempt to increase motivation to respond carefully, the test administrators explained to subjects that the test results would be used by the Instructor to evaluate his own teaching performance. They were also told that the test provided an opportunity to practice for upcoming tests in the class. Finally, $100 was awarded to each of three of the top performers on the knowledge test.

Data Analysis

After establishing the unidimensionality of the knowledge test, the responses of subjects to the test were analyzed with the BILOG IRT computer program (Mislevy & Bock, 1990). This analysis yields both ability estimates for respondents (θ) and item parameter estimates that are necessary to compute lz as outlined above in the introduction. Hypotheses were tested with repeated measures hierarchical regression (RMHR; Cohen & Cohen, 1983; Hollenbeck, Ilgen, & Sego, in press). As there was no a priori rationale for investigating the effects of the four between-subjects predictors in conjunction with one another, separate regression analyses were performed for each of the four between-subjects variables (conscientiousness, math anxiety, test anxiety, and carelessness) and each of the two criteria (percentage of knowledge items answered correctly and lz) for a total of eight regressions. In each regression, the dependent variable (knowledge test scores or lz) was regressed onto one of the between-subjects factors and the within-subjects factor. The details of this procedure are described below.
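The hierarchical step in such an analysis amounts to asking whether a set of predictors entered later (for example, the interaction terms) explains variance beyond the predictors entered earlier. The Python sketch below shows the generic incremental F test for that step. It is an illustration under stated assumptions, not the RMHR procedure itself: the function name, the R-squared values, and the degrees of freedom in the example are hypothetical, and the appropriate error degrees of freedom in a repeated measures design depend on whether the effect is between or within subjects.

```python
def increment_f(r2_reduced, r2_full, df_added, df_error):
    """F ratio for the increase in R-squared when predictors are added.

    r2_reduced: R-squared before the new predictors are entered
    r2_full:    R-squared after the new predictors are entered
    df_added:   number of predictors added at this step
    df_error:   error degrees of freedom for the full model
    """
    gain_per_df = (r2_full - r2_reduced) / df_added
    error_per_df = (1 - r2_full) / df_error
    return gain_per_df / error_per_df


# Hypothetical example: two interaction terms raise R-squared from .10 to .15
# with 326 error degrees of freedom.
f_value = increment_f(0.10, 0.15, df_added=2, df_error=326)
print(round(f_value, 2))
```

The resulting F ratio would then be referred to an F distribution with `df_added` and `df_error` degrees of freedom.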
It was expected that the interaction between item characteristics and each of the four between-subjects factors would explain a significant portion of the relevant variance in knowledge test scores above and beyond that explained by the main effects for the predictor variables, and that insofar as these interactions were significant, they would also explain relevant variance in lz.

Results

Tests measuring respondent characteristics

Table 2 contains means, standard deviations, intercorrelations, and internal consistency estimates for the Conscientiousness, Math Anxiety, Test Anxiety, and Carelessness scales. Means, standard deviations, and intercorrelations are presented for lz. The values presented for the knowledge tests refer to the knowledge tests composed of the items that remained after the initial BILOG analysis (see below for details of this analysis).

Table 2
Descriptive statistics for all tests and lz’s

Variable           Mean     SD   Alpha      1       2       3
 1. Conscien.     15.86   2.39    .68
 2. Math Anx.     35.33   8.16    .85    -.05
 3. Test Anx.     18.51   5.94    .83    -.16*    .48*
 4. Careless.      5.32    .87    .28     .01    -.06    -.04
 5. Easy Test       .57    .16    .69     .01    -.21*   -.12
 6. Mod. Test       .54    .17    .70     .11    -.22*   -.10
 7. Diff. Test      .46    .16    .64    -.06    -.17*   -.08
 8. lz (easy)       .09   1.01           -.07     .08    -.02
 9. lz (mod)        .05    .92            .04    -.03    -.05
10. lz (diff)      -.06   1.02            .08     .08     .03

Columns 4 and 5: .37* .36* .57* .44* .51* 415 410 a14 .04 mos 406

Table 2 cont’d

Columns 6 and 7 (rows 7. Diff. Test, 8. lz (easy), 9. lz (mod), 10. lz (diff)): .55* -.26* -.01 .04 -.05 .03 -.17*
Columns 8 and 9: -.07 -.14 .10

* - p<.05

There are several points to be made with respect to this table. Regarding the measures of respondent characteristics, there was reasonable variability on the Conscientiousness, Math Anxiety, and Test Anxiety scales. Also, the means for these scales were comparable to those reported in previous literature (e.g., Morris et al., 1981; Wigfield & Meece, 1988; Costa & McCrae, 1988).
There was considerably less variability in the Carelessness measure, but this is not surprising given the simple nature of the questions in the scale. Internal consistency estimates for the conscientiousness, math anxiety, and test anxiety scales were adequate, suggesting acceptable levels of item uniqueness (Cortina, 1993). Although the estimate for the conscientiousness scale in the present study was lower than those presented in previous research, this is not surprising given the fact that the estimate in the present study was based on only four items. Internal consistency for the Carelessness scale, however, was quite low (α = .28). Again, this is not surprising given the fact that a substantial portion of respondents answered all of these items in the same way. Table 2 also contains information about the statistical knowledge items and lz values. This information is discussed below.

Statistical knowledge test

The "easy", "moderate", and "difficult" tests were composed of 26, 25, and 24 items, respectively. Item means, standard deviations, and intercorrelations can be found in Appendix C.

Before IRT analyses were performed, the dimensionality of the items was assessed with a factor analysis of the interitem correlation matrix after that matrix was transformed into a matrix of polychoric correlations. Table 3 contains the eigenvalues and percentage of variance accounted for by all factors with eigenvalues greater than 1.
Table 3
Factor analysis of knowledge test items

FACTOR   EIGENVALUE   PCT OF VAR
   1      11.10568       14.8
   2       4.06301        5.4
   3       3.88617        5.2
   4       3.41616        4.6
   5       3.14234        4.2
   6       3.04511        4.1
   7       2.92488        3.9
   8       2.78527        3.7
   9       2.65622        3.5
  10       2.49907        3.3
  11       2.38604        3.2
  12       2.20712        2.9
  13       2.16898        2.9
  14       2.03637        2.7
  15       1.97130        2.6
  16       1.89015        2.5
  17       1.75348        2.3
  18       1.65575        2.2
  19       1.61516        2.2
  20       1.57926        2.1
  21       1.52589        2.0
  22       1.45656        1.9
  23       1.36599        1.8
  24       1.30923        1.7
  25       1.25342        1.7
  26       1.19881        1.6
  27       1.16056        1.5
  28       1.08787        1.5
  29       1.01588        1.4

The conclusion to be drawn with respect to dimensionality depends on the criterion that one uses. There are many factors with eigenvalues greater than 1. Given the range of knowledge tapped by the test and the range of item characteristics, this is not surprising. However, only one of the factors explains more than 5.4% of the test variance. Also, the first factor eigenvalue is almost three times the size of the next largest eigenvalue (11.11 vs. 4.06; Hulin et al., 1983). Given the latter two facts, IRT analysis was deemed appropriate.

Item parameter estimates for the 75 items and θ-parameter estimates for the 165 respondents were generated with BILOG (Mislevy & Bock, 1979). This analysis suggested that five of the items (Nos. 25, 41, 46, 64, and 75) did not conform to the three-parameter IRT model. χ² values and degrees of freedom for these items were 8436.2 (

[Table: regression of knowledge test scores onto test anxiety and item characteristics]

Neither test anxiety nor its interaction with item characteristics significantly predicted knowledge test scores. Test anxiety explained only 1.4% of the between-subjects variance, and its interaction with item characteristics explained less than 1% of the within-subjects variance. Thus, there was no support for Hypotheses 5 and 9.
Difficulty-based item characteristics and lz

As with the analyses of knowledge test scores, the first step in all of the analyses of lz involved entering the two dummy variables corresponding to the three levels of difficulty-based item characteristics. The R² value associated with the regression of the within-subjects variance in lz onto the two within-subjects dummy variables was .0005 (F(2,326) < 1), suggesting that difficulty-based item characteristics had no effect on lz values.

Conscientiousness and lz

The results with respect to the relationship between conscientiousness and lz can be found in Table 10.

[Table 10: rotated table; content not recoverable from source]

Conscientiousness had a nonsignificant effect on lz values (df = 1, 163; p > .05).

Carelessness and lz

The results with respect to the relationship between carelessness and lz can be found in Table 11.

[Table 11: rotated table; content not recoverable from source]

Carelessness had a significant effect on lz values (F(1,163) = 7.50; p < .01), accounting for 4.4% of the between-subjects variance. Those respondents who reflected more carelessness had lower lz values (i.e., provided response patterns with greater inappropriateness) than did those respondents who reflected less carelessness. The effect for the carelessness by item characteristics interaction was also significant (F(2,326) = 3.61; p < .05). This effect accounted for 2.2% of the within-subjects variance after removal of the relevant main effects. Figure 12 contains a plot of this interaction.

[Figure 12 plot: lz on the vertical axis, item difficulty on the horizontal axis]

Figure 12.
Plot of the effect on lz of the interaction between carelessness and item characteristics

This interaction was also not of the form hypothesized (see Hypothesis 11). Those respondents who were higher in carelessness (i.e., respondents who had lower scores on the carelessness measure) displayed less inappropriateness as a function of item difficulty than did respondents lower in carelessness.

Math anxiety and lz

The results with respect to the relationship between math anxiety and lz can be found in Table 12.

[Table 12: rotated table; content not recoverable from source]

Math anxiety did not have a significant effect on lz values, accounting for less than 1% of the between-subjects variance (F(1,163) = 1.32; p > .05). The effect for the math anxiety by item characteristic interaction was also nonsignificant (F(2,326) = 1.28; p > .05). Thus, Hypothesis 12 was not supported.

Test anxiety

The results with respect to the relationship between test anxiety and lz can be found in Table 13.

[Table 13: rotated table; content not recoverable from source]

Test anxiety also failed to produce a significant effect on lz values, accounting for less than 1% of the between-subjects variance (F(1,163) = 1.32; p > .05). The effect for the test anxiety by item characteristic interaction (Hypothesis 13) was also nonsignificant (F(2,326) = 1.28; p > .05).

Table 14 summarizes the findings of the present study with respect to the Hypotheses presented on p. 66. As can be seen, support was found for Hypotheses 1, 3, and 4, which dealt with the effects of item characteristics, carelessness, and math anxiety on knowledge test scores. None of the other Hypotheses were supported.
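The between-subjects F ratios and the proportions of variance reported for lz can be related to one another with the usual regression identity. The sketch below assumes a single between-subjects predictor tested against residual between-subjects variance with 163 degrees of freedom; the function names are hypothetical, and the error term of the repeated measures regression actually used may have been constructed differently.

```python
# Minimal sketch relating a proportion of variance (R squared) to an F ratio.
# Assumes a single effect tested against the residual variance (assumption).

def f_from_r2(r2, df_effect, df_error):
    """F ratio implied by a proportion of variance explained."""
    return (r2 / df_effect) / ((1.0 - r2) / df_error)


def r2_from_f(f, df_effect, df_error):
    """Proportion of variance explained implied by an F ratio."""
    return f * df_effect / (f * df_effect + df_error)


# Carelessness: 4.4% of the between-subjects variance, df = 1 and 163.
print(round(f_from_r2(0.044, 1, 163), 2))  # 7.5

# Math anxiety: F(1,163) = 1.32.
print(round(r2_from_f(1.32, 1, 163), 4))   # 0.008
```

Under these assumptions the carelessness effect reproduces the reported F of 7.50, and an F of 1.32 corresponds to just under 1% of the between-subjects variance, in line with the reported results.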
Table 14
Summary of hypotheses and support

HYPOTHESIS                  DEPENDENT VARIABLE   EFFECT        SUPPORT(a)
H1   Item Characteristics   Knowledge test       Main effect   X
H2   Conscientiousness      Knowledge test       Main effect   O
H3   Carelessness           Knowledge test       Main effect   X
H4   Math Anxiety           Knowledge test       Main effect   X
H5   Test Anxiety           Knowledge test       Main effect   O
H6   Conscientiousness      Knowledge test       Interaction   O
H7   Carelessness           Knowledge test       Interaction   O
H8   Math Anxiety           Knowledge test       Interaction   O
H9   Test Anxiety           Knowledge test       Interaction   O
H10  Conscientiousness      lz                   Interaction   O
H11  Carelessness           lz                   Interaction   O
H12  Math Anxiety           lz                   Interaction   O
H13  Test Anxiety           lz                   Interaction   O

(a) X indicates support for the hypothesis while O indicates lack of support.

Discussion

The purpose of the present study was to investigate the determinants of test inappropriateness. Specifically, I investigated the effects of difficulty-based item characteristics, math anxiety, test anxiety, conscientiousness, and carelessness on knowledge test scores and the lz index of test inappropriateness. The following sections discuss the results of this study as they relate to the hypotheses that were presented on page 67.

Hypothesis 1: Difficulty-based item characteristics and test scores

Difficulty-based item characteristics had a profound effect on knowledge test scores. Specifically, difficulty-based item characteristics accounted for 20% of the within-subjects variance in knowledge test scores. As hypothesized, respondents chose the correct response option less often for items with many difficulty-based item characteristics than they did for items with fewer difficulty-based item characteristics. These results suggest that the responses to test items are a function not only of the standing of the respondent on the construct of interest, but also of the format of the item. This finding is consistent with previous research (e.g., Hughes & Trimble, 1965; Dudycha & Carpenter, 1973).

Hypothesis 2: Conscientiousness and test scores

Conscientiousness was found to have little effect on test scores, accounting for less than 1% of the between-subjects variance.
Although contrary to Hypothesis 2 of the present study, this is not entirely inconsistent with past research on the criterion-related validity of the Big Five personality dimensions. Meta-analyses of the relationship between conscientiousness and work-related outcomes have generally yielded uncorrected validities of less than .15 (Barrick & Mount, 1992; Tett et al., 1992).

Hypothesis 3: Carelessness and test scores

Carelessness was found to have a considerable effect on knowledge test scores, accounting for 21% of the between-subjects variance. As hypothesized, those respondents who exhibited a large degree of carelessness received lower test scores than did those who exhibited little or no carelessness. In fact, the mean knowledge scores ("easy" items, "moderate" items, and "difficult" items) for those respondents who answered all six carelessness items correctly were .62, .60, and .53, respectively, whereas the scores for those respondents who missed at least one of the carelessness items were .50, .47, and .39. This suggests that responses to test items were due not only to the standing of the respondent on the construct of interest, but also to the carelessness of the respondent at the time of test administration. In other words, there was evidence of Type P inappropriateness due to an effect for carelessness. One possible alternative explanation for this finding is that the carelessness items used in the present study were in fact statistical knowledge items and, therefore, should be related to knowledge scores. While this is a possibility, it should be noted that each of the items used in the carelessness scale had been answered correctly by all students who had responded to it in previous tests in which those items had been used. Participants in the present study were still taking an introductory statistics course at the time of testing, and the material contained in the carelessness items was material that had been covered in the class.
So, while the content of the carelessness items was knowledge-related, the only viable cause of variance on the items (given that they were administered to people familiar with the rubric of statistics) was carelessness.

Hypothesis 4: Math anxiety and test scores

Math anxiety was found to have a significant effect on knowledge test scores, accounting for 6% of the between-subjects variance. As hypothesized, respondents higher in math anxiety had lower test scores than did respondents lower in math anxiety. This suggests that there was also evidence of Type P inappropriateness due to an effect for math anxiety.

Hypothesis 5: Test anxiety and test scores

Test anxiety was found to have little effect on test scores, accounting for less than 1% of the between-subjects variance. One explanation for this result is that the experimental testing situation lacked those aspects of real testing situations which lead to test anxiety. This possibility is discussed in more detail below.

Hypothesis 6: Conscientiousness by item characteristic interaction and test scores

Hypothesis 6 was not supported by the data. The conscientiousness by item characteristic interaction did contribute significantly to the prediction of test scores, but the form of the interaction was not as expected. It was hypothesized that respondents low in conscientiousness would be more adversely affected by difficulty-based item characteristics than would respondents high in conscientiousness. Instead, the opposite was found. As can be seen in Figure 11, however, the extent of the interaction is slight and may have been due only to chance.

Hypothesis 7: Carelessness by item characteristic interaction and test scores

Hypothesis 7 was not supported by the data. The carelessness by item characteristic interaction did not contribute significantly to the prediction of test scores.
In other words, there was no evidence of Type I inappropriateness resulting from an interaction between carelessness and item characteristics.

Hypothesis 8: Math anxiety by item characteristic interaction and test scores

Hypothesis 8 was not supported by the data. The math anxiety by item characteristic interaction did not contribute significantly to the prediction of test scores. In other words, there was no evidence of Type I inappropriateness resulting from an interaction between math anxiety and item characteristics.

Hypothesis 9: Test anxiety by item characteristic interaction and test scores

Hypothesis 9 was not supported by the data. The test anxiety by item characteristic interaction did not contribute significantly to the prediction of test scores. In other words, there was no evidence of Type I inappropriateness resulting from an interaction between test anxiety and item characteristics. As with the main effects, one explanation for this result is that the experimental testing situation lacked those aspects of real testing situations which lead to test anxiety.

Hypothesis 10: Conscientiousness by item characteristic interaction and lz

Hypothesis 10 was not supported by the data. The conscientiousness by item characteristic interaction did not contribute significantly to the prediction of lz. Although this is contrary to the hypothesis of the present study, it is not surprising given the lack of effect for this interaction on test scores. The lack of effect for the interaction on test scores suggests that there was little in the way of Type I inappropriateness (at least as caused by the conscientiousness by item characteristic interaction); therefore, any variance in lz was due to other factors or chance.

Hypothesis 11: Carelessness by item characteristic interaction and lz

Hypothesis 11 was not supported by the data.
Although the carelessness by item characteristic interaction did contribute significantly to the prediction of lz, the interaction was not of the form expected. It was hypothesized that respondents higher in carelessness would have higher lz values only for the tests composed of items with difficulty-based item characteristics. Instead, respondents higher in carelessness displayed less of an item characteristic-lz effect. So, while lz was predicted by this interaction, it was not predicted in a way that was consistent with Hypothesis 11.

Hypothesis 12: Math anxiety by item characteristic interaction and lz

Hypothesis 12 was not supported by the data. The math anxiety by item characteristic interaction did not contribute significantly to the prediction of lz. As with conscientiousness, however, the lack of effect for this interaction on test scores suggests that there was little in the way of Type I inappropriateness (at least as caused by the math anxiety by item characteristic interaction); therefore, any variance in lz was due to other factors or chance.

Hypothesis 13: Test anxiety by item characteristic interaction and lz

Hypothesis 13 was not supported by the data. The test anxiety by item characteristic interaction did not contribute significantly to the prediction of lz. In other words, there was no evidence of Type I inappropriateness resulting from an interaction between test anxiety and item characteristics. As with the main effects, one explanation for this result is that the experimental testing situation lacked those aspects of real testing situations which lead to test anxiety.

Implications and conclusions

The results of the present study have several implications for the interpretation of ability and knowledge test scores. First, item characteristics, in particular difficulty-based item characteristics, can have a profound effect on test scores.
As was suggested in the Introduction, these item characteristics force the respondent to use abilities other than those related to the ostensible content of the items to arrive at the correct answer. Insofar as this is the case, the items which possess these characteristics are measuring constructs other than those that they are intended to measure. The second implication for test scores has to do with carelessness. The present study found that carelessness had a considerable relationship with test scores. Specifically, the results suggest that those respondents who perform poorly on carelessness items answer far fewer items correctly than do those respondents who perform well on carelessness items. This is a particularly important issue for testing situations such as those encountered in concurrent, criterion-related validity studies in which respondents are existing employees who have nothing to gain from performing well on the selection tests. If there is a significant number of respondents who are careless in their responding, then estimates of validity may be adversely affected. Also, if incumbent test scores are to be used in some fashion to set cutoffs for job applicants, carelessness may lead to cutoffs that are too low. In short, the results of the present study suggest that, in those situations where there are few or no formal rewards for test performance, test results can be interpreted in light of the effects of carelessness. A third implication with respect to test scores is that math anxiety may have a considerable impact on math test scores; specifically, those respondents higher in math anxiety can be expected to answer fewer math knowledge questions correctly than respondents lower in math anxiety. The exact nature of this relationship, however, is not entirely clear.
As with general test anxiety, a person may be high in math anxiety simply because of a lack of knowledge or ability (Naveh-Benjamin et al., 1987), in which case it would appear that the lack of knowledge is causing both the math anxiety and the lack of math performance. If, on the other hand, a respondent is high in math anxiety but does not lack the requisite knowledge or ability for a given task, then it is more likely that performance is determined by both anxiety and ability. More research is needed which isolates these factors so that their independent contributions to test performance can be identified. To summarize, the results of the present study suggest that difficulty-based item characteristics such as response option complexity and negative wording, and personality characteristics such as carelessness and math anxiety, can have a substantial effect on math test scores. Future research should investigate the specific testing situations in which these effects hold. Also, the present study focused only on statistical knowledge tests, which are examples of tests of maximum performance. Future research should endeavor to discover how these factors affect responses to tests of typical performance. The present study also has implications for lz. Interactions between item characteristics and each of the four personality variables were predicted, but none were found to be significant and in the hypothesized direction. One explanation for these findings is described in the Limitations section of this paper, namely, that a lack of external motivators in the testing situation led to uninterpretable results with respect to test scores. If this was the case, then lz might also be uninterpretable. Another possible explanation is that lz simply does not reflect interactions between item characteristics and respondent characteristics. Instead, it might simply reflect nonsystematic response tendencies that, by definition, cannot be predicted.
Before we accept this explanation, however, the relationship between lz and interactions between item and respondent characteristics must be studied in testing situations which contain the external motivators that were missing in the present study. Only then can firm conclusions be drawn with respect to the hypotheses suggested in this paper.

Limitations

As was mentioned earlier, the most plausible explanation for the lack of hypothesized effects in the present study was that the testing situation lacked the external motivators present in most real-world, evaluation-oriented testing situations. Although efforts were made to encourage diligence on the part of the respondents, grades, job opportunities, and other outcomes which can drive performance were not in any way contingent upon performance on the knowledge test. This lack of external motivators may have led to a failure to elicit reactions to the testing situation that would have been present if knowledge test performance were somehow tied to rewards. For example, most of the previous research on test anxiety has examined test anxiety in the context of an actual testing situation (i.e., research was not the only function served by the administration of the test). It may be that respondents who have an involuntary anxiety reaction to these true testing situations have little or no such reaction to a testing situation which lacks important consequences. This lack of external motivators may also have led to a uniformly low level of effort on the knowledge test itself. Effects for variables such as conscientiousness might have been washed out by such a general lack of effort. There are two final points to be made with respect to the issue of the role of external motivators in testing. First, while the testing situation used in the present study was unlike many real-world testing situations, it shared many of the characteristics of the typical concurrent validation study.
In concurrent validation, as in the present study, participants respond to test items knowing that rewards are not contingent upon test performance. In fact, many such participants view the collection of concurrent data as a waste of their time. More research must be conducted which compares the effects of anxiety, conscientiousness, carelessness, and their interactions with item characteristics on item responses in situations which contain typical external motivators versus situations which do not contain such motivators. Such research could shed more light on the nature of these predictors as well as on the nature of inappropriateness. The second point to be made is related to the issue of the presence or absence of external motivators. It may be that a test such as the statistical knowledge test used in the present study simply isn't a valid measure of the construct of interest in a testing situation such as that used in the present study. It was suggested earlier that test appropriateness is an issue only for a test for which evidence of validity has been provided. If a certain level of effort is a prerequisite for the validity of a test, and if testing situations such as that used in the present study do not foster that level of effort, then perhaps appropriateness is not an issue in such situations. This is a question that the comparative research suggested above could address. A final limitation of the present study is that it tested only part of the model presented on page 62. The relationships involving acquiescence, need for approval, field articulation, response sets, test wiseness, and omissiveness were not examined. The relationships involving acquiescence and need for approval are particularly promising for tests of typical performance such as personality tests.
As was discussed in the Introduction of this paper, acquiescence and need for approval may interact with item characteristics to affect test scores, and this interaction may be detectable with lz. Likewise, field articulation and test wiseness show promise for tests measuring maximum performance. These factors may also interact with item characteristics to affect test scores, with the interaction being detected by lz. As with the relationships examined in the present study, these relationships should be investigated in a testing situation which provides the external motivators typical of real-world testing situations.

LIST OF REFERENCES

Allport, G.W. (1928). A test for ascendance-submission. Journal of Abnormal Psychology, 23, 118-136.
Anastasi, A. (1988). Psychological Testing (6th ed.). New York: Macmillan.
Anderson, K.J. (1990). Arousal and the inverted-U hypothesis: A critique of Neiss's "Reconceptualizing arousal". Psychological Bulletin, 107, 96-100.
Benjamin, M., McKeachie, W.J., Lin, Y., & Holinger, D.P. (1981). Test anxiety: Deficits in information processing. Journal of Educational Psychology, 73, 816-824.
Berg, I.A., & Collier, J.S. (1953). Personality and group differences in extreme response sets. Educational and Psychological Measurement, 13, 164-169.
Birenbaum, M. (1985). Comparing the effectiveness of several IRT-based appropriateness measures in detecting unusual response patterns. Educational and Psychological Measurement, 45, 523-535.
Block, J. (1965). The Challenge of Response Sets: Unconfounding Meaning, Acquiescence, and Social Desirability in the MMPI. New York: Irvington.
Bock, R.D., Dicker, C., & VanPelt, J. (1969). Methodological implications of content-acquiescence correlation in the MMPI. Psychological Bulletin, 71, 127-139.
Boring, E.G. (1950). A History of Experimental Psychology (rev. ed.). New York: Appleton, Century, Crofts.
Broverman, D.M., Klaiber, E.L., Kobayashi, Y., & Vogel, W. (1968).
Roles of activation and inhibition in sex differences in cognitive ability. Psychological Review, 75, 23-50.
Cady, V.M. (1923). The estimation of juvenile incorrigibility. Journal of Delinquency Monograph, No. 2.
Campeau, P.L. (1968). Test anxiety and feedback in programmed instruction. Journal of Educational Psychology, 59, 159-163.
Cohen, J., & Cohen, P. (1983). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. Hillsdale, NJ: Erlbaum.
Cortina, J.M. (1993). What is coefficient alpha?: An examination of theory and application. Journal of Applied Psychology, 78, 98-104.
Costa, P.T., & McCrae, R.R. (1988). From catalog to classification: Murray's needs and the five-factor model. Journal of Personality and Social Psychology, 55, 258-265.
Couch, A., & Keniston, K. (1960). Yeasayers and naysayers: Agreeing response set as a personality variable. Journal of Abnormal and Social Psychology, 60, 151-174.
Cronbach, L.J. (1946). Response sets and test validity. Educational and Psychological Measurement, 6, 475-494.
Cronbach, L.J. (1950). Further evidence on response sets and test design. Educational and Psychological Measurement, 10, 3-31.
Crowne, D.P., & Marlowe, D. (1964). The Approval Motive: Studies in Evaluative Dependence. New York: Wiley.
Jackson, D.N., & Messick, S. (1958). Content and style in personality assessment. Psychological Bulletin, 55, 243-252.
Damarin, F. (1970). A latent structure model for answering personal questions. Psychological Bulletin, 73, 23-40.
Dolly, J.P., & Williams, K.S. (1986). Using test taking strategies to maximize multiple choice test scores. Educational and Psychological Measurement, 46, 619-625.
Donlon, T.F., & Fischer, F.E. (1968). An index of an individual's agreement with group-determined item difficulties. Educational and Psychological Measurement, 28, 105-113.
Drasgow, F., Levine, M.V., & Williams, E.A. (1985). Appropriateness measurement with polychotomous item response models and standardized indices.
British Journal of Mathematical and Statistical Psychology, 38, 67-86.
Drasgow, F., Levine, M.V., & McLaughlin, M.E. (1991). Appropriateness measurement for some multidimensional test batteries. Applied Psychological Measurement, 15, 171-191.
Drasgow, F., Levine, M.V., Williams, B., McLaughlin, M.E., & Candell, G.L. (1989). Modeling incorrect responses to multiple-choice items with multilinear formula score theory. Applied Psychological Measurement, 13, 285-299.
Dreger, R.G., & Aiken, L.R. (1957). The identification of number anxiety in a college population. Journal of Educational Psychology, 48, 344-351.
DuBois, P.S. (1966). A test-dominated society: China 1115 B.C. - 1905 A.D. In A. Anastasi (Ed.), Testing Problems in Perspective (pp. 29-36). Washington, DC: American Council on Education.
Dudycha, A.L., & Carpenter, J.B. (1973). Effects of item format on item discrimination and difficulty. Journal of Applied Psychology, 58, 116-121.
Edwards, A.L. (1957). The Social Desirability Variable in Personality Assessment and Research. New York: Dryden.
Endler, N.S., & Hunt, J.M. (1966). Sources of behavioral variance as measured by the S-R inventory of anxiousness. Psychological Bulletin, 65, 336-346.
Fagley, N.S. (1987). Positional response bias in multiple-choice tests of learning: Its relationship to test wiseness and guessing strategies. Journal of Educational Psychology, 79, 95-98.
Feldman, J.M., & Lynch, J.G. (1988). Self-generated validity and other effects of measurement on belief, attitude, intention, and behavior. Journal of Applied Psychology, 73, 421-435.
Forehand, G.A. (1962). Relationships among response sets and cognitive behaviors. Educational and Psychological Measurement, 22, 287-302.
Gaier, E.L., Lee, M.C., & McQuitty, L.L. (1953). Response patterns in a test of logical inference. Educational and Psychological Measurement, 13, 550-567.
Gehman, W.S. (1957). A study of ability to fake scores on the Strong Vocational Interest Blank for men.
Educational and Psychological Measurement, 17, 65-70.
Gibb, B.G. (1964). Test Wiseness as Secondary Cue Response. Unpublished doctoral dissertation, Stanford University.
Goodenough, D.R. (1976). The role of individual differences in field dependence as a factor in learning and memory. Psychological Bulletin, 83, 675-694.
Green, R.F. (1951). Does a selection situation induce testees to bias their answers on interest and temperament tests? Educational and Psychological Measurement, 11, 503-515.
Harnisch, D.L., & Linn, R.L. (1981). Analysis of item response patterns: Questionable test data and dissimilar curriculum practices. Journal of Educational Measurement, 18, 133-146.
Harper, F.B.W. (1974). The comparative validity of the Mandler-Sarason Test Anxiety Questionnaire and the Achievement Anxiety Test. Educational and Psychological Measurement, 34, 961-966.
Hartshorne, H., & May, M.A. (1928). Studies in Deceit. New York: Macmillan.
Henry, E.M., & Rotter, J.B. (1956). Situational influences on Rorschach responses. Journal of Consulting Psychology, 20, 457-462.
Herrmann, D.J. (1982). Know thy memory: The use of questionnaires to assess and study memory. Psychological Bulletin, 92, 434-452.
Hollenbeck, J.R., Ilgen, D.W., & Sego, D. (in press). Repeated measures regression and mediational tests: Enhancing the power of leadership research. Leadership Quarterly.
Hothersall, D. (1990). History of Psychology (2nd ed.). New York: McGraw-Hill.
Hough, L.M., Eaton, N.K., Dunnette, M.D., Kamp, J.D., & McCloy, R.A. (1990). Criterion-related validities of personality constructs and the effect of response distortion on those validities. Journal of Applied Psychology, 75, 581-595.
Hughes, H.H., & Trimble, W.E. (1965). The use of complex alternatives in multiple choice items. Educational and Psychological Measurement, 25, 117-126.
Humm, D.G., & Humm, K.A. (1944). Validity of the Humm-Wadsworth temperament scale: With consideration of the effects of subject's response-bias.
Journal of Psychology, 18, 55-64.
Humm, D.G., Storment, R.C., & Iorns, M.E. (1939). Combination scores for the Humm-Wadsworth temperament scale. Journal of Psychology, 7, 227-253.
Jackson, D.N., & Messick, S. (1958). Content and style in personality assessment. Psychological Bulletin, 55, 245-252.
Jackson, D.N., & Messick, S. (1965). Acquiescence: The nonvanishing variance component. American Psychologist, 20, 498-502.
Kingston, A.J., George, C.E., & Ewens, W.P. (1956). Determining the relationship between individual interest profiles and occupational forms. Journal of Educational Psychology, 47, 310-316.
Klein, G.S., Barr, H.C., & Wolitzky, D.C. (1967). Personality. Annual Review of Psychology, 18, 467-560.
Lawrence, P.J. (1957). Some characteristics of incorrect responses to intelligence test items. Australian Journal of Psychology, 9, 1-11.
Levine, M., & Rubin, D. (1979). Measuring the appropriateness of multiple-choice test scores. Journal of Educational Statistics, 4, 269-290.
Lorge, I., & Diamond, L.K. (1954). The prediction of absolute item difficulty by ranking and estimating techniques. Educational and Psychological Measurement, 14, 2-10.
Mandler, G., & Sarason, S.B. (1952). A study of anxiety and learning. Journal of Abnormal and Social Psychology, 47, 166-173.
Messick, S. (1966). The Psychology of Acquiescence: An Interpretation of Research Evidence. Princeton, NJ: Educational Testing Service, April.
Metfessel, N.S., & Sax, G. (1957). Response set patterns in published Instructor's Manuals in education and psychology. California Journal of Educational Research, 8, 195-197.
Metfessel, N.S., & Sax, G. (1958). Systematic biases in the keying of correct responses on certain standardized tests. Educational and Psychological Measurement, 18, 787-790.
Millman, J., Bishop, C.H., & Ebel, R. (1965). An analysis of test-wiseness. Educational and Psychological Measurement, 25, 707-726.
Naveh-Benjamin, M., McKeachie, W.J., & Lin, Y.G. (1987).
Two types of test anxious students: Support for an information processing model. Journal of Educational Psychology, 79, 131-136.
Naveh-Benjamin, M. (1991). A comparison of training programs intended for different types of test-anxious students: Further support for the information processing model. Journal of Educational Psychology, 83, 134-139.
Paulman, R.G., & Kennally, K.J. (1984). Test anxiety and ineffective test takers: Different names, same construct? Journal of Educational Psychology, 76, 279-288.
Peabody, D. (1966). Authoritarianism scales and response bias. Psychological Bulletin, 65, 11-23.
Peterson, R.A. (1961). A technique for the detection of blind checking in questionnaire research. Educational and Psychological Measurement, 21, 361-362.
Rapaport, G.M., & Berg, I.A. (1955). Response sets in a multiple-choice test. Educational and Psychological Measurement, 15, 58-62.
Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests. Kobenhavn: Danmarks Paedagogiske Institut. (Reprinted by University of Chicago Press, 1980).
Reise, S.P. (1990). A comparison of item and person-fit methods of assessing model-data fit in IRT. Applied Psychological Measurement, 14, 127-137.
Rorer, L.G. (1965). The great response style myth. Psychological Bulletin, 63, 129-156.
Rosenberg, N., Izard, C.E., & Hollander, E.P. (1955). Middle category response: Reliability and relationship to personality and intelligence variables. Educational and Psychological Measurement, 15, 281-290.
Rosenzweig, S. (1934). A suggestion for making verbal personality tests more valid. Psychological Review, 41, 400-401.
Ruch, F.L. (1942). A technique for detecting attempts to fake performance on a self-inventory type of personality test. In Q. McNemar and M.A. Merrill (Eds.), Studies in Personality (pp. 229-234). New York: McGraw-Hill.
Rudner, L.M. (1983). Individual assessment accuracy. Journal of Educational Measurement, 20, 207-219.
Sarnacki, R.E. (1979).
An examination of testwiseness in the cognitive test domain. Review of Educational Reseaich, a, 252-279. ' Sato, T. (1975). The Construction and Interpreta_tion of S-P Tables. Meiji Toosho. Satterly, D.J., (1976). Cognitive styles, spatial ability, and school achievement Journal; of Educational Psychology, _6_8, 36-42. Saupe, IL. (1960). An empirical model for the corroboration of suspected cheating on multiple-choice tests. Educational and Ps'Lchologic_al_ Measurement. 29, 475-489. 135 Schrnitt, N., Gooding, R.Z., Noe, N.A., & Kirsch, M. (1984). Metaanalyses of validity Studies between 1964 and 1982 and the investigation of study characteristics. Personnel Psychology, 31, 407-422. Schuman, H., & Kalton, G. (1985). Survey methods. In G. Lindzey and E. Aronson (Eds) Handbook of Social Psychology. Hillsdale, NJ: Erlbaum. Tatsuoka, K.K. & Tatsuoka, M.M. (1980). Detection of abem response patterns and their effect on dimensionality. Research Report 80-4-ONR Urbana, 11: University of Illinois, Computer based Education Research Laboratory. Tatsuoka, K.K. & Tatsuoka, M.M. (1982). Detection of aberrant response patterns. Journal of Educational Statistics, '_7_, 215-231. Tobias, S. (1979). Anxiety research in educational psychology. Journal of Educational Psychology, fl, 573-582. vanderFlier, H. (1977). Environmental factors and deviant response patterns. In Y.H. Poortinga (Ed.) Basic Problem_s in Cross Cultural Psychology. Amsterdam: Swets & Seitlinger, B.V. Wahsltrom, M., & Boersma, RI. (1968). The influence of test-wiseness upon achievement. Erin—cational and Psychological Measurement, ga, 413-420. Waterhouse, I.K. & Child, LL. (1953). Frustration and the quality of performance: An experimental study. 103ml of Personality, 22, 298-311. Watson, D, & Clark, LA. (1984). Negative affectivity: the disposition to experience negative emotional states. Psychological Bulletin, _9_6_, 465-490. 136 Whitcomb, M.A., & Travers, R.M.W. (1957). 
LIST OF APPENDICES

APPENDIX A

Personality measures

Conscientiousness

Please use the following scale to answer questions 1 - 4:

A - Strongly disagree
B - Disagree
C - Neutral
D - Agree
E - Strongly Agree

Please mark your answers on the bubble sheet that was provided.

I try to perform all of the tasks assigned to me conscientiously.
When I make a commitment, I can always be counted on to follow through.
I am a productive person who always gets the job done.
I strive for excellence in everything I do.

Test Anxiety

Please use the following scale to answer questions 5 - 15:

A - The statement does not describe my present condition
B - The condition is barely noticeable
C - The condition is moderate
D - The condition is strong
E - The condition is very strong; the statement describes my present condition well.

I feel my heart beating fast.
I feel regretful.
I am so tense that my stomach is upset.
I am concerned about others in the class seeing the results of this test.
I have an uneasy, upset feeling.
I feel that others will be disappointed in me.
I am nervous.
I feel I may not do as well on this test as I could.
I feel panicky.
I do not feel very confident about my performance on this test.
Math Anxiety

When my stats professor (or any math professor) asks questions to find out how much we know about a particular mathematical concept or approach, I worry that I will do poorly in the class.
When my stats professor is showing the class how to do a particular problem, I worry that the other students in the class might understand the problem better than I do.
When I am in stats (or any math class), I usually feel relaxed and at ease.
When I am taking a math test, I usually feel relaxed and at ease.
Taking math tests scares me.
I dread having to do math.
The thought of taking a more advanced stats class (e.g., Psych 302, 304, or required stats courses in graduate school) scares me.
In general, I worry about how well I am doing in school.
If I miss a given day of stats class, I worry that I will be behind the other students when I come back.
In general, I worry about how well I am doing in math.
Compared to other subjects, I worry a great deal about how well I am doing in math.

Carelessness Items

In a one-way ANOVA, if the degrees of freedom between groups is equal to the number of groups minus 1, and there are 4 groups, what would the degrees of freedom between groups be?

Which of the following does ANOVA stand for?
a. ANalysis Of VAriance
b. Standard Deviation
c. Correlation
d. Repeated Measures Designs

The two types of errors that one can make in the sort of hypothesis testing exemplified by the above scenario are
a. Type I and Type II
b. Type III and Type IV
c. Type V and Type VI
d. Type VII and Type VIII

Which of the following are within the range of possible correlations?
a. -.50
b. .40
c. .50
d. all of the above

(This question does not apply to the scenario that the previous questions applied to or any other particular scenario) In general, which of the following are within the range of possible z-scores?
a. 1.0
b. 2.0
c. -1.0
d. all of the above

Another word for the average is
a. mean
b. variance
c. sample
d.
population

APPENDIX B

Statistical knowledge items

For questions 1 - 2, use the following scenario.

Suppose I am interested in examining the effect of the number of times one watches "Jeopardy" and the number of times one watches "Wheel of Fortune" on the number of Trivial Pursuit questions that one can answer. In an attempt to do this, I assign 30 people to one of two levels of Jeopardy-watching and one of three levels of Wheel of Fortune-watching. The data look like this.

                                   JEOPARDY WATCHING
                                   NONE      10 TIMES
WHEEL OF FORTUNE    NONE             58         45
WATCHING                             75         51
                                     68         75
                                     37         72
                                     27         69
                    (sum)           265        312

                    10 TIMES         61         55
                                     41         58
                                     55         42
                                     50         68
                                     47         32
                    (sum)           254        255

                    20 TIMES         75         31
                                     65         31
                                     50         46
                                     35         32
                                     30         40
                    (sum)           255        180

(Diff) 1.) If each subject had received every level of one of the variables in the "Trivial Pursuit" experiment, I could not use
a. a three-way ANOVA
b. a Chi-squared test
c. a one-way ANOVA
d. all of the above

(Mod) 2.) If I had simply wanted to compare the number of people who watch Jeopardy to the number of people who watch Wheel of Fortune, I would have to use which of the following?
a. a three-way ANOVA
b. a Chi-squared test
c. a one-way ANOVA
d. any of the above

For questions 3-8, use the following scenario.

Suppose I am interested in the effects of the number of Jagermeister shots that one does on the number of questions that one can answer about analysis of variance. In an attempt to assess these effects, I assign 15 people to one of three groups so that there are 5 people in each group. GROUP 1 gets no Jager, GROUP 2 gets 3 shots of Jager, and GROUP 3 gets 8 shots of Jager. The data look like this:

GROUP 1    GROUP 2    GROUP 3
   3          2          6
   7          7          8
   3          8          8
   1          2          7
   0          4          8

ΣX² = 482

(Diff) 3.) A statement that does not represent the null hypothesis for this test is
a. r = 2 = 3
b. r = 2
c. r = 2 = 3
d. both a and b

(Diff) 4.) The degrees of freedom Between, Within, and Total for this test are not
a. 3, 11, 14
b. 2, 13, 15
c. 2, 12, 14
d. both a and b

(Mod) 5.)
What are the Sums of Squares Between, Within, and Total?
a. 58.8, 61.2, 120
b. 53.73, -63.2, 116.93
c. 53.73, 63.2, 116.93
d. none of the above

(Mod) 6.) If the Sums of Squares Between and Within are 53.73 and 63.2 respectively (they aren’t necessarily), what are the Mean Squares Between and Within?
a. 17.91, 5.74
b. 26.86, 5.27
c. 3.58, 4.21
d. none of the above

(Mod) 7.) Which of the following is true of the F-ratio?
a. F“, = 5.10
b. F112 = 5.10
c. F,”12 = .25
d. none of the above

(Mod) 8.) With which of the following levels of significance would you reject the null?
a. .05
b. .01
c. .005
d. both a and c

(Easy) 9.) A Mean Square is a form of which of the following?
a. standard error
b. variance
c. standard deviation
d. covariance

(Easy) 10.) Which of the following represents a factorial design?
a. Each level of one variable is paired with one level of every other variable
b. Each variable is paired with every other variable
c. One variable is paired with every level of one other variable
d. Each level of every variable is paired with each level of every other variable

For questions 11-12, use the following scenario.

Suppose I am interested in assessing the effects of job satisfaction (IV1) and salary (IV2) on job performance (DV). So, I collect data on these three variables for 100 people and correlate the variables. The correlation between satisfaction and performance is .23, the correlation between salary and performance is .45, and the correlation between salary and satisfaction is .42.

(Mod) 11.) Which of the following is the type of regression analysis that I would use to assess the effects of job satisfaction and salary on job performance?
a. multiple regression
b. repeated measures regression
c. simple regression
d. any of them depending on the nature of the variables

(Mod) 12.) Which of the following is the regression equation?
a. Y = .05X1 + .43X2 + a
b. Y = .07X1 + .61X2
c. Zy = .05Z1 + .43Z2
d. none of the above

(Easy) 13.)
In general, the F-ratio will be large if which of the following is true?
a. If the variance within the groups is larger than the variance between the groups
b. If the variance within the groups is equal to the variance between the groups
c. If the variance within the groups is less than the variance between the groups
d. If there is no variance between the groups

(Easy) 14.) Which of the following best represents the null hypothesis for a one-way ANOVA with three groups?
[answer options a - d are illegible in the scanned source]

(Easy) 15.) Which of the following is true of ANOVA?
a. It is relatively insensitive to violations of the normality assumption but not the homogeneity assumption
b. It is relatively insensitive to violations of the homogeneity assumption but not the normality assumption
c. It is relatively insensitive to violations of both of its assumptions
d. It is greatly affected by violations of either of its assumptions

(Easy) 16.) Which of the following has to be true in order for the Central Limit Theorem to be applicable?
a. the samples come from the same population
b. the samples come from different populations
c. the sample means are equal
d. the population variances are different

(Easy) 17.) If I wanted to examine the effects in an ANOVA further, which of the following should I use?
a. a second ANOVA
b. an acid test
c. urinalysis
d. multiple comparison procedures

For questions 18 - 19, use the following scenario.

Suppose I am interested in knowing the difference in ACT scores between men and women. In an attempt to investigate this, I draw a sample of 16 men and a sample of 16 women. The mean of the ACT scores for the men is 18.6 and the mean for the women is 20.4. The variances of the individual scores are 6.1 and 8.9 respectively. Suppose further that the standard error for the differences between means is .97.

(Mod) 18.) Calculate a t-score for the difference between these two means.
(When using the t-score formula, be sure to put the means in the order that they were presented)
a. 1.85
b. -1.85
c. -1.8
d. none of the above

(Diff) 19.) I would fail to reject the null with
a. a one-tailed test that examines the lower 5% of the distribution
b. a two-tailed test with a significance level of .05.
c. a two-tailed test with a significance level of .10.
d. both a and b

Use the following scenario to answer questions 20 - 21.

Suppose I am interested in knowing the difference in ACT scores between men and women. In an attempt to investigate this, I draw five samples of men with 12 in each sample and 5 samples of women with 12 in each sample. For each pair of samples, I record the mean ACT score for men, the mean ACT score for women, and the difference between the means so that my data look like this

ACT (MEN)      ACT (WOMEN)     DIFFERENCE
X11 = 18       X21 = 24        X11 - X21 = -6
X12 = 22       X22 = 22        X12 - X22 = 0
X13 = 17       X23 = 23        X13 - X23 = -6
X14 = 20       X24 = 26        X14 - X24 = -6
X15 = 16       X25 = 27        X15 - X25 = -11

Mean of the sample means:        18.6 (men)    24.4 (women)
Variance of the sample means:     5.8 (men)     4.3 (women)

(Mod) 20.) If we draw another sample of 15 men and 15 women, and their mean ACT scores are 17 and 22 respectively, and the standard error of the differences between means is 3.18, what is the t-score for the difference between these two means?
a. 1.73
b. -1.57
c. .49
d. none of the above

(Mod) 21.) What would the effect on the power of my t-test be if I doubled my sample size?
a. My power would increase
b. My power would decrease
c. My power would be unaffected
d. Either a or b depending on the situation

For questions 22 - 24, use the following scenario.

Suppose I am interested in knowing the difference in armpit hair between men and women. In an attempt to investigate this, I draw a sample of 21 men and a sample of 21 women. The mean for pit hair for men is 4.6 and the mean for women is 6.6. The variances of the individual scores within these samples are 7.1 and 7.7 respectively.

(Diff) 22.)
The standard error of differences is not
a. .70
b. .84
c. 14.8
d. This applies to both a and c

(Mod) 23.) Calculate the t-score for the difference between these two means.
a. -2.38
b. -2.0
c. 2.0
d. none of the above

(Mod) 24.) With which of the following would I fail to reject the null?
a. a one-tailed test that examines the upper 5%
b. a one-tailed test that examines the lower 1%
c. a two-tailed test with a significance level of .05
d. both a and b

For questions 25 - 26, use the following scenario.

Suppose I want to estimate the average number of hours per day that Dan Quayle spends playing with his Legos. In order to do this, I take a sample of 12 days and record the number of hours that he spends playing with his Legos on each of those days. They are as follows: 4, 3.5, 6, 7, 3.5, 4.5, 6.5, 8, 1.5, 2.5, 6, 5. The standard deviation of this set of scores is 1.93.

(Mod) 25.) What is the 95% confidence interval around the mean? (Hint: You must use the value for a two-tailed test)
a. .58 < μ < 9.08
b. 3.6 < μ < 6.06
c. 3.09 < μ < 6.57
d. none of the above

(Mod) 26.) What is the 99% confidence interval around the mean? (Hint: You must use the value for a two-tailed test)
a. .58 < μ < 9.08
b. 3.6 < μ < 6.06
c. 3.09 < μ < 6.57
d. none of the above

(Easy) 27.) What is a confidence interval?
a. It is a range of scores within which we expect the population mean to fall.
b. It is the population mean
c. It is a range of scores within which we expect the sample mean to fall.
d. It is our two best estimates of the population mean.

(Diff) 28.) Suppose I was interested in knowing the relationship between goal difficulty and goal commitment (a fairly common topic in my field). In an attempt to investigate this relationship, I draw a sample of 15 people, give them a goal for a task, and get the correlations between goal difficulty and commitment. The correlation from these 15 people is -.48. The t-score for this correlation would not be
a. -1.73
b. 1.73
c. -1.97
d.
b and c

For questions 29 - 33, use the following scenario.

Suppose I know that the mean of the population of grade point averages at MSU is 2.35. Suppose further that I get a sample of GPA’s from an unknown source, and the scores are: 3.3, 3.6, 2.2, 2.4, 3.1, 1.6, 1.4, 2.9, 3.9, 2.5, 3.4, 3.3. The mean of these scores is 2.8 and the standard deviation is .79.

(Diff) 29.) The t-score for this mean is not
a. .45
b. .57
c. -1.97
d. all of the above

(Mod) 30.) What would the degrees of freedom for this t-test be?
a. 12
b. 11
c. 132
d. either a or b depending on the situation

(Diff) 31.) We would fail to reject the null with
a. a one-tailed test which examines the lower 5% of the distribution
b. a two-tailed test with a level of significance of .05
c. a two-tailed test with a level of significance of .01
d. all of the above

(Mod) 32.) Which of the following would happen if the sample size were doubled?
a. The t-score that I calculate would be larger and the score that I use from the Table would be smaller
b. The t-score that I calculate would be smaller and the score that I use from the Table would be larger.
c. The t-score would be larger and the score from the Table would be unaffected
d. Neither the calculated t nor the score from the Table would be affected

(Diff) 33.) If the sample variance were doubled, it is not the case that
a. the t-score that I calculate would be larger and the score that I use from the Table would be smaller
b. the t-score that I calculate would be smaller and the score that I use from the Table would be larger.
c. the t-score would be smaller and the score from the Table would be unaffected
d. Both a and c apply

(Diff) 34.) Suppose I was interested in knowing the relationship between goal difficulty and goal commitment (a fairly common topic in my field). In an attempt to investigate this relationship, I draw a sample of 15 people, give them a goal for a task, and get the correlation between goal difficulty and commitment.
The correlation from these 15 people is -.48. The degrees of freedom for the t-test for correlations is not
a.) 13
b.) N-1
c.) N-2
d.) This applies to none of the above

(Easy) 35.) Which of the following is true of an unstandardized regression weight (i.e., b in a regression equation with a single predictor)?
a. It is the amount of change in the dependent variable associated with a unit change in the independent variable
b. It is the amount of change in the independent variable associated with a unit change in the dependent variable
c. It is the correlation between the independent and dependent variables
d. It is the covariance between the independent and dependent variables

(Diff) 36.) Suppose I know the number of times each Michigan resident has been swindled by Gov. Engler (So, I have access to this population of scores). I then take many different samples of 15 people each and calculate the mean for each sample. If the mean of the means were 7.8 and the standard deviation of individual scores were 2.2, the population mean and the standard error of the mean would not be
a. μ = 2.79, σ_X̄ = .57
b. μ = 7.8, σ_X̄ = 2.2
c. either a or b
d. all of the above

(Diff) 37.) I am interested in predicting graduate school GPA from GRE scores. In an attempt to do this, I take a sample of 10 graduate students, record their GRE scores and their GPA’s. The data look like this

subject #    GRE (X)    Grad School GPA (Y)
    1          730            2.9
    2         1120            3.1
    3         1310            3.9
    4          810            3.2
    5          960            3.0
    6         1250            3.5
    7         1180            3.3
    8         1410            3.7
    9          840            3.4
   10          660            3.1

If the correlation between these two variables is .75, it could not be said that GRE scores account for
a. 68% of the variance in GPA
b. 95% of the variance in GPA
c. 57% of the variance in GPA
d. both a and b

(Diff) 38.) I am interested in examining the relationship between GRE scores and GPA in graduate school. In an attempt to do this, I take a sample of 10 graduate students, record their GRE scores and their GPA’s.
The data look like this: (by the way, GRE scores can range from 400 - 1600)

subject #    GRE (X)    Grad School GPA (Y)
    1         1230            3.4
    2         1120            3.1
    3         1310            3.4
    4         1210            3.9
    5         1360            3.8
    6         1250            3.5
    7         1180            3.3
    8         1410            3.6
    9         1380            3.4
   10         1100            3.6

If the covariance between these two variables is 7.44, this (without any knowledge of the variances) would not tell you
a. anything
b. that there is a strong, positive relationship
c. that there is a weak, positive relationship
d. This applies to none of the above

For questions 39 - 40, use the following scenario.

Suppose I am interested in assessing the relationship between schizophrenia (measured with a 10 point scale) and job performance (measured on a 10 point scale). To do this, I collect data on both of these variables for 30 people.

(Mod) 39.) If the scatterplot for the data looked like the following, what kind of relationship would it be?
a. imperfect negative
b. imperfect positive
c. perfect positive
d. none of the above

(Diff) 40.) If the scatterplot looked like the following, it would not be a(n)
a. imperfect negative relationship
b. imperfect positive relationship
c. positive relationship
d. Both b and c apply

(Easy) 41.) Which of the following correlation coefficients represents the strongest relationship?
a. .68
b. .22
c. -.46
d. -.82

(Easy) 42.) Which of the following correlation coefficients represents the weakest relationship?
a. .68
b. .22
c. -.46
d. -.82

(Easy) 43.) Which of the following covariances represents the strongest relationship?
a. 4.28
b. 17.71
c. -24.94
d. can’t tell

(Easy) 44.) What is the probability of rolling a 1 then a 2 then a 3 then a 4 in four rolls of a die?
a. .005
b. .51
c. .00077
d. .67

(Easy) 45.) What is the probability of rolling a 1 or a 2 or a 3 or a 4 in a single roll of a die?
a. .005
b. .51
c. .00077
d. .67

(Easy) 46.) What is the probability of rolling a 1 then a 2 or a three then a four in two rolls of a die?
a. .11
b. .055
c. .67
d. .00077

(Diff) 47.)
Suppose I know that the mean of the population of grade point averages at MSU is 2.35. Suppose further that I get a sample of GPA’s from an unknown source, and the scores are: 3.3, 3.6, 2.2, 2.4, 3.1, 1.6, 1.4, 2.9, 3.9, 2.5, 3.4, 3.3. The mean of these scores is 2.8 and the standard deviation of these individual scores is .79. If I knew that the null hypothesis were actually false (This would never actually happen), and I used a one-tailed test that examined the lower 5% of the distribution, I would not commit
a. a Type 1 error
b. a Type 2 error
c. an error
d. either a or c

For questions 48 - 50, use the following scenario.

Suppose I know that the mean number of "beauty surgeries" that Phyllis Schlafly has in a day is 3.1 (a population mean), and the standard deviation of these individual scores (sigma) is 1.25. I then receive data on the number of surgeries that an unknown nazi fraulein has each day for ten days. The mean for these ten days is 3.7.

(Diff) 48.) The z-score for this mean is not
a. 4.8
b. 1.54
c. .6
d. both a and c

(Mod) 49.) What is the null hypothesis for this test?
a. The number of beauty surgeries that Phyllis Schlafly has is from the sample for the unknown nazi
b. The sample is from the population of Phyllis Schlafly surgeries
c. The sample has a mean that is larger than the population mean
d. none of the above

(Diff) 50.) If I knew that the null hypothesis were actually true (This would never actually happen), and I used a one-tailed test that examined the upper 5% of the distribution, I would not commit
a. a Type 1 error
b. a Type 2 error
c. an error
d. all of the above

For questions 51 - 52, use the following scenario.

Suppose I measured the height (in inches) of MSU football players and found that they were normally distributed with a mean of 75 and a standard deviation of five.

(Diff) 51.) The percentage of MSU football players that can be expected to be between 70 and 80 inches tall is not
a. 68%
b. 95%
c. roughly two-thirds
d.
This applies to none of the above

(Mod) 52.) Approximately what percentage of MSU football players can be expected to be between 65 and 85 inches tall?
a. 68%
b. 95%
c. one-third
d. either a or b

(Mod) 53.) What would the Z-score be for an MSU football player who was 83 inches tall?
a. 1.6
b. 16
c. .16
d. none of the above

(Diff) 54.) Referring to the Z-score that you just calculated, the percentage of the heights that you would expect not to fall above this score is
a. 5%
b. 95%
c. either a or b
d. none of the above

(Easy) 55.) Which of the following is the term used to describe the extent to which the scores in a set of scores congregate in the tails of the distribution?
a. skew
b. positive skew
c. potato stew
d. kurtosis

(Easy) 56.) Which of the following is the term used to describe the extent to which a distribution is asymmetrical?
a. skew
b. positive skew
c. potato stew
d. kurtosis

(Easy) 57.) Which of the following is a property of the normal distribution?
a. It is skewed
b. It is unimodal
c. It is leptokurtic
d. It is platykurtic

(Easy) 58.) The variance is basically which of these?
a. the average squared deviation from the mean
b. the sum of the squared deviations from the mean
c. the sum of the absolute deviations from the mean
d. the average absolute deviation from the mean

(Easy) 59.) Which of the following is an advantage of the mean as a measure of central tendency?
a. It is greatly affected by extreme scores
b. It can be manipulated algebraically
c. It is not greatly affected by extreme scores
d. It is difficult to calculate

(Easy) 60.) Which of the following is a disadvantage of the mean as a measure of central tendency?
a. It is affected by extreme scores
b. It can be manipulated algebraically
c. It is not greatly affected by extreme scores
d. It is difficult to calculate

(Easy) 61.) Consider the following set of numbers: 4, 15, 7, 5, 1, 5, 6, 8. Using these numbers, calculate the percentile for a score of 8.
a. .125
b. 8.75
c. .7
d.
.875

For questions 62 - 63, use the following scenario.

Suppose I wish to assess people’s height and political affiliation. To do this, I take 100 people and record their height in inches. I also assign them a 1 if they are democrat, 2 if republican, and 3 if they are other.

(Diff) 62.) Height in inches is not a
a. nominal scale
b. ordinal scale
c. interval scale
d. all of the above

(Mod) 63.) What type of scale would political affiliation be?
a. nominal
b. ordinal
c. interval
d. none of the above

(Easy) 64.) Which of the following is clearly an interval scale?
a. Height in inches
b. Height in centimeters
c. Temperature (in Celsius)
d. Gender

(Easy) 65.) Which of the following is clearly a ratio scale?
a. Weight in Kilograms
b. Class standing
c. Grade Point Average
d. Temperature (in Celsius)

For questions 66 - 69, use the following scenario.

For some strange and terrible reason, I am interested in knowing the average number of white collar crimes committed per day by Sen. Bob Dole over the past 2000 days. In an attempt to estimate this value, I randomly choose twenty days from these 2000, count the number of white collar crimes he committed on each of those 20 days, and get the average of those twenty numbers.

(Diff) 66.) My sample size is not
a. twenty
b. the average for the twenty days
c. fifty Million
d. either b or c

(Diff) 67.) The parameter that I am trying to estimate is not
a. the average number of white collar crimes per day committed by Dole over the past 2000 days.
b. the average number of white collar crimes committed per day by Dole over the 20 days that I measured.
c. the total number of white collar crimes committed by Dole divided by 2000
d. This applies to none of the above

(Mod) 68.) What is the statistic that I have used?
a. The average number of white collar crimes per day committed by Dole over the past 2000 days.
b. The average number of white collar crimes committed per day by Dole over the 20 days that I measured.
c. 2000
d.
all of the above

(Mod) 69.) What is the population of interest?
a. The white collar crimes committed per day by Dole over the past 2000 days
b. The white collar crimes committed per day by Dole over the past 20 days
c. The average number of white collar crimes per day committed by Dole over the past 2000 days.
d. Either a or c.

Use the following scenario for questions 70 - 74.

For some strange and terrible reason, I am interested in knowing the average height of the fifty million people who voted for Ross Perot (the second coming of Thurston Howell). In an attempt to estimate this value, I find twenty people who voted for Perot, measure their heights, and calculate their average.

(Mod) 70.) What is my population of interest?
a. The cast of Gilligan’s Island
b. The twenty people whose heights I measured
c. All voters
d. none of the above

(Mod) 71.) What is my sample size?
a. Twenty
b. Not enough information provided
c. Fifty Million
d. none of the above

(Diff) 72.) The group that is not the sample that I am using is
a. the fifty million who voted for Perot
b. the twenty people whose heights I measured
c. both a and b
d. none of the above

(Diff) 73.) The numerical value that is not the parameter that I am trying to estimate is
a. The average height of the fifty million people who plan to vote for Ross Perot
b. The average height of the twenty people that I measured.
c. Fifty Million
d. both b and c

(Mod) 74.) What is the statistic that I have used?
a. The average height of the fifty million people who plan to vote for Ross Perot
b. The average height of the twenty people that I measured.
c. Fifty Million
d. either a or b

(Easy) 75.) Which of the following is the term for numerical values used to make generalizations about a large set of data from a subset of that set?
a. descriptive statistics
b. dependent statistics
c. inferential statistics
d.
parametric statistics

APPENDIX C

Descriptives for statistical knowledge items

Variable    Mean    Std Dev    Minimum    Maximum    N      Label
ITEMA1      .73     .44        0          1          165
ITEMA2      .58     .49        0          1          165
ITEMA3      .92     .27        0          1          165
ITEMA4      .64     .48        0          1          165
ITEMA5      .38     .49        0          1          165
ITEMA6      .65     .48        0          1          165
ITEMA7      .75     .44        0          1          165
ITEMA8      .33     .47        0          1          165
ITEMA9      .67     .47        0          1          165
ITEMA10     .25     .44        0          1          165
ITEMA11     .28     .45        0          1          165
ITEMA12     .75     .43        0          1          165
ITEMA13     .71     .46        0          1          165
ITEMA14     .23     .42        0          1          165
ITEMA16     .63     .48        0          1          165
ITEMA17     .67     .47        0          1          165
ITEMA18     .32     .47        0          1          165
ITEMA19     .54     .50        0          1          165
ITEMA20     .33     .47        0          1          165
ITEMA21     .44     .50        0          1          165
ITEMA22     .84     .37        0          1          165
ITEMA23     .46     .50        0          1          165
ITEMA24     .54     .50        0          1          165
ITEMA25     .67     .47        0          1          165
ITEMA26     .27     .45        0          1          165
ITEMA28     .78     .41        0          1          165
ITEMA29     .27     .44        0          1          165
ITEMA30     .15     .35        0          1          165
ITEMA31     .27     .44        0          1          165
ITEMA32     .61     .49        0          1          165
ITEMA33     .75     .43        0          1          165
ITEMA34     .45     .50        0          1          165
ITEMA35     .68     .47        0          1          165
ITEMA36     .68     .47        0          1          165
ITEMA37     .60     .49        0          1          165
ITEMA38     .58     .49        0          1          165
ITEMA39     .32     .47        0          1          165
ITEMA40     .17     .38        0          1          165
ITEMA41     .36     .48        0          1          165
ITEMA42     .28     .45        0          1          165
ITEMA44     .19     .40        0          1          165
ITEMA45     .68     .47        0          1          165
ITEMA46     .88     .33        0          1          165
ITEMA47     .43     .50        0          1          165
ITEMA48     .44     .50        0          1          165
ITEMA49     .35     .48        0          1          165
ITEMA50     .65     .48        0          1          165
ITEMA51     .50     .50        0          1          165
ITEMA52     .55     .50        0          1          165
ITEMA53     .54     .50        0          1          165
ITEMA54     .63     .48        0          1          165
ITEMA55     .70     .46        0          1          165
ITEMA56     .49     .50        0          1          165
ITEMA57     .39     .49        0          1          165
ITEMA59     .43     .50        0          1          165
ITEMA60     .64     .48        0          1          165
ITEMA61     .47     .50        0          1          165

[The variable names and means for the remaining items are illegible in the scanned source.]

Correlation Coefficients

[Item intercorrelation matrix for the statistical knowledge items. The row and column alignment of the coefficients was lost in the scanned source, so individual coefficients cannot be assigned to item pairs.]
0 .0793 «0169 .1675‘ .1709‘ .1014 .0776 .4541“ .2240" .0450 .1021 .1815‘ .1472 .1097 .0734 «1007 «0199 .0752 .0742 «0814 .0397 «1097 .0799 .0234 .0376 .0728 «0465 «0172 «1010 .0542 «0210 «0167 .0175 «0161 .0319 «0252 «0828 .0309 .0071 «1291 «0101 «0136 «0350 .0542 .0725 «1423 «1319 .0083 .0071 .1578‘ .1382 .0238 .0875 .1537‘ TTENLAO 102001 .0518 .0897 .0773 .0987 .1136 .0282 .1083 .0383 .0796 .0875 .1328 J770‘ .1683‘ .1674‘ .0881 .0693 .1599‘ .0818 «0126 «1270 .0459 «0278 .1754‘ .0763 .0056 «0415 .0535 .0315 .0312 «0549 1'1'EMl6 1'1'EMl7 1'1'EM18 ITEMI9 ITEMZO 1'1'EM21 mzmz ITEMZ .1037 .1173 .1232 «0547 .1139 0446 .0743 .0414 .0049 «0140 «0113 .0321 .1176 «0183 .0990 .0755 - «0800 .0193 .1627‘ .1135 .0863 .1024 .1454 £41.. .1829‘ .0292 .2240“ L0000 «0843 .0843 .2085“ .0134 .2027“ .0551 .0292 .1020 .1034 .1093 .0029 .0721 .0024 .0485 .0413 .0489 «1522 .0450 «0843 L0000 «0730 .0860 .0074 .0760 1'1'EM81 ITEMSZ l'I'EMl 1TEM3 1TEM4 1'1'EM6 1TEM8 1TEM9 ITEM 10 ITEM 11 11'EM12 161 .2199“ «660 .1259 .1779‘ .1577‘ .648 «0752 .0781 .0372 «624 .611 .0313 .1212 .619 «638 «0362 .682 .0927 .046 «0333 .0898 .628 .652 .614 .0603 «616 .2013“ .1128 .611“ .0640 «0186 «646 .0473 .651 .632 .0179 «0853 .1945‘ .688 .620 .1801‘ .622 .2052“ .0166 .2530“ .1722‘ .0734 .1119 .1715‘ .2439“ .625 .2471“ .0750 «0283 .1314 . 1583‘ .1603‘ «699 «623 «06 15 .0105 .618 .2134“ .1949‘ .0251 .0260 .1398 .1943‘ . 1622‘ .1040 .1184 «1687‘ .627 .0871 .0839 .662 «613 .1105 «0659 .1745‘ .1790‘ .2121“ .1731‘ .634“ . 1656‘ .624 .0424 .2046“ .1431 .166 .2328“ .2075“ «688 .1651‘ .661 «0177 .673“ .0713 «617 .0779 .0229 «1959‘ « 186‘ « 1405 «688 .1451 «0120 .680 .2529“ . 1841‘ . 1952‘ .1715‘ .0872 «635 .1149 .0233 .676 .666 .666 . 1032 .0446 . 1351 .1584‘ . 1627‘ .675 . 
1646‘ .1734‘ .0870 .1214 .1514 .2052“ .1973‘ .1132 ITEMA12 1'1'EMAI3 l'1'EMAl4 I'I'EMA16 ITEMAN ITEMl8 lTEMl9 .0121 .644 .146 .0116 «0815 .644 «673 .0760 .695 .0409 .0147 .617 .1259 .1091 .1297 .673 .1406 .2432“ .2626“ .1386 .0542 .0173 .632 .602 .667 . 1242 .0201 «0999 .683“ .66 .136 .1487 . 1847‘ .146 .1950‘ .645 «0483 .614 .677 .610 .1235 «624 .1174 .618 .0405 .618 .2251“ .1346 .1445 .1438 .0150 .0963 .1242 .690 . 1553‘ .0562 «644 .621 .698 .665 «0192 . 1892‘ «1259 .1315 .685 .1351 .1838‘ . 1888‘ «0998 «673 .1484 .0151 .645 .614 .176‘ .1153 .1356 .66 «695 .1520 «638 .61 1 .11 10 .0845 .1342 .601 .647 .0725 .1083 .1264 .617 .3015“ .610 .0405 .0793 .638 .0778 .1977‘ .688 «617 .687 «641 «681 «634 .1444 .1290 «643 .669 .160 . 1928‘ .186‘ .1301 .0929 «611 .1232 .2912“ .2775“ .0755 .636 .1227 .086 .1497 «652 .0813 .698 .0879 « 161 « 1452 «620 «0777 «1530‘ .0771 «0856 .615 «607 «0742 .1111 .1245 «0858 «66 «615 «0194 «0122 .1903‘ .1598‘ «076 .0450 .645 «0483 «0880 .2457“ .1883‘ «016 .0227 .2489“ «610 .0975 .639 .689 .1049 . 1617‘ .1656‘ .629 .0432 .1284 .0744 .634 .0977 .1315 .1524 .1351 .617 . 1627‘ .0837 .0664 .3055“ «642 .612 .638 .1243 . 1492 . 1836‘ .0177 .629 .620 «0702 «696 .642 .681 .638 «0117 «639 .683 . 1264 .1280 .676 .0740 «0422 .0152 .652 .016 . 1677‘ «0257 «0122 .620 « 1522 «684 «0721 «642 «0476 ITEMZ4 ITEMZS ITEMZ6 ITEM28 ITEMZ9 1TEM30 1TEM31 .0477 . 1551‘ .615 .0796 .0847 .644 .627 .2765“ .0782 .626 .1471 .667 .168 .667 .1359 .0795 .676 .2268“ .0746 «670 .1255 .0850 .667 .652 .682 .1709‘ .660 «685 .0140 .0442 .1430 «0446 «106 .648 «0151 .1213 «0811 «0702 .0791 .1499 .0467 .658 .1299 .1181 .0454 .1292 .692 .167 «652 .164 .1096 .659 .1183 .0175 .0420 .0175 . 1847‘ .674 .611 .166 . 1577‘ «0420 .0701 .655 .66 .0795 .655 .666 .0746 .652 . 1949‘ . 1242 «0166 .1321 .0835 .0118 .624 .262“ .2182“ .687 .2735“ .2199“ .07 81 «638 .105 1 .653 .1051 «645 .0424 «620 .2032“ . 
1673‘ .016 .0152 «661 «656 «622 .0749 .0499 .672 .616 .0460 .688 1'1'EM20 «0807 .607 .616 .616 .066 «612 «1120 .0644 « 1394 «623 . 1998‘ «601 .0756 «683 .0781 .1075 .669 .1343 .628 .0774 . 1569‘ .0497 .633 .652 .658 .6 89“ .686“ .1447 .1761‘ .607 .680 .620 .1216 «1534‘ .1604‘ .647 .0745 «0702 ITEMZl «647 «0177 .1 169 .66 .262“ .0410 «0433 «673 .046 .0745 .608 .633 .0479 .1439 «688 .632 .0971 .605 .654 .071 1 .667 .0156 .666 .641 «696 .1380 .0401 .618 «0172 .647 .0181 .0723 .1967‘ «0181 .0447 .611“ .674 «0498 .0445 .613 .0868 «668 . 1869‘ .1231 .667 «697 .685 .690 . 1591‘ .677 «1145 «638 «0137 .666 .642 ITEM6 .612 .661 «0137 «695 .6 14 «1012 .07 11 «644 .613 «0457 . 1424 .654 .6 18 .652 .651 «W4 .123 1 . 1664‘ .656 .626 «698 .164 «687 .0472 .666 .0788 . 1616‘ .086 «692 .669 “M4 .650 .630 .618“ «690 .121 1 .0745 «633 .696 .0920 .613 .696 .0179 . 1027 .626 «623 .1367 . 1937‘ .1088 «0153 .649 .0439 .636 :m9-- .664 «674 . 1569‘ .3164“ .1641‘ .126 .0458 .1338 22052-- 1119813 FTE5414 11E5416 rr25417 1113818 11E5119 1113820 FTE3421 1119822 FFEBI23 11E3824 ITE5426 1110828 1113829 FTE3430 FTE3133 1113838 1110841 rr25442 1112844 1110845 1112846 11E3848 YTEA€76 11Ebf78 1110879 FTEhd81 ffEhdSZ .0T74 .1300 .0731 .1070 .2190“ .0974 «0033 .1021 .0843 «0730 L0000 .0430 .1564‘ .2184“ .1448 .2088“ .0348 .2126“ .0877 .0866 .2236“ .2106“ .0894 .1533‘ .0772 .15 86‘ .1313 .1522 «1003 .2236“ .1039 .0909 .1377 1112824 «0582 .0073 .1029 .2424“ .1950‘ .0983 .0913 «0168 .1478 .0664 .1608‘ .0601 .1573‘ .0381 .0713 .1851‘ «1034 .0636 .0667 «0693 .0450 .0479 .0362 .0489 .0418 .0374 «0392 .0576 .0273 .1048 .1415 .0626 . 1425 . 
1499 .0178 .0461 .1644‘ .0211 .0734 .0159 .2235“ «0347 .1918‘ .1529‘ .1815‘ .1472 .2085“ .0134 .0860 .0074 .0430 .1564‘ L0000 .0289 .0289 LOOOO .1868‘ .0599 .0775 «0308 .0365 «0211 «0678 .0000 «0088 .0127 .0694 .1002 .1291 .1242 .0092 .0715 .0461 .0932 .0262 «1389 .1303 .0777 .0092 .0240 «0913 .0132 .0179 «1162 .0190 .0959 .0542 «1283 .0642 .1007 .1707‘ .1023 .1212 «0100 .0863 «0249 rrEmazs 11E1426 «0180 «0518 .1795‘ «0337 .1200 «0718 .0345 «0497 .1977‘ .1291 «0089 .0461 .1032 .0704 .0514 «0025 «0614 «1038 .1731‘ «0100 .1871‘ .0669 .1203 «1091 .0817 «0157 .1313 .0406 .0258 «1165 .0388 «1231 «0092 .0053 .0944 «0396 «0269 «0930 .0086 «1683‘ .1200 .0866 .0444 «0103 .2212?‘ «0587 .2149“ «0472 .1212 .0999 .1890‘ .1364 .0601 «0149 .0270 «0156 «0093 «1077 «1880‘ «1356 .1140 .0450 «0398 .1319 .1748‘ .1890‘ .0870 .1097 .2027“ .0760 .2184“ .1868‘ .0599 L0000 .1195 .1347 «0796 .1818‘ .1716‘ «0188 .0137 .0523 «0120 .0579 .1057 .1215 .1186 .1383 «0749 .202360 «0163 .2220“ .0865 1113828 .0816 .1336 .0619 .1137 .1890‘ .2034“ .2026“ «0683 .0956 .0738 .1192 .1118 .1223 .2990“ .0005 .1128 .0838 .0224 «0393 «0721 .0427 .0818 .1052 .1643‘ .2813“ .1530‘ .0069 .1060 «0348 «0853 162 «0060 .1259 .1779‘ .1577‘ .0548 '«0752 .1051 .0774 .0074 .1027 .1448 .0775 «0308 .1195 LOOOO «0155 «0847 .1144 .0930 .0826 .2094“ .2321“ «2350“ .1778‘ .1514 .0925 .0648 .0142 «0185 .0626 .1820‘ .1402 .1251 rrEmmz9 «1856‘ .0421 .0786 «0349 .1723‘ «0208 «0280 .0658 .0748 .1402 .0285 .1502 .1080 .0320 «1043 .1777‘ «0039 .1939‘ «0458 «0183 «0091 .0927 .1000 .0477 .1679‘ .1245 .0805 «0058 «0398 «0887 .0372 «0624 .0311 .0313 .1212 .0019 .0053 .1569‘ «0498 .0326 .2088“ .0365 «0211 .1347 «0155 LOOOO .0622 .0109 .0383 «0314 .0629 .0579 .1614‘ .1058 .1642‘ .1275 .0132 1119830 .0923 .0157 .1006 .1301 .1398 .1379 .1177 .0419 .1599‘ .0234 .0617 .1309 .0224 .1177 .0225 .0155 .0209 «0863 .0666 «0469 .2533“ .0311 .0716 .1016 .0234 .1103 «0513 «0105 .1394 .0127 «0362 .0282 .0927 .0409 «0333 .0898 .1051 .0497 .0445 «0623 .0348 
«0678 .0000 «0796 «0847 .0622 1110831 .0727 «1014 «1133 .0202 «0752 .0360 .0320 .0658 .1028 «1089 .0570 .0403 .0516 «1180 .0055 .0847 «1219 .0293 «0458 .0914 «0091 «0776 «0698 .0752 «0258 «0952 «0293 «0346 «0398 «0887 .0329 1110832 .0650 «0129 .0297 .0574 .25254o .2664“ .0271 .2094“ .0818 .0889 «0846 .0465 .1788‘ .0815 .0313 .0544 «0045 .0836 .0923 «0754 .0008 «0686 .1272 .0619 .1140 .1030 «0089 .0552 «1517 .0144 .0640 «0186 «0046 .0473 .0351 .0032 .0424 .0252 .0868 .1937‘ .0877 .0694 .1002 .1716‘ .0930 .0383 «0655 .2906“ L0000 .1306 «0351 «0278 .1603‘ .1380 .0580 .1105 .1364 «0411 «1081 .1150 .0872 «0101 «0525 1113833 .0121 .0173 .1017 .0455 .1721‘ .1407 .1788‘ .0316 .0905 «0101 .0318 «0525 .0866 .1174 .0388 «0296 «0882 .0385 «1001 .0595 .0527 .0245 .1620‘ .0249 .1598‘ .1368 «0104 «0247 «0786 «0122 «0853 .1945‘ .0688 .0920 .1801‘ .0622 «0920 .0558 «0568 .1088 .0866 .1291 .1242 «0188 .0826 «0314 .1376 .2271“ .1306 L0000 «0237 «0357 .0745 .0583 .0881 .1710‘ .0046 .1520 .0140 .1327 .0407 .0179 «0045 TTELC&4 .0672 «0162 .0797 «0089 .1110 .1948‘ .0339 .1749‘ .0611 .0670 .1334 .0732 .2291“ .0605 «0842 .1651‘ .0429 «0022 «0624 «0155 .0886 ' -0825 .0548 «0377 «0067 .0976 .0266 .0023 .0072 «0624 .0166 .2530“ .1722‘ .0734 .1119 .1715‘ .2032“ .2389“ .1869‘ «0153 .2236“ .0092 .0715 .0137 .2094“ .0629 .0333 «0149 «0351 «0237 LOOOO .8744“ .1007 .2588“ .0755 «0002 .1341 .204130 .0092 .0549 .1820‘ .2571“ .1163 1119835 «0101 .0644 .0950 .0582 .0934 «0966 .0074 .0264 .1296 .0211 .1545‘ .1231 .0079 .0358 .1314 .0548 .1200 .3540“ «0365 .0338 «0079 .1185 .0827 .1670‘ .1260 .0069 .1137 «0189 .0582 .0042 11E341 11E342 111083 111084 111085 11E346 11E517 11E548 111389 1110810 11E3411 1110812 11Ehdl3 11E3414 1113816 11E5417 11E3818 11Eh419 11E3420 1110821 11Eh£22 11Eh£Z3 1113424 1110836 “2694.. 
.0861 .0921 .3008“ .0415 .1930‘ .2325“ .1395 .1385 .1568‘ .0726 .2439“ .0825 .2471“ .0750 «0283 .1314 .1583‘ .1673‘ .2286“ .1231 .0249 .2106“ «0744 .1017 .0302 .0798 «0601 .0445 .0399 .1464 .0362 .1109 .1377 .0244 .0159 .1197 1113437 11Eh438 «0168 2112“ .0602 1282 «0092 -0199 .0000 .1254 «0562 «0018 .1093 «0733 .0057 .0405 «0105 .0938 .1160 .0371 «0909 «1251 .1214 «1031 .1603‘ 1949‘ «0599 .0251 «0823 0260 «0615 1398 .0105 .1943‘ .0318 .1622‘ .2134“ 1040 «0105 0152 .1447 .1761‘ .0067 «0097 .0099 .0439 .0894 1533‘ .0262 .1303 «1389 .0777 «0120 .0579 «2350“ 1778‘ .1614‘ .1058 .0727 «0445 .0102 «0697 .1603‘ .1380 .0745 .0583 .1007 .2588“ .0586 .2448“ L0000 .0602 .0602 LOOOO «0053 «1125 .1714‘ .0887 .1704‘ «0084 «0877 .0995 .0250 «0503 .1007 .1799‘ .0379 .2498“ .0350 .0668 .0797 . 1615‘ 1110837 1110838 .0311 «1479 «1088 .0449 .1287 .1894‘ .1592‘ .1249 .0645 .1779‘ «0103 .0889 .0650 .0675 «0396 «0277 1266 .1052 .0350 .0171 .1800‘ .0998 .0198 .1034 «0827 .0644 «0162 .2019“ .1784‘ .0398 «0561 .0885 .0536 .0772 .0240 .1057 .1514 .1642‘ .2104“ .0313 .0580 .0881 .0755 .0671 «0053 «1125 L0000 .1104 .1471 .0343 «1018 .1314 .0164 .1312 1110839 .0744 «0041 .0871 «0013 «0480 .1268 .0405 .0164 «0296 «0331 .1264 .0983 «0413 163 1110840 1110841 1113442 .0170 .1214 .0123 .0732 .0160 .1247 .1159 «0400 .1433 .0323 .1870‘ .1105 «0659 .1745‘ .1790‘ .2121“ .1731‘ .2234“ «0056 .0580 .0690 .1005 .1586‘ «0913 .0132 .1215 «0002 «0061 .1714‘ .0887 .1104 LOOOO .1343 .0366 .0233 .2073“ .0195 .0962 .0199 1110840 .0053 .1299 .0618 .0506 .0291 .0452 .1525 .0405 .1642‘ .0310 .0732 .0626 .1504 .0465 .0247 .1925‘ .0172 .0774 .0382 .0217 .1697‘ .1166 .0186 .1161 «0296 .0720 .1656‘ .0324 .0424 .2046“ .1431 .1096 .2328“ «0622 .0320 .1591‘ .2239“ .1313 .0179 «1162 .1186 .0648 .0150 .0076 .0749 .1364 .0046 .1341 .1523 .1704‘ «0084 .1471 .1343 L0000 .0054 «0141 .0799 «0329 .1178 .0738 1110841 .1393 .0461 .0334 «0391 .1059 .0737 .1528 .1527 .0455 .2199“ .0645 .1386 .0880 .0421 «0732 .2288“ .0995 .1347 .1142 
.1480 .0349 .1530‘ .2180“ .0682 .0011 .2065“ .2075“ «0688 .1651‘ .0661 «0177 .2273“ .0713 .0749 .1216 .0977 .0364 .1522 .0190 .0959 .1383 .0142 .0443 .0749 .1166 «0411 .1520 .20410. .2258“ .0995 .0978 .2115“ «0249 .1252 .0383 «0012 .2398“ .1232 .0863 .2440“ .1841‘ .1751‘ «0597 11E1844 «1202 «1435 «0841 «0116 «1590‘ «1272 «1356 «1788‘ «2132“ .0301 «0999 «0017 .0779 .0229 «1959‘ «1806‘ «1405 «0388 .0499 «1534‘ «1145 .0562 «1525 «0336 .0108 «0695 «2594“ «1509 «0217 «1759‘ .0381 «0901 «0785 «0838 .0882 1113845 .0548 .1273 «0085 .0736 .1585‘ .0189 .1344 .0372 .0734 .0742 .1382 .1451 «0120 .0680 .2529“ .1841‘ .1952‘ .1715‘ .0372 .1604‘ «0938 «0674 .2236“ .2508“ .1104 .1976‘ .0109 .1494 .0524 .1030 .1260 .1545‘ .1252 .1494 .1054 1110846 .0700 «0513 «1086 .1825‘ .0964 .0817 .0388 .0612 «1007 «0814 .0238 .0872 «0335 .1149 .0233 .0576 .0566 .0666 .0216 .0647 «0137 .1569‘ .1039 .1707‘ .1023 «0163 .1820‘ .1006 «0700 .0092 .0872 .0407 .1820‘ .1877‘ .0379 .2498“ «0279 .0195 «0329 .1521 «0527 .1820‘ L0000 .0227 .1065 11Ehd46 «1155 .1544‘ .0765 .0385 .1411 .0618 .2057“ .0675 .0334 .0602 .0667 .1613‘ .0553 .0837 «0203 1113847 .1366 .0171 .0724 .1226 .0587 «0379 .1425 .0721 «0199 .0823 .0875 .1032 .0446 .1351 .1584‘ .1627‘ .0575 .1646‘ .0460 .0745 .0866 .0564 .0909 .1212 «0100 .2220“ .1402 .1275 .0572 .0638 «0101 .0179 _257100 .2734“ .0350 .0668 .0164 .0962 .1178 .0210 «0857 .1522 .0227 L0000 .0884 1158647 «2040“ .0758 .2763“ .1930‘ .0664 .1331 .1362 .1015 .2262“ .2089“ .1480 .3157“ .0831 .1094 .0107 1113819 1113421 1110824 1118445 1113446 1113847 1118448 .0924 .0733 .3422“ «0414 «0316 .0055 .0210 .1740‘ .1819‘ .1153 .0715 .2062“ «0284 .0215 «0414 1118848 .0863 «0249 .0865 .1251 .0132 «0129 .0329 «0525 «0045 .1163 .0790 .0797 .1615‘ .1312 .0199 .0738 .0867 «0357 .2730“ .1065 .0884 LOOOO «0168 .0320 «0396 .0233 .0000 «0248 «0359 .0562 .1092 «1399 .0546 .0149 «0052 .1346 .0620 1110849 .0421 .1352 «0203 .0288 .1627‘ .0010 .0514 .0275 «0817 «1097 .0518 .0121 .0244 .1400 .0116 «0005 
.0644 «0073 «0807 «0847 .0512 .0327 «0582 «0180 «0518 .0816 «1856‘ .0923 .0727 .0650 .0121 .0672 «0101 ‘«0744 .0311 «1479 .0562 .0443 «1155 «2040“ «0680 «0111 .0067 .1467 .0364 .0134 .2079“ .1398 .1287 .0686 .0668 .1675‘ .0747 «0559 .1312 .0364 1110850 .1301 «1095 «0740 .0240 .0732 .0758 .1702‘ .0551 .1513 «0288 «0404 «1772‘ .0728 .0331 «0663 «0249 .0955 «0453 «0234 .0284 «0499 «0404 1113851 .0311 .1157 «0207 .0550 .1204 .0171 .0870 «0042 .0817 .0799 .0773 .1297 .0573 .1406 .2432?‘ .2626“ .1386 .0542 .0216 .1169 «0137 .0430 .1029 .1200 «0718 .0619 .0786 .1006 «1133 .0297 .1017 .0797 .0950 .0302 .1287 .1894‘ «0041 .0618 .0334 «0441 «0336 .2508“ .0765 .2763“ .1532‘ 164 .0901 .1068 .0634 .1523 «0838 «0950 «0619 «1503 «0460 «0353 .1106 .1365 .0452 .1261 .0841 .0566 «0614 .0716 «0342 .0667 .0669 .0642 .1162 .0950 .0451 .0962 .0532 .0584 .0392 .0082 1115152 1113153 «0202 .1301 .0013 .2026“ «0828 .0908 .0023 «0414 .0958 «0613 «0401 «0321 .1445 .1020 «1242 .0485 .0203 .0551 .0234 .0376 .0987 .1136 .0173 .2283“ .0932 .0506 .0302 .1300 .0667 .1487 .1242 .1847‘ .0201 .1409 «699 .1950‘ .0316 .0226 .0809 .2002“ «0695 .0514 .0754 .0977 .2424“ .1950‘ .0345 .1977‘ «0497 .1291 .1137 .1890‘ «0349 .1723‘ .1301 .1398 .0202 «0752 .0574 .2625“ .0455 .1721‘ «0089 .1110 .0582 .0934 .0440 .0798 .1592‘ .0645 .1249 .1779‘ .0871 «0013 .0506 .0291 «0391 .1059 «0249 .1252 .0108 «0695 .1104 .1976‘ .0385 .1411 .1930‘ .0664 «0064 «0092 .0162 .1391 .0942 .0247 .1018 «0212 «0730 .1847‘ .1711‘ .0210 .1866‘ .0671 .1910‘ .1098 «1015 1110854 .1628‘ .2416“ .0556 «1091 .0757 .2093.- .0136 .0258 .0545 .0728 .0282 .0245 «0483 .0014 .0377 .0010 .1235 «0024 «0812 .0410 «1012 .1536‘ .0983 «0089 .0461 .2034“ «0208 .1379 .0360 .2664“ .1407 .1948‘ «0966 «0601 «0103 .0889 «0480 .0452 .0737 .0383 «2.594.- .0109 .0618 .1331 .1008 «0855 «1292 .0405 «0329 «0149 «0465 «1642‘ «1575‘ «1457 «1167 «0942 «1019 «0985 .0435 .0632 1113855 .1180 .1481 .0069 .0326 .1482 «0259 .0770 .1424 «0293 «0465 .1083 .1174 .0218 .0405 .0518 
.2251“ .1346 .1445 «1120 «0433 .0711 .1482 .0913 .1032 .0704 .2026“ «0280 .1177 .0320 .0271 .1788‘ .0339 «0012 «1509 .1494 .2057“ .1362 .1784‘ .2602“ .2597“ .0682 «0771 «0181 .1739‘ .0647 .1095 .1149 .0473 .1110 .0617 .1449 .1147 «0365 1110856 .0713 .0952 .1072 .1375 .1393 .0760 .1564‘ .1419 .0648 «0172 .0383 .1438 .0150 .0963 .1242 .0390 .1553‘ .0562 .0644 «0573 «0244 .1627‘ «0168 .1527 .2398“ «0217 .0524 .0675 .1015 .1504 .1120 1919‘ .0279 .1153 .1273 .0979 «0405 «0067 «0259 0594 .2117“ .1856‘ .1772‘ .2852“ .2103“ .1941‘ .0824 .2038“ .0227 .2583“ .2109“ .1260 .0957 .1227 .0746 .0379 .0024 .1461 «0987 «0834 1110857 1110859 .0374 .1089 .0046 .1412 .0516 .0724 .0422 .1226 «0109 .0587 .0901 «0122 .1010 .1144 .0721 ««0583 .0072 ««0199 «1010 .0542 .0796 .0875 «0244 1315 .0521 0985 .0598 .1351 .0265 .1838‘ «0192 .1888‘ 1892‘ «0998 -1259 «0073 «1394 «0323 .0409 .0745 .0213 «0457 .0264 .1301 .1478 .0664 «0614 1731‘ «1038 «0100 0956 .0738 0748 .1402 .1599‘ .0234 .1028 -1089 .0818 .0889 .0905 «0101 .0611 .0670 .1296 .0211 .1464 0362 .1266 0350 .1052 0171 .0405 .0164 .1642‘ .0310 .0455 .2199“ .1232 «0603 «1759‘ .0381 .1030 .1260 .0334 .0602 .2262“ .2089“ .1559‘ .0391 1110849 1113810 1113811 1113812 1110813 1110814 1113816 1113817 1113818 1113819 1113832 1113848 «0680 .1702‘ .1532‘ «0092 .1008 .1784‘ .1504 .1559‘ .0391 .0717 .0380 .1249 «0124 .0681 .2891“ .1942‘ «1684‘ «0392 .1766‘ .1620‘ .1316 .1377 «0228 .0012 .0714 .1561‘ 1110860 .0570 .0743 .0128 «0214 .1946‘ .1245 «0708 «0210 .1328 .1484 .0151 .0245 .0214 .1709‘ .1153 .1356 .1245 .0808 .1424 «0345 .1608‘ .1871‘ .1192 .0285 .0617 .0570 «0846 1110849 10000 «0694 «0045 .0003 «0837 «0147 .0896 .0896 «0220 «0246 .0288 «0017 «0327 «0216 «0106 .1282 .1169 «2091“ «1463 «0477 .0731 «0673 «0054 «0182 «1271 « 1764‘ «0704 .0257 .0117 «0270 1110861 .0421 .2020“ .0932 .0505 .0518 .0664 .1283 .2278“ .0052 «0167 .1770‘ .0600 «0695 .1520 «0638 .0311 .1110 .0845 «0570 .0833 .0854 .0374 .0601 .1203 «1091 .1118 .1502 .1309 .0403 .0465 
1110850 «0694 L0000 «0209 .0252 .0073 .1461 .0771 .0882 .1519 .0246 .1032 .0526 .0852 .0216 «0911 «0708 .0744 .0567 «0923 .0477 .1047 .1461 .3199“ .1455 .1784‘ .1764‘ .0704 .0544 «0670 «0525 1110862 .0330 .1247 .1422 .0627 .0981 .0852 .0971 .0352 .0285 .0175 .1683‘ .1342 .0201 .0347 .0125 .1083 .1264 .0917 «0817 .0479 .0518 .1840‘ .1573‘ .0817 «0157 .1223 .1080 .0224 .0516 .1788‘ .1679‘ .1549‘ .1403 .1049 .1150 .0791 .0848 .1543‘ «0591 1113864 .0280 .0944 .1053 .0602 «0161 .0857 .0465 .1424 .1120 «0161 .1674‘ .3015“ .0510 .0405 .0793 .0838 .0778 .1977‘ .0858 .1439 .0352 .0684 .0381 .1313 .0406 .2990“ .0320 .1177 «1180 .0815 «0617 «0041 «0381 «0534 .1444 .0898 «0988 «0469 .0713 .0258 «1165 .0005 «1043 .0225 .0055 .0313 1113853 «0837 .0073 .1758‘ .1691‘ L0000 .0479 .1445 .0075 .0234 .2523.- .0850 .1089 .0917 .1179 .0956 .0752 «0249 .0879 «0856 «1423 .0693 .0227 .1115 .0733 .0909 .2323“ .0094 «0191 «0521 «0095 1113866 .0393 .1000 .1289 .0285 .0717 .0519 .1825‘ .0701 .0175 «0252 .0155 .0847 .0544 1110854 «0147 .1461 .1427 .0415 .0479 10000 .2441“ .1493 .1806‘ .1077 «0830 .0369 .0627 .0243 «1047 «0360 «0061 «0554 .0403 «0396 .0647 .1677‘ .1576‘ .1536‘ .1584‘ .1141 «0200 .0547 «0414 «0777 1110860 .0629 .0596 «0047 .0838 .2031“ .1382 .1427 .0561 .2912" .2775“ .0755 .0536 .0561 .0971 .1231 .0712 «1034 «0092 .0053 .0838 «0039 .0209 «1219 «0045 .0879 .2548“ .0605 .1664‘ «0392 .0636 .0944 «0396 .0224 .1939‘ «0863 .0293 .0836 «0930 «0393 «0458 .0666 «0458 .0923 .2018“ «1182 .0177 «0564 «1291 «1270 .0315 «0742 .1111 .1245 «0858 «0206 .0711 .0626 «0693 «1683‘ «0721 «0183 .0914 «0754 .2533“ 1110833 1113835 1110836 1113838 1113839 1110840 1110841 1110842 1113844 1113845 1110846 1113847 1113848 1113849 1110850 1110879 111382 1113811 1113812 1113813 1110814 .0318 .1334 .1545‘ .1109 .1800‘ .0998 «0296 .0732 .0645 .0863 .1545‘ .1480 1110860 .0288 .1032 .0802 .0023 .0045 «0830 .1429 .0367 .0938 «0301 LOOOO .1515 .0627 .0602 «0918 .2279“ .0838 .1330 «0825 «0275 .0779 .0997 .0899 .0919 
.0717 _2273oo «0573 .0988 .0598 .0754 1113873 .0208 .1398 .0090 .0736 .1276 «0547 .1577‘ «0010 .0277 «0136 «0278 .0245 «0483 «0880 «0525 .0732 .1231 .1377 .0198 .1034 «0331 .062 .1386 .2440“ «0901 .0971 .1613‘ .3157“ .0717 1110861 «0017 .0526 .1037 «0114 .0850 .0369 .1560‘ .2236“ .1160 .2666" .1515 10000 .2342“ .1028 «0584 .1245 «0192 .1913‘ «1091 .0081 .0891 .0621 .3495“ .2567“ .1685‘ .1201 .1978‘ .1124 .0705 «1091 1113874 .1830‘ .0018 .0053 .0378 .0077 .0943 .0924 .2211“ .0456 «0350 .1754‘ .2489“ «0010 .0975 .0866 .2291“ .0079 .0244 «0827 .0644 .1264 .1504 .0880 .1841‘ «0785 .1252 .0553 .0831 .0380 .0917 .1043 «0752 «0853 .0790 1113875 .1998‘ .1425 .0897 .1425 .0362 .0065 .1771‘ .1329 .0485 .1578‘ .0763 .1656‘ .0029 .0432 .1174 .0605 .0358 .0159 «0162 .2019“ .0983 .0465 .0421 .1751‘ «0838 .1494 .0837 .1094 .1249 1110864 «0216 .0216 .1764‘ «0260 .0917 .0243 «0450 «0781 .0897 «0245 .0602 .1028 .1300 L0000 «1551‘ .0880 .1301 «0122 .0267 «0201 .1793‘ .0243 .0983 .1216 .0826 .1099 .0653 .0538 .1111 «0980 1110876 .0812 .0916 .2087“ «0046 .0081 «0637 .0301 .0721 .0583 .0542 .0056 .1315 .1524 .1351 166 .0388 «0842 .1314 .1197 .1784‘ .0398 «0413 .0247 «0732 «0597 .0882 .1054 «0203 .0107 «0124 1110865 «0106 «0911 .0428 .1460 .1179 «1047 .0309 «1042 «1174 .0352 «0918 «0584 «0360 .1776‘ .0197 .1323 .1263 «0017 .1124 .1785‘ .1346 .1502 .0725 «0415 .3055“ «0642 .0212 «0296 .1651‘ .0548 .0924 «0168 «0111 .0551 .0901 .1068 .0162 «0855 .2602“ .1120 .1919‘ .0681 1110866 .1282 «0708 .1133 .0625 .0956 «0360 .1480 1110878 .1390 .0747 .0551 .0940 «0172 .0584 .1980‘ «0997 «0038 «1423 .0535 .0177 .0529 .0520 «0882 .0429 .1200 .0733 .0320 .0067 .1513 .0634 .1523 .1391 «1292 .2597“ .0279 .1153 .2891“ 1113867 .1169 .0744 .1346 «0347 .0752 «0061 .1016 .1442 .0396 .0626 .0838 «0192 .0616 .1301 «0371 .1219 L0000 .1071 «0414 .0728 .1622‘ .1291 «0145 .0772 «0164 «0070 .1018 .1636‘ .1351 «0005 «0117 «0039 .0385 «0022 .3540“ .34 O. 
«0396 .1467 «0288 «0838 «0950 .0942 .0405 .0682 .1273 .0979 .1942‘ 1110869 «2091“ .0567 .1393 «0498 «0756 «0385 .0179 «1303 .0083 .0312 .0740 «0422 .0152 «1001 «0624 «0365 «0414 .0233 .0364 «0404 «0619 «1503 .0247 «0329 «0771 «0405 «0067 «1684‘ .0029 .0350 .0359 «0839 «0574 .0316 «0121 «0549 «0122 .0220 «1522 .0595 «0155 .0338 «0316 .0134 «1772‘ «0353 .1018 «0149 «0181 «0259 «0392 1113871 «0477 «0617 «0275 .0527 «0079 .0055 «0248 .2079“ .1106 .1365 «0212 .1739‘ .2117“ .1856‘ «0053 .0731 .1047 .0911 .1190 «1423 22324” .1389 «0128 .1366 .0891 .0543 .1793‘ ‘«0442 .1188 .1622? .0074 «0081 L0000 .1401 .1017 1113816 1113878 1113881 1113882 .2457“ .1883‘ .0109 .0227 .0525 .0156 .1024 .0024 .0479 .0444 «0103 .0818 .0927 .0311 «0776 «0686 .0245 «0825 .1185 .0210 «0359 .1398 .0331 .0452 .1261 «0730 «1642‘ .0647 .1772‘ .2852“ .1766‘ .0539 .0989 .1049 . 1617‘ .0344 .0266 «0387 .1646‘ .0362 .2212“ «0587 . 1052 .10W .0716 «0698 .1272 .1620‘ .0548 .0827 .1740‘ .0562 .1287 «W63 .0841 .W66 .1847‘ «1575‘ .1W5 .2103“ .1941‘ .1620‘ 113874 «W54 .3199“ . 1549‘ .1115 .1576‘ .1804‘ .1611‘ .1901‘ .1941‘ .0899 .3495“ .0824 .W83 .0328 .0415 «0145 .1331 «1121 .W35 .1017 .0539 .3153“ .1688‘ .2274“ .1174 .1426 «W58 «1121 .1234 .0744 .0934 .0977 .1070 .0941 .0472 .0242 .0489 .2149“ «0472 .1643‘ .0477 .1016 .0752 .0619 .0249 «0377 .1670‘ .1819‘ .1092 .0686 «0249 «0614 .0716 .1711‘ «1457 .1149 .0824 .2038“ .1316 113875 «0182 .1455 .1403 .0754 .0733 .1536‘ «0115 .0654 .1757‘ .1546‘ .0919 .2567“ .1088 .1216 «0226 .0348 .0772 .1068 .0856 .1423 .0037 .1536‘ .3153“ 10000 .3020“ .2064“ .1123 .0958 .0786 «0286 .0317 .1627‘ .0837 .0664 .0199 «0489 .0866 .1301 .0418 .1212 .W99 .2813“ .1679‘ .W34 «W58 .1140 .1598‘ «W67 .1260 .1153 «1399 .W68 .W55 «W42 .W67 .W10 «1167 .0473 .W27 .2583“ .1377 1113876 «1271 . 1784‘ .1049 .0453 .W .15 84‘ . 1094 .W80 .1761‘ .1594‘ .07 17 .1685‘ «0178 .0826 «0138 «0295 «0164 .2204“ «0834 .0349 «W94 . 1331 . 1688‘ .3020“ 1WW . 
1505 .W47 .2696“ .0662 «1218 167 .063 8 «0702 .W83 .1243 «W96 . 1264 .1492 .0942 . 1280 . 1836‘ .0581 .0576 .0828 .0555 «0178 . 1862‘ .2086“ «0224 .07 88 . 1616‘ .0802 .2064“ .0149 «0321 .0374 «0392 .1576 . 1890‘ .1501 .0270 .1364 «0149 «0156 . 1530‘ .W69 .1060 .1245 .0805 «W58 .1103 «0513 «0105 «W52 «0293 «0346 . 1030 «W89 .0552 .1368 «0104 «W47 .W76 .0266 .W23 .W69 .1137 «0189 .0715 .2062“ «W84 .0546 .0149 «W52 .1675‘ .0747 «0559 «0453 «W34 .W84 .0669 .1162 .0451 .W42 .W50 .W62 .1866‘ .W71 .1910‘ «W42 « 1019 «W85 .1110 .1517 .1449 .2109“ .W57 .0746 .1260 .1227 .1379 «0228 .W12 .0714 113877 113878 113879 «1764‘ «0704 .W57 .1764‘ .0704 .644 .1150 .0791 .0848 «1107 «WZ7 .0144 .2323“ .W94 «0191 .1141 «WW .0547 .1W9 «0143 .W59 . 1895‘ .1266 .0515 «0166 .0715 .1446 .0769 .W82 .1379 .2273“ «0573 .W88 .1201 .1978‘ .1124 .Wl7 .1043 «0752 .1099 .0653 .0538 «1363 «1299 .0524 «0147 .0841 .09z2 «W70 .1018 .1636‘ .1492 .0686 .(XTM «0812 .0723 .0174 «W81 .W74 .W72 . 1296 .0169 .W93 .0134 .0303 .0811 .2274“ .1174 .1426 .2064“ .1123 .W58 . 1505 .W47 .2696“ 1WW . 1670‘ .W20 . 1670‘ 1WW .0691 .W20 .1591 1WW «0969 .W 16 .1034 «1192 «W37 «W25 .W52 «W 84 .0 [W «(7121 . 1677‘ «W42 «W57 «0476 .0179 « 13W «0483 «0494 «W92 .05 69 «1595‘ .W95 .W7 3 .1048 «W93 «1880‘ « 1077 « 1356 «0348 «085 3 «0398 «0887 .1394 .0127 «0398 «0887 « 1517 .0144 «07 86 «0122 .W72 «(£24 .05 82 .W42 .W15 «0414 . 1346 .1520 .1312 .0364 «0499 «0404 .0532 .W92 .0584 .W82 . 1098 «1015 .0435 .0632 .1147 «0365 .W24 «W87 .1461 «0834 .1561‘ «W20 1113881 1138W .0117 «W70 «W70 «(52.5 . 1543‘ «0591 .W43 «W46 «0521 «W95 «0414 «0777 .111 1 .W67 .1175 «0884 .241“ «W66 «0404 « 1218 .0598 .07 54 .0705 -. 1W1 «0853 .0790 . 1 1 1 1 «W80 «0432 «07 54 .0398 .0458 .1351 «W05 «W16 «0723 «0313 .1078 .0592 .0460 .W92 «0460 «0141 .W10 «W58 « 1121 .07 86 «W86 .0662 -. 1218 «W69 «1 192 .W16 «W37 .1034. «022.5 1.0000 .1340 . 
1340 1WW

APPENDIX D

Item parameter estimates for statistical knowledge items

ITEM     INTER     SLOPE    THRESH    DISPER     ASYMP     CHISQ    DF
          S.E.      S.E.      S.E.      S.E.      S.E.    (PROB)

0001     0.495     0.822    -0.602     1.216     0.192       2.9   4.0
         0.169*    0.179*    0.255*    0.265*    0.084*   (0.5795)
0002    -0.099     0.703     0.141     1.423     0.201       5.7   5.0
         0.201*    0.191*    0.273*    0.387*    0.084*   (0.3373)
0003     1.575     0.862    -1.827     1.160     0.205       1.9   3.0
         0.255*    0.255*    0.442*    0.342*    0.091*   (0.6008)
0004     0.063     0.674    -0.093     1.484     0.217       0.9   5.0
         0.194*    0.173*    0.298*    0.381*    0.089*   (0.9691)
0005    -1.132     0.639     1.772     1.566     0.246       5.4   6.0
         0.440*    0.223*    0.534*    0.546*    0.069*   (0.4947)
0006     0.054     0.551    -0.098     1.816     0.253       6.9   5.0
         0.208*    0.148*    0.386*    0.488*    0.099*   (0.2241)
0007     0.536     0.925    -0.580     1.082     0.208       2.2   4.0
         0.179*    0.238*    0.245*    0.278*    0.089*   (0.7047)
0008    -1.125     0.821     1.370     1.217     0.179       3.7   6.0
         0.386*    0.272*    0.328*    0.403*    0.059*   (0.7199)
0009     0.167     0.696    -0.240     1.436     0.231       2.1   5.0
         0.194*    0.184*    0.302*    0.380*    0.094*   (0.8397)
0010    -2.358     0.904     2.609     1.106     0.222       6.8   6.0
         0.946*    0.403*    0.826*    0.494*    0.042*   (0.3371)
0011    -1.558     1.170     1.331     0.854     0.164       3.0   5.0
         0.552*    0.458*    0.280*    0.334*    0.050*   (0.7076)
0012     0.749     1.297    -0.577     0.771     0.162       5.5   3.0
         0.184*    0.320*    0.175*    0.190*    0.074*   (0.1370)

0013I 0.309 I 0.439 I -0705 I 2.278 I 0.218 | 8.3 5.0 168 0014I -1.801 I 0017I -1.298 I 0018I -0164 I 0019I -1.167 | 0020I -0533 I 0022I -1.264 I 0023| -0.416 I 0026| -1.305 | 00271 -2.198 I 0028I -2.661 I 169 0608* I 0094* I (01407) I I 0.888 I 0.148 I 11.1 5.0 0339* I 0044* I (0.0499) I I 1.178 I 0.206 I 3.5 5.0 0305* I 0084* I (0.6296) | I 1.026 I 0.183 I 3.9 5.0 0257* I 0079* I (0.5655) I I 0.900 I 0.176 | 3.8 6.0 0301* I 
0053* I (0.7016) I I 1.373 I 0.164 I 7.9 5.0 0347* I 0072* I (0.1592) I I 1.731 I 0.206 I 11.7 7.0 0592* I 0066* I (0.1107) I I 1.494 I 0.168 I 13.1 6.0 0364* I 0069* I (0.0406) I I 1.552 | 0.206 | 3.8 4.0 0381* I 0091* I (0.4408) I I 0.954 I 0.324 I 5.5 6.0 0376* I 0064* I (0.4798) I | 0.974 I 0.239 I 4.0 5.0 0322* I 0080* I (0.5496) I I 1.377 I 0.207 I 2.4 5.0 0301* I 0087* I (0.7898) I I 1.086 I 0.196 I 1.9 4.0 0258* I 0086* I (0.7651) I I 1.103 I 0.138 I 4.0 5.0 0339* I 0049* I (0.5458) I I 1.122 | 0.114 I 4.7 5.0 0463* I 0036* I (0.4601) I I 1.096 I 0.243 I 2.5 6.0 0029I -0148 I 0031I -1.317 I 0032I -0.065 I 0033I -0023 I 0034I -0101 I 0035I -0086 I 0036I -1.610 I 0037I -2.654 I 0038I -1.053 I 0039I -1.296 I 0042| -0.804 I 0043I -0741 I 170 0510* I 0040* I (0.8745) I I 1.522 I 0.269 I 8.8 5.0 0445* I 0096* I (0.1177) I I 1.760 I 0.211 I 6.5 5.0 0444* I 0092* I (0.2604) I I 0.884 I 0.318 | 8.1 6.0 0349* I 0062* I (0.2335) I I 0.593 | 0.322 I 7.8 4.0 0213* I 0082* I (0.0993) I I 0.536 I 0.319 | 6.5 4.0 0203* I 0080* I (0.1617) I I 2.304 I 0.239 I 11.4 6.0 0640* I 0096* I (0.0762) I I 1.390 | 0.195 I 4.1 5.0 0338* | 0082* I (0.5309) I I 1.252 I 0.235 I 2.1 7.0 0490* I 0055* I (0.9545) I | 0.634 I 0.124 I 2.8 5.0 0308* I 0032* I (0.7331) I | 1.127 I 0.191 I 4.2 6.0 0383* I 0064* I (06494) I I -. 0.992 I 0146 I 3.8 5.0 0324* I 0052* I (0.5785) I | 1.187 I 0.213 I 1.6 5.0 0310* I 0088* I (0.8979) I I 1.289 I 0.218 I 1.2 2.0 0388* I 0095* I (0.5556) I I 0.836 I 0.192 I 11.6 5.0 0284* l 0063* I (0.0397) I I 1.141 I 0.219 I 2.8 6.0 I I 0044I I I 0045I I I 0046! 
I I 0047I I I 0048I I I 0049I I I 0050I I I 0051I I I 0052i | I 0053I I I 0054I I I 0055I I I 0056I I | 0057I I I 0058I 0332* I I 0.004 I 0220* I I -0523 I 0295* | I -0677 I 0381* | I -0243 I 0220* I I -0025 | 0213* I I 0.314 I 0175* I I -0694 I 0350* I I -1.261 I 0506* I | -0815 I 0348* I I 0.027 I 0208* I I -0856 | 0397* I I -1.629 I 0726* I I 0.231 I 0205* I I -2.576 I 1383* I I 0.431 I 0293* I I 0.639 I 0192* I I 0.957 I 0314* I I 0.785 I 0288* I I 0.807 I 0222* | I 0.561 I 0158* I I 0.674 I 0174* I I 0.851 I 0298* I I 1.033 I 0391* I | 0.861 I 0277* I I 0.719 I 0194* I I 1.235 I 0431* I I 1.525 I 0684* I I 0.757 I 0204* I I 0.952 I 0455* I I 0.607 I 0274* I -0007 0345* I 0.547 0232* I 0.862 0371* I 0.301 0239* I 0044 0376* I -0466 0303* I 0.815 0302* I 1.220 0296* I 0.946 0283* I I 0.038 0293* I 0.693 0194* I I 1.068 0217* I I -0306 0302* I I 2.707 1247* I I -0710 171 0382* I 0074* I (0.8326) I I 1.565 I 0.264 I 16.4 5.0 0470* I 0099* I (0.0060) I I 1.045 I 0.228 I 1.4 5.0 0343* I 0077* I (0.9224) I I 1.273 I 0.341 I 8.2 6.0 0467* I 0084* l (0.2224) I I 1.239 I 0.193 I 2.7 5.0 0341* I 0078* I (0.7429) I I ‘ 1.782 I 0.249 I 5.3 5.0 0503* I 0097* I (0.3752) I I 1.483 I 0.213 I 7.3 5.0 0383* I 0091* I (0.1982) I I 1.175 I 0.267 I 3.9 6.0 0412* I 0079* | (0.6881) I I 0.968 I 0.254 I 3.1 6.0 0366* I 0061* I (0.7985) I I 1.161 I 0.224 I 4.0 6.0 0374* I 0071* I (0.6741) I I 1.391 I 0.233 I 1.6 5.0 0375* I 0092* I (0.9063) I I 0.810 I 0.244 I 3.7 5.0 0283* I 0066* I (0.6004) I I 0.656 I 0.257 I 4.2 6.0 0294* I 0053* I (0.6566) I I 1.320 I 0.258 I 4.2 5.0 0356* I 0099* I (0.5180) I I 1.051 I 0.428 I 4.6 7.0 0503* I 0045* I (0.7128) I I 1.646 I 0.208 I 7.2 5.0 I I 0059I I I 0060I I I 0061I I I 0062I I I 0063I I I 0064I | | 0065I I I 0066I I I 0067I I I 0068I I I 0069I I I 0070| I 0167* I I 0.203 I 0190* I I 0.280 I 0216* I I -0636 I 0352* I I -0.573 I 0299* I I -0087 I 0237* I I 0.010 I 0210* I I -0620 I 0297* I I -0742 I 0321* I | -0179 I 0188* l I -0719 I 0349* I I 
-1.379 I 0520* I I -2.140 I 0876* I 0143* I I 0646 I 0172* I I 0.655 I 0159* I I 0.442 I 0139* I I 0.605 I 0188* I | 0.682 I 0205* I I 0944 I 0270* | I 0.992 I 0319* I I 0.873 I 0282* I I 0.723 I 0171* I I 0.668 I 0224* I I 0.818 | 0301* I I 0.842 I 0365* | -0314 0326* I I 0.428 0290* I | 1.441 0696* I | 0.947 0404* I | 0.127 0335* I I -0011 0224* I | 0.625 0217* I I 0.850 0261* I I 0.247 0238* I I 1.076 0408* I I 1.685 0435* I I 2.541 0818* I 172 0388* I 0090* I (0.2040) I I 1.548 I 0.235 I 4.5 5.0 0412* I 0095* I (0.4766) I I 1.528l 0190I 1.1 5.0 0372* I 0079* I (0.9512) I I 2.264 I O.312| 8.3 6.0 0.12.7 *I 0093*I(02137) I I 1.654I 0.246 I 8.2 6.0 0515* I 0085* I (0.2250) I I 1.466 I 0.272 I 5.0 5.0 0441*1 0.098*I(04138) I I 1.060I 0.218 I 7.4 5.0 0303* I 0085* I (0.1902) I I 1.008I 0.199I 5.2 5.0 0324* I 0071* I (0.3929) I I 1.145I 0.205 I 1.8 6.0 0369* I 0071* I (0.9358) I I 1.382I 0.161 I 3.4 5.0 0327* I 0071* I (0.6395) I I 1.497 I 0.274 I 10.2 6.0 0501* I 0082* I (0.1170) I I . 1.222 I 0.238 I 6.7 6.0 0450* I 0060* I (0.3469) I I 1.187I 0260I 5.8 7.0 0515* I 0047* I (0.5613)
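
The column names in Appendix D (INTER, SLOPE, THRESH, DISPER, ASYMP) suggest a slope-intercept parameterization of a three-parameter logistic item response model. Under that reading, which is an assumption and is not stated in the appendix itself, the threshold equals minus the intercept divided by the slope, and the dispersion equals the reciprocal of the slope. The legible rows of the table agree with both identities to rounding error. The sketch below checks them for item 0001 and evaluates the implied item response function. The function names and the scaling constant 1.7 are illustrative choices, not taken from the source.

```python
import math


def threshold_from(intercept, slope):
    """Threshold implied by slope-intercept form: -intercept / slope."""
    return -intercept / slope


def dispersion_from(slope):
    """Dispersion is the reciprocal of the slope."""
    return 1.0 / slope


def p_correct(theta, slope, threshold, asymptote, scale=1.7):
    """Three-parameter logistic probability of a correct response.

    `scale` is an assumed constant (1.7 approximates the normal ogive);
    the appendix does not say which metric was used.
    """
    z = scale * slope * (theta - threshold)
    return asymptote + (1.0 - asymptote) / (1.0 + math.exp(-z))


# Values for item 0001 as printed in Appendix D.
item1 = {"intercept": 0.495, "slope": 0.822, "threshold": -0.602,
         "dispersion": 1.216, "asymptote": 0.192}

implied_threshold = threshold_from(item1["intercept"], item1["slope"])
implied_dispersion = dispersion_from(item1["slope"])

# At theta equal to the threshold the logistic term is exactly 1/2, so the
# probability is asymptote + (1 - asymptote) / 2 whatever the scale constant.
p_at_threshold = p_correct(item1["threshold"], item1["slope"],
                           item1["threshold"], item1["asymptote"])

print(round(implied_threshold, 3), round(implied_dispersion, 3),
      round(p_at_threshold, 3))
```

The same two identities can be used to cross-check rows whose signs or decimal points were lost in scanning: a positive intercept, for example, implies a negative threshold.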