SCORE INFLATION ON BIODATA AND SITUATIONAL JUDGMENT INVENTORY ITEMS: A COLLEGE ADMISSIONS QUANDARY

By

Lauren Jill Ramsay

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF ARTS

Department of Psychology

2003

ABSTRACT

SCORE INFLATION ON BIODATA AND SITUATIONAL JUDGMENT INVENTORY ITEMS: A COLLEGE ADMISSIONS QUANDARY

By Lauren Jill Ramsay

Tests used in a selection context must be carefully examined to ensure that, if they are susceptible to inflation, that vulnerability does not affect their utility. Biodata and situational judgment measures could be used to support college admissions decisions. This study explores the relationship between score inflation on these two tests and situational factors influencing performance: motivation to perform well, coaching on how to perform well, and a warning statement not to respond dishonestly. Both motivation and coaching were found to predict performance. Conscientiousness and emotional stability were associated with social desirability as measured by self-deception and impression management, and extreme inflation appeared to be driven more by situation than by personality. Item-level analyses showed that biodata items that did not require elaboration were more susceptible to inflation, as were items that were less objective, less controllable, and more college-relevant. Criterion-related validity was not negatively affected by inflation in this study, and corrections to selection decisions based on three indices of inflation (bogus items, an inflation index, and high scores on impression management) did not result in any change in the quality of candidates selected, in terms of performance outcomes. Implications of the findings are discussed.

ACKNOWLEDGEMENTS

This research was conducted with support from the College Board, New York.

TABLE OF CONTENTS

LIST OF TABLES .......... vi
LIST OF FIGURES .......... viii
INTRODUCTION .......... 1
LITERATURE REVIEW .......... 5
  Definitional Issues .......... 5
  Biodata .......... 8
  Situational Judgment Inventory .......... 12
  Coaching .......... 13
  Individual Differences in Faking .......... 19
  Effects on Validity .......... 21
  Inflation Controls .......... 25
  Warning .......... 26
  Composition of Items .......... 27
  Statistical Control using Social Desirability or Bogus Items .......... 30
  Study Development .......... 34
METHOD .......... 45
  Sample .......... 45
  Study Design .......... 46
  Procedure .......... 47
  Manipulations .......... 49
  Measures .......... 50
RESULTS .......... 57
  Situational Differences .......... 57
  Individual Differences .......... 67
  Item Differences .......... 73
  Validity and Selection Decisions .......... 82
CONCLUSION .......... 91
  Discussion .......... 91
  Limitations .......... 97
  Practical Implications .......... 98
APPENDICES .......... 104
  Appendix A - Mael's Taxonomy .......... 104
  Appendix B - Sample Questionnaire .......... 106
  Appendix C - Sample Biodata Items .......... 120
  Appendix D - Sample Situational Judgment Items .......... 122
  Appendix E - Wording of Instruction Sets .......... 124
  Appendix F - Informed Consent Form - Motivated Group .......... 129
  Appendix G - Informed Consent Form - Not Motivated Group .......... 132
  Appendix H - Faking Study Protocol .......... 135
  Appendix I - Coaching Directions .......... 141
  Appendix J - Coaching Handout .......... 143
  Appendix K - College GPA Release Form .......... 156
  Appendix L - High School GPA and SAT/ACT Score Release Form .......... 158
  Appendix M - Twelve Dimensions of College Performance .......... 160
  Appendix N - Bogus Biodata Items .......... 164
  Appendix O - Inflation Index Items .......... 166
  Appendix P - Calculation for Inflation Index Selection .......... 169
  Appendix Q - Biodata Item Types Coding Instructions .......... 172
  Appendix R - SJI Item Types Coding Instructions .......... 174
REFERENCES .......... 176

LIST OF TABLES

Table 1 - Sample Elaborated Biodata Item .......... 29
Table 2 - Expected Score Levels .......... 39
Table 3 - Study Manipulations and Participants per Cell .......... 46
Table 4 - Coefficient Alpha and Descriptives for Ratings of Biodata Item Characteristics .......... 55
Table 5 - Coefficient Alpha and Descriptives for Ratings of SJI Item Characteristics .......... 56
Table 6 - Analysis of Variance Results for Biodata with and without the Inclusion of Covariates .......... 59
Table 7 - Means and Standard Deviations of Biodata across Conditions .......... 61
Table 8 - Analysis of Variance for SJI with and without Covariates .......... 63
Table 9 - Means and Standard Deviations of SJI Responses for Various Study Conditions .......... 64
Table 10 - Correlation Matrix .......... 69
Table 11 - Overall Means and Standard Deviations of Four Item Characteristics Fisher z for Biodata .......... 73
Table 12 - Analysis of Variance Results for Fisher z for Four Item Characteristics for Biodata .......... 74
Table 13 - Means and Standard Deviations of Four Item Characteristic Fisher z for Various Study Conditions for Biodata .......... 75
Table 14 - Overall Means and Standard Deviations of Four Item Characteristics Fisher z for SJI .......... 77
Table 15 - Analysis of Variance Results for Fisher z for Four Item Characteristics for SJI .......... 78
Table 16 - Means and Standard Deviations of Four Item Characteristic Fisher z for Various Study Conditions for SJI .......... 79
Table 17 - Correlations and Descriptive Statistics for Predictors and Criteria .......... 82
Table 18 - Zero Order and Partial Correlations between Situational Judgment and Biodata and Two Criteria Controlling for Measures of Faking .......... 83
Table 19 - Coefficient Alpha of Biodata and SJI Scales for Various Study Conditions .......... 84
Table 20 - Adjusted r for Biodata and SJI Scales in Predicting GPA and Absenteeism Across Various Study Conditions .......... 85
Table 21 - Descriptives for High Performance and Reference Groups .......... 87
Table 22 - Descriptive Statistics for Selection Ratio of .10 .......... 89
Table 23 - Descriptive Statistics for Selection Ratio of .25 .......... 89
Table 24 - Descriptive Statistics for Selection Ratio of .50 .......... 90

LIST OF FIGURES

Figure 1 - Conceptual Model .......... 36
Figure 2 - Expected Interaction of Warning, Coaching, and Motivation on Test Performance .......... 40
Figure 3 - Interactions between Coaching, Motivation, and Warning for Biodata Performance .......... 66

INTRODUCTION

Colleges have historically considered traditional academic measures, such as high school GPA and SAT/ACT scores, in predicting academic performance in college, which informs admission decisions. However, there is more to the college experience than academic performance alone, and as colleges are pressed to measure student performance along broader criteria, they need to consider a similarly broad range of predictors. Biodata questions and situational judgment inventories have shown promise as two such predictors of college student performance (Oswald, Schmitt, Ramsay, Kim, & Gillespie, in press), and in general, the prospect of using a more inclusive range of instruments is attractive for gathering a wealth of information on applicants, given the aim of increasing the diversity of applicants accepted to college. Nevertheless, such measures require thorough development and validation efforts because applicants to college may be motivated to distort their responses (e.g., "fake good") and inflate their scores on these tests. Faking can be defined as a conscious effort to manipulate responses to make a positive impression (see Zickar & Robie, 1999), and people can and do fake on certain tests when motivated to do so (McFarland & Ryan, 2000; Viswesvaran & Ones, 1999).
It is reasonable to assume that applicants may manipulate their responses to a test in a way that enhances the likelihood of their success in a selection process (Hough, Eaton, Dunnette, Kamp, & McCloy, 1990), and such manipulation may affect the utility of the tests (Rosse, Stecher, Miller, & Levin, 1998). This manipulation may be motivated by a desire to appear socially desirable or job desirable (Ones, Viswesvaran, & Reiss, 1996). As Corr and Gray (1994, p. 433) succinctly put it, "to some extent it is de rigueur for successful job applicants/incumbents to engage in some form of impression management." Nevertheless, some inflation goes beyond low-key impression management, and may include outright lying and deliberately providing false information. As it is difficult to establish the boundaries of what is honest and what is dishonest responding, I will use the term inflation, rather than faking, unless I am referring to literature that specifically uses the term faking. I seek to understand the extent to which individual difference characteristics affect test responses, how responses might be distorted under different situational constraints, and how both these factors affect the utility of biodata and situational judgment inventories designed for college admissions.

While we know that individuals are capable of inflation on some items on biodata tests and situational judgment inventories, we do not know enough about the situational factors that contribute to inflation, or about methods of identifying it and limiting its effects. If biodata and situational judgment measures are eventually used to contribute to the information collected by colleges in making admissions decisions and in student development contexts (Oswald et al., in press), then applicants will be motivated to achieve high test scores, and many will likely demand coaching programs to help prepare for the test. In other words, if there is a move to supplement ability measures with the noncognitive measures considered in this paper in making college admissions decisions, the evaluation process may become more vulnerable to coaching and other forms of manipulation.

Some researchers regard faking studies as being of little value when conducted outside a real selection experience. I argue, however, that there are important reasons to examine faking in tests to be used in a selection context before they are actually used for decision-making. Inflation and coaching may be problematic in some contexts, and considering the motivation to perform well on selection tests in a college admissions context, these issues must be addressed when considering tests such as biodata and situational judgment inventories for college admissions. Past research makes a similar recommendation: with the development of the Assessment of Background and Life Experiences (ABLE), a test containing personality and biodata content, that test was not regarded as ready for implementation until research had been conducted on the effects of coaching and faking (White, Young, & Rumsey, 2001). In the present context, without studying the causes, contexts, and outcomes of test-takers' manipulating their responses to these new tests, and instead simply implementing a new test in a college admissions context, universities would be opening themselves up to intense criticism and probable litigation for using a test whose appropriateness has not yet been supported.
It is essential that test users understand the extent to which a test's validity may be affected by coaching or inflation, and how warning statements, biodata elaboration requirements, and correction scales might be useful if such tests were used in a college admissions context. To that end, this study tests a sample of freshman students, most of whom are in their first semester of college. With this sample I will be capturing information on individuals who have not yet had college experiences and are very similar in age and experience to those actually applying to college.

Much of the research on faking has focused on personality tests, allowing enough studies for Viswesvaran and Ones (1999) to meta-analyze the extent to which personality trait scales can have their scores inflated. They conclude that in between-subjects design studies of the Big Five factors and social desirability scales, faking-good instructions in laboratory settings result in response inflation of about half a standard deviation across the Big Five, and more than one standard deviation in social desirability scales, compared to instructions to respond honestly. As is the case in employment selection, by inflating scores, some applicants to college may be able to distort and thereby improve their ranking in comparison to others, consequently affecting who is selected under a top-down selection system (e.g., Rosse et al., 1998). While identifying inflation is one avenue toward ensuring a reliable selection process, another is discouraging responses that are dishonest or untrue.

Following is a review of the literature related to inflation that is relevant to biodata measures and situational judgment tests. I discuss the effectiveness of various methods to reduce inflation, such as warning statements and biodata elaboration requirements, and the use of items with characteristics that make them more resistant to inflation. I address identifying and correcting for score inflation using various scales. I also examine the impact of test coaching, as coaching will almost certainly occur in the event that any noncognitive measure is used in high-stakes decision-making. In many cases I have tapped the literature on personality and integrity tests when it is likely to apply to responses to biodata and situational judgment items.

LITERATURE REVIEW

Definitional Issues

The literature covers a range of studies that describe personality test score inflation somewhat generally as faking (see Viswesvaran & Ones, 1999). Score inflation could be partially explained by a number of effects that are not always differentiated. Effects sometimes classified under the umbrella of response bias are socially desirable responding, job-desirable responding, self-deception, impression management, and lying. Considering all these effects as one thing does little to further our understanding of how and why individuals inflate their scores on certain tests. These different precipitators of score inflation are each described below.

Socially desirable responding may occur when the characteristics being captured by a test or test item are transparent to the respondent, and those characteristics are regarded as attractive by the respondent or more generally attractive in the respondent's culture. Personality tests are criticized for being susceptible to socially desirable responding, where the conscientiousness and emotional stability factors of the Big Five, for example, are regarded as socially desirable and adaptive.
This transparency potentially makes personality measures, such as those capturing the Big Five, susceptible to inflation. On average, Big Five measures have been shown to be inflated by about half a standard deviation under instructions to fake good (Viswesvaran & Ones, 1999). To control for this tendency, social desirability scales have been created to gather information about the tendency of individuals to respond in a socially desirable manner. However, these scales themselves are fakeable, with Viswesvaran and Ones (1999) showing in their meta-analysis a faking effect size of more than one standard deviation on social desirability scales.

Paulhus (1984) presents evidence for a two-component model of socially desirable responding: self-deception and impression management. Self-deception is the unconscious inclination that an individual has toward claiming that desirable characteristics apply to them. Impression management is the conscious dissembling that an individual engages in to present a favorable impression. Paulhus' Balanced Inventory of Desirable Responding (BIDR) is an example of a social desirability measure that captures both dimensions. Paulhus demonstrated that impression management tended to be more susceptible to situational change than self-deception. It is this impression management component that is regarded as being most closely linked to faking. While these two factors of social desirability are conceptually useful, they have not consistently been empirically supported by other researchers, and there are more nuanced ways to look at them, including the positive and negative attributes of specific items (see Paulhus & Reid, 1991).

Job-desirable responding reflects the recognition that individuals understand that, for particular jobs, different job behaviors or characteristics will be appropriate and desirable. If the test being taken is capturing traits that are transparent to the respondent, then respondents who understand which characteristics are desirable in the particular work context can respond in a job-desirable way. Dwight and Alliger (1997) show that the job-relatedness of integrity items was positively related to their fakability, and this may be because job-relevant items are more transparent, which makes job-desirable responding easier. Miller (2001) describes how coaching for a test can focus on traits desirable for a particular job, where job applicants can modify their responses to a test (or interview) to suit the target audience (e.g., an HR recruiter, a direct supervisor).

Lying is a term that should be used with caution, because it implies an intent to deceive. Nonetheless, the term has been used to refer to people with scores on "lie scales" that in some cases simply comprise unusual responses rather than responses that are necessarily lies. Scales that are made up of bogus items reflecting fictitious experiences that the individual could not possibly have had (e.g., Anderson, Warner, & Spencer, 1984) are the closest that research in this area gets to capturing outright lying.

By identifying these definitional issues I seek to point out that being too quick to simplify the issue of faking may result in missing separate and important factors in score inflation. The present study examines score inflation in two tests whose data show initial promise for use in a college admissions context: biodata and situational judgment inventories.
Inflation on these two tests may be a result of any number of the different factors described previously. Social desirability is evident in certain biodata and situational judgment items, where some responses will be regarded as more socially acceptable than others. For example, it may be apparent that an item addressing multicultural tolerance has certain response options that reflect socially desirable behavior. Similarly, there are biodata and situational judgment items that reflect the typical expectations of a college student, and those with information about those expectations should not find it difficult to identify questions and response options that reflect desirable behavior, such as attending class. Some individuals may have a natural inclination to present themselves in a positive light, and this tendency toward impression management is expected to vary across individuals, even when they are faced with the same questions and situational factors as others. These various situational and individual facets of social desirability all have a place in research on non-cognitive predictors such as biodata and situational judgment inventories. Having provided some background on the definitional issues that relate to this research, I will now address the literature on two tests that are of interest in a college admissions context: biodata and situational judgment inventories.

Biodata

Biodata measures comprise items related to the examinee's background and experiences, and they have long been used as a tool in personnel selection (see discussion in Stokes, 1994, in The Biodata Handbook). While biodata items have demonstrated utility in adding to the information that we have about a candidate, they are not impervious to inflation. Lautenschlager (1994) discusses a selection of twelve biodata studies that support the notion that individuals may distort or dissimulate on biodata items, even though they are designed as measures of actual background and factual experience. Studies have used a range of participant groups, including job incumbents, job applicants, and students playing various roles. Accuracy in biodata has been operationalized in different ways: correlational accuracy, level of mean differences, and absolute accuracy. Correlational accuracy, according to Lautenschlager, refers to consistency in correlation with an external criterion. Level of mean difference refers to the variation in the mean score between participant groups in studies where there are different conditions for different groups. Absolute accuracy refers to consistency in responses at the individual subject level. Lautenschlager discusses twelve studies using biodata items that relate to faking, going back as far as 1950, providing a map of the development of research in this area. Early studies, such as that of Keating, Patterson and Stone (1950), capture accuracy of biodata by calculating the correlation between the report of the individual and the report of the supervisor. Correlations were very high (as high as .98 for duration of employment). While it is possible that correlations would be high if everyone was inflating their responses equally, I suspect that in this case such strong relationships probably occur because of the clear verifiability of the information requested.
Today's biodata items tend to cover a broader content area, they may be clearly job-relevant, and they may at the same time be more difficult to verify, as many experiences take place independently of the knowledge of others, making them more susceptible to inflation.

Regarding the type of biodata items that may be susceptible to inflation, several studies are illuminating. Klein and Owens (1965) conducted a faking manipulation of a biodata questionnaire under instructions to respond as would a typical, creative research scientist. Under this manipulation, the study found item-type differences, where objective items were less susceptible to inflation than subjective items. Doll (1971) extended the idea of item type by including continuous (Likert scale) and noncontinuous (multiple choice) items, and found that continuous item responses were more susceptible to faking. Also included in this study was a warning of the presence of an existing lie scale, which also reduced faking. A study conducted by Cohen and Lefkowitz (1974) concluded that the MMPI's K-scale could identify people who may be responding in a socially desirable fashion. Thornton and Gierasch (1980) experimented with an empirically keyed biodata inventory, and they concluded that empirical keying did not result in the inventory being any less vulnerable to faking. Empirical keying, however, has since been demonstrated to be one effective method of limiting faking (Kluger, Reilly, & Russell, 1991). Pannone (1984) attempted a different approach to identify faking: he included a bogus item that asked about experience on a nonexistent piece of equipment. This method continues to generate some interest as a means of identifying faking.

Graham, McDaniel, Douglas and Snell (2002) provide another useful review of biodata studies that relate to faking and validity issues. They cite the study by McManus and Masztal (1993), who used Mael's (1991) taxonomy to investigate the relationship of item characteristics and validity, concluding that historical, external, objective, and verifiable items are less susceptible to faking. Mael's taxonomy has also prompted other research, discussed in more detail later (see Composition of Items).

Stanush (1997) conducted a meta-analysis to establish the susceptibility of self-report measures to faking. She concluded that self-report measures are susceptible to distortion when respondents are instructed in how to self-present, relative to a condition of honest responding (d = .64). However, inflation as a result of instructional set was larger than that explained by motivation, which Stanush uses to support the argument that the validity of self-report measures is not likely to be negatively affected by real-life motivation. I question this conclusion, because it is difficult to capture all aspects of the effects of real-life motivation in a study, and evidence about mean differences is different from evidence for similar criterion-related validity. Stanush (1997) does find that biodata inventories tended to be more susceptible to inflation than personality inventories (d = .94 vs. d = .45, respectively; see also McFarland, 2000). Studies of biodata accuracy have provided valuable information on the usefulness of these tests; however, the present study addresses the unresolved issue of the effects of inaccuracy in biodata on the predictive validity of the test.
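For reference, the d statistics quoted throughout this review are standardized mean differences between a faking (or coached) condition and an honest condition. A standard formulation, stated here as a reader's aid rather than taken from any one of the cited studies, is

\[ d = \frac{\bar{X}_{\text{fake}} - \bar{X}_{\text{honest}}}{SD_{\text{pooled}}}, \qquad SD_{\text{pooled}} = \sqrt{\frac{(n_{1}-1)SD_{1}^{2} + (n_{2}-1)SD_{2}^{2}}{n_{1}+n_{2}-2}} . \]

On this scale, for example, Stanush's d = .94 for biodata indicates that the average score under self-presentation instructions fell nearly one pooled standard deviation above the average honest score.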
Miller (2001) demonstrates that the Conscientiousness Biodata Questionnaire (CBDQ), a biodata measure designed to capture the personality trait of conscientiousness, was readily faked in a study where participants were provided with a coaching session that provided trait-related information.

While understanding the accuracy or inaccuracy of biodata is important, it is also valuable to attempt to reduce inaccuracy. Kluger et al. (1991) note that findings of response bias in biodata items have been mixed. They argue that empirically keying the item response options creates items that are less susceptible to faking, and that such keying also has statistical advantages. With empirical keying, it can be difficult for fakers to guess or know where they will lose or gain points by faking, whereas with item-keying, fakers can operate effectively once the direction of the desirability of responses on the continuum is determined. They note that faking good may also include presenting behaviors that are not socially desirable, but that are perceived as desirable in candidates for a particular position (i.e., job desirable). The authors found that, generally, participants asked to respond as if they were real applicants for a job yielded higher scores than the honest group, had lower empirically keyed scores, and had more extreme empirically keyed scores. Gore (2001) also explores the fakability of biodata questions and notes empirical keying as a potential way to reduce response inflation in biodata questions. Considering that the biodata questions generated by Oswald et al. (in press) are not empirically keyed and have response options that are largely continuous Likert-type scales, similar to most personality self-report measures, it is worth investigating whether these items are more susceptible to inflation, as the present study does.

Situational Judgment Inventory

Situational judgment inventories (SJIs) typically comprise sample problem scenarios for which the respondent must choose a response representing an appropriate course of action. A review addressing situational judgment items can be found in McDaniel and Nguyen (2001). Motowidlo, Dunnette, and Carter (1990) are cited for their primary study in developing situational judgment items. Items tap a broad range of content areas and dimensions, and they can vary tremendously in complexity. While they are most often presented in written, paper-and-pencil format, administration can vary, and may even include the presentation of video vignettes (e.g., Lievens, Coetsier, & Decaesteker, 2000). Key characteristics identified by McDaniel and Nguyen are that situational judgment inventories are typically correlated with ability and experience, as well as conscientiousness, emotional stability, and agreeableness.

While situational judgment measures are useful in that they can address a variety of skills, and they add to the prediction of performance, the method could still permit some level of inflation, depending on how the items are written. If SJIs are found to be less vulnerable to inflation than personality and biodata measures yet also show adequate predictive validity, they may be more likely to be used in a selection context. Nguyen (2002) found that respondents to a situational judgment test for customer service selection were able to raise their scores when instructed to score favorably, depending on the phrasing of the test questions.
When the items were phrased in a fashion that required a Most Likely/Least Likely response instruction, participants were able to raise their scores. However, when the items required a Best/Worst response choice, respondents were not able to raise their scores. Vasilopoulos, Reilly and Leaman (2000) examined a sample of job applicants and administered a battery of tests that included situational judgment items. All participants also responded to the BIDR impression management scale, and based on their total score, they were then dichotomously categorized as being high or low on impression management. Although it was not the focus of their study, Vasilopoulos et al. found that, in comparing groups with low, moderate and high job familiarity, the difference in mean situational judgment scores between those low on impression management and those high on impression management grew as job familiarity grew. Mean differences in standard deviation terms (d) were .18, .22, and .73, respectively. These results support the idea that situational judgment items are susceptible to inflation due to socially desirable responding, and that knowledge about the job for which the situational judgment test is written is valuable in gaining a higher score. However, this social or job desirable responding could actually be adaptive behavior that reflects useful knowledge. Now that I have described the characteristics of biodata and situational judgment inventories, I will discuss coaching as a means of improving performance.

Coaching

Where there is a strong motivation to do well on a test, such as gaining admission to college, people are likely to use whatever resources are available to them to do the best that they can, including test coaching programs. Coaching can be defined as getting outside guidance (White et al., 2001), and such external interventions can improve scores on selection tests (see discussion in Sackett, Burris, & Ryan, 1989). Kulik, Bangert-Downs and Kulik (1984), in their meta-analysis of coaching, conclude that studies of improvements in SAT scores reflect a d = .15 improvement as a result of coaching. For a range of other ability tests, they found a d = .51 coaching effect size for test-retest study designs, and a d = .27 coaching effect in studies without a pre-test. Ability tests are less susceptible to coaching than are some non-cognitive tests. Alliger and Dwight (2000), through their meta-analysis, show that coaching instructions and instructions to fake good both tended to improve scores on integrity tests (d = .90 with faking instructions and d = 1.32 with coaching for overt tests; d = .38 with faking instructions and d = .36 with coaching for personality-based measures). As integrity tests are non-cognitive in nature, these results may generalize to those for biodata measures. Miller (2001) provided a brief coaching session to participants in a study of their responses to personality and biodata items. Miller's coaching consisted of explaining and defining the Big Five traits, describing how the traits are measured, and providing sample items. The trainer also specified that the most important trait in predicting job performance was conscientiousness, although other traits were also covered so that the participants were able to discriminate between personality characteristics.
That study demonstrated that brief coaching resulted in significantly improved performance on the personality dimension of interest in the study (conscientiousness), as well as on biodata items tapping conscientiousness, along with post-training knowledge tests that were created for the study. While tests that are pure g measures are stable, because of the stable nature of g, personality-related tests are not as reliable. As biodata and situational judgment tests are not pure ability measures, but are instead based on personal experience and a broad range of competencies, it is highly likely that coaching would produce greatly improved performance on these measures, as coaching could make salient the most desirable response pattern. This would allow the participants to modify their self-presentation strategy to maximize their score.

Messick and Jungeblut (1981) found that score improvement was positively related to the amount of time spent in coaching, though there is also a point of diminishing returns, and there can be effective brief coaching. Messick and Jungeblut acknowledge the variety of training that is included under the "coaching" umbrella; some authors restrict their usage of the coaching term to mean practice on sample items and last-minute cramming, while others use it to mean special full-time instruction that could extend for months. The authors were examining coaching for the SAT, a cognitive ability test that requires knowledge and skills typically acquired over a long period of time. The effectiveness of brief coaching has been variable, dependent on the type of coaching provided. The briefest of the SAT coaching programs evaluated in the Messick and Jungeblut (1981) study provided 30-60 minutes each of coaching for the Verbal and Math sections of the SAT, resulting in a significant improvement in scores, even though the underlying skills required for high scores on the SAT are supposed to be acquired over long periods of time. Similarly, non-cognitive measures are susceptible to brief coaching. Klubeck and Bass (1954) found that 30 minutes of training to improve performance in leaderless group discussion was effective for those who were already high in leadership skills, but less effective for other individuals, while Petty (1974) demonstrated that 15 minutes of training could be effective in improving performance in leaderless group discussions. Sackett et al. (1989) note that exercise-specific training, which is effectively training to the test, rather than more general skill development training, increases the effectiveness of coaching. The same authors also comment that there has been an absence of research on coaching on personality tests. Some eleven years later, Miller (2001) states that coaching on personality selection tests has received little attention, and found that an exercise-specific training program that included sample items was effective in improving scores on biodata and personality tests. A 15-minute coaching session that included time to complete a learning outcome measure was sufficient to generate an effect size of d = 1.66 on the conscientiousness scale of the NEO-FFI when comparing the scores of those given coaching along with instructions to fake to those receiving control group training and instructions to be honest (the combined coaching and faking instruction effect).
The effect size dropped to d = .48 when looking at just the coaching effect, comparing those who received coaching along with faking instructions to those who received the control group coaching along with faking instructions, thereby holding the faking instruction condition equal across groups. Effect sizes for differences on the CBDQ varied by subscale, with the subscales measuring organization (d = 1.23) and attention to detail (d = 1.14) showing the greatest effect sizes for the coaching and faking instruction effect. The subscales of planfulness and deliberate/rational showed a coaching effect size of d = .60.

Cunningham, Wong, and Barbee (1994) conducted a study that included a specific explanation of the content of the test, effectively providing very brief coaching on the underlying rationale of an overt integrity test. This coaching was provided simply by presenting written statements about the nature of the test prior to the test administration. For example, the following statement was used to provide information about punitive questions: "For example, honest individuals tend to have relatively punitive attitudes towards themselves and to those who commit crimes. They are more likely to recommend punishment than forgiveness for those who commit crimes. You may want to keep these ideas in mind as you take the test" (p. 650). Similarly, for projective questions, participants were coached, "For example, honest individuals tend to project the image that they are honest, and deny any temptations towards dishonesty. They also see other people, both friends and strangers, as being as honest as they are. You may want to keep these ideas in mind as you take the test" (p. 650). This written form of coaching was effective in raising scores by about 10%, and demonstrated that specific information about the test led to improved performance in related areas of the test (in this case, information about projective tests leading to greater improvement in projective questions, and information about punitiveness leading to greater improvement in punitive questions). Considering that coaching could make the dimensions being examined in a biodata or situational judgment test more overt (as accomplished by Miller (2001) for biodata), and that overt tests of integrity have been shown to be more easily faked than covert tests (Alliger & Dwight, 2000), relatively brief coaching on the dimensions captured by biodata and situational judgment items is likely to be effective in raising scores on these tests.

Another issue that needs to be examined is whether there are differential effects of coaching for high and low ability applicants. Kulik, Kulik, and Bangert's (1984) meta-analysis, for example, found that effect sizes of practice on aptitude tests were greater for higher ability (d = .82) and middle ability (d = .40) samples than for low ability groups (d = .17) on identical tests in a test-retest study. Also, ethnicity may have an effect on coaching effectiveness: non-Whites may be more likely than Whites to be in the lower and middle ability groups, and may be less likely to improve as a result of coaching. Ryan, Ployhart, Gregarus, and Schmit (1998), in discussing test preparation programs in a selection context, cite a study presented by Holden (1996) that demonstrated that coached Caucasians improved more than minorities. Differential access to coaching, along with differential results based on ability or other individual differences, might exacerbate the problem of adverse impact in college admissions.

That coaching can be effective is cause for closer examination of the effects of coaching on biodata and situational judgment tests for use in college admissions. While some research has been conducted on the coaching of children for educational selection (e.g., Bunting & Mooney, 2001), and there has been an examination of coaching for college admissions tests such as the SAT, as I have discussed, and the MCAT (e.g., Jones, 1987; Jones & Vanyur, 1987), there is a dearth of work on the coaching of students responding to biodata and situational judgment tests in a college admissions context. As coaching would probably become widely available if such a test were to be used in a selection context, it is important to consider changes in its predictive validity, especially as there appear to be differences in the students who undertake coaching. Those who seek coaching on the SAT I tend to be from more affluent families, have parents with more formal education, and are more likely to be Asian American (Powers & Rock, 1998). One would expect that as coaching became available, scores of those coached would tend to be elevated, and the predictive validity of the test may be reduced over time.

Individual Differences in Faking

Conscientiousness is linked to achievement-striving, and extraversion is linked to status-striving (Barrick, Stewart, & Piotrowski, 2002), which suggests that individuals who are conscientious and extraverted may tend to be more motivated to perform well on tests that will earn them a symbol of achievement and status, such as getting accepted into college. Not only are there possible individual differences in motivation, there may also be individual differences in the ability to inflate responses, as emotional stability and conscientiousness have been found to be positively related to socially desirable responding (Ones, Viswesvaran, & Reiss, 1996). Studies that have been conducted on faking in biodata have been criticized for making the assumption that everyone in a faking condition inflates their scores to the same extent (e.g., Becker & Colquitt, 1992).
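Before turning to those studies, it is worth stating what an individual difference in inflation means operationally. In a within-subjects design of the kind described next, one common formalization (offered here as a gloss rather than as any particular author's notation) is a within-person inflation score,

\[ \Delta_{i} = X_{i}^{\text{fake}} - X_{i}^{\text{honest}} , \]

the difference between respondent i's score under faking instructions and under honest instructions. The criticism above amounts to noting that many studies implicitly treat the variance of this difference across respondents as negligible, whereas the work reviewed below estimates that variability and asks which traits predict it.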
McFarland and Ryan (2000) argue that there are individual differences in response inflation on personality-related measures, and they conducted a study examining individual variability in faking across non-cognitive measures, using a sample of students with honest vs. fake instructional sets as a within-subjects factor. The order of instructions was manipulated as a between-subjects factor. The respondents completed an integrity test and a biodata inventory under the manipulated instruction conditions, and then all respondents completed a self-monitoring scale. Participants in the faking condition were offered a financial incentive whereby the top 15% in test scores would receive $15. McFarland and Ryan concluded that the personality characteristics found to be positively associated with faking on noncognitive measures include integrity and conscientiousness, while neuroticism was negatively associated with faking.

Mersman and Schultz (1998) suggested that the ability to fake is an independent construct. They conducted a study with a sample of students who completed personality, self-monitoring and social desirability scales, and the tendency to fake good was captured by establishing consistency with Saucier's (1994) Big Five Mini-Markers under honest and fake conditions. Mersman and Schultz found that faking good was not correlated with self-monitoring, impression management or self-deception, and conclude that individual differences in faking capacity are unrelated to those constructs of self-presentation. I view this conclusion with caution because in this study the participants who were told to fake did not have any real motivation to do so. As with many studies, the honest respondents may not have been providing responses that were entirely without inflation. The combination of these two factors may have minimized the relationships between faking and individual differences because the gap between fakers and non-fakers was restricted.

It is apparent that people do not all inflate their responses to the same extent when required to fake, and there is at least preliminary evidence that conscientiousness and emotional stability are related to faking (McFarland & Ryan, 2000; Ones, Viswesvaran, & Reiss, 1996). However, further study of what particular trait or pattern of traits makes some people better able to fake than others would be helpful. Having discussed the relevant literature on biodata, SJIs, coaching, and individual differences, I will now consider inflation as it relates to validity.

Effects on Validity

It is apparent that the high value that applicants place on college admission is likely to precipitate motivated responding on college admissions tests.
Regarding ability, self-reports of ability have generally been found to be quite accurate across a number of research studies (Mabe & West, 1982), and with g-loaded tests it is difficult to inflate one's scores beyond one's ability level. Noncognitive tests, however, are more malleable, as the best response can in many cases be guessed. Zickar and Drasgow (1996) acknowledge that, as tests are fakeable, it may be useful to focus on how to recognize the fakers after they have taken the test. One way to do this is with detection methods that identify patterns of inconsistent responding. They used two samples in the ABLE dataset from Project A. Polytomous item response models were used to score responses, and IRT model fit was examined. They then computed person-fit indices to identify possible fakers. They found that fakers who had been coached were easier to detect than those who were simply faking ad lib. They also examined a theta-shift model to identify fakers, conducting statistical manipulations to test for the effects of faking on validity. Their results show theta differences between honest responders, ad lib fakers, and coached fakers, and they found larger effect sizes than those shown by Ones et al., suggesting that faking may have more negative implications for validity than has been demonstrated. However, they did not report correlations between personality scales and outcomes under the different faking conditions, so the issue of criterion-related validity was not addressed directly.

Some authors claim that construct validity is not changed by score inflation. Smith and Ellingson (2002), in comparing a sample of applicants and a sample of students (non-applicants), found that applicants did not simply inflate their responses across all scales, and the relationship between social desirability scales and personality measures did not change significantly across samples. They concluded that social desirability is trait-based rather than situation-driven, and that construct validity is not attenuated by inflation. However, as there were slightly different patterns for applicants and non-applicants, where applicants scored higher on some positive dimensions that were different from the dimensions for the non-applicant group, that conclusion may be premature.

Anderson, Warner, and Spencer (1984) note that there are mixed opinions about the effect of inflation due to socially desirable responding. Inflation can be regarded as a healthy adjustment to a specific situation, and some researchers have found that inflated scores due to socially desirable responding do predict performance in certain jobs (e.g., sales roles). Some researchers provide evidence that faking has little effect on criterion-related validity. Barrick and Mount (1996) conducted a study examining the effects of self-deception and impression management on the predictive validity of the Big Five. Using regression analyses and latent-variable modeling, they concluded that, in two samples of truck drivers, conscientiousness and emotional stability were valid predictors of voluntary turnover and supervisory ratings of performance, and that the predictive validity of the personality measures was not significantly negatively affected by controlling for self-deception and impression management. They do note that adjusted validities were generally slightly smaller. Ones, Viswesvaran and Reiss (1996), in their meta-analysis, conclude that correcting for social desirability does little to increase the effectiveness of Big Five measures as predictors; however, it would be inappropriate to claim broadly, as a result of their findings, that faking does not affect validity. Their study takes a narrow perspective: the use of specific social desirability measures in their model, rather than measures that are more situation- than personality-driven. Also, I suspect that social desirability scales do not fully capture the extent of inflation, partially because these measures, too, are susceptible to socially desirable responding (Viswesvaran & Ones, 1999). They are also correlated with conscientiousness and emotional stability (Ones, Viswesvaran, & Reiss, 1996), and they do not appear to capture all that explains score inflation.

Some researchers are more suspicious of inflation in self-reports of skills and abilities, viewing the inaccuracies as a real problem that may make the reports invalid predictors of performance, and less useful as differentiators between applicants (e.g., Smith & Ellingson, 2002; Topping & Gorman, 1997). According to a recent paper by Graham, McDaniel, Douglas and Snell (2002), "For biodata, the degree of prediction is likely enhanced by the accuracy of the self-report information" (p. 574). They conduct a study comparing responses to biodata items under honest-responding and faking-good conditions, with job performance as the criterion of interest. They categorize biodata items according to various item characteristics, and demonstrate that the criterion-related validity of items with different characteristics varies across honest and faking conditions. For example, they show that for items rated as verifiable through hard records, validity for the honest condition was .16, but .02 for the faking group. Items rated as verifiable through supervisors or coworkers showed validity coefficients of .12 for honest responders, and .03 for fakers. It is apparent from this study that the item characteristics as well as the situational instruction set are related to validity.
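Several of the studies just described ask whether criterion-related validity survives statistically controlling for a social desirability or faking measure; the same logic underlies the partial correlations reported later in Table 18. As a reminder of what such a control does, the first-order partial correlation between a predictor x (e.g., a biodata scale) and a criterion y (e.g., performance), holding a faking index z (e.g., impression management) constant, is

\[ r_{xy \cdot z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{\left(1 - r_{xz}^{2}\right)\left(1 - r_{yz}^{2}\right)}} . \]

When the faking index is only weakly related to both the predictor and the criterion, partialling it out leaves the validity coefficient essentially unchanged, which is the pattern Barrick and Mount (1996) and Ones et al. (1996) report; the disagreement in the literature is over whether such indices adequately capture inflation in the first place.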
McFarland (2000), who found that the criterion-related validity of conscientiousness was not significantly affected from a statistical perspective, stated that the .05 difference in validity between the honest and applicant conditions might be of practical significance in a real selection context. Stark, Chernyshenko, Chan, Lee, and Drasgow (2001) cite disagreement about the effects of faking on the validity and utility of personality tests. They look at both trait and situational perspectives on faking, and, using the Sixteen Personality Factor Questionnaire (16PF) in a sample of applicants and non-applicants, they showed significant differential item functioning (DIF) across groups, where the items appeared to operate differently for applicant and non-applicant groups. They conclude that the construct validity of the 16PF is negatively affected by faking among applicants.

Ones, Viswesvaran, and Reiss (1996) note that job applicants may be faking in a job-desirable manner rather than a socially desirable manner. Presenting oneself in the best possible light in a college admissions context could be in a job-desirable fashion, as well as in a socially desirable direction. In the college context, the issue of job-desirable behavior is limited by the individual's conception of what is valued as effective college performance. While some may consider job-desirable college performance to be academically focused behavior, others may recognize broader goals that include leadership behavior, interpersonal skills, multicultural tolerance, artistic appreciation, and so on. Such differences in perceptions of the criterion space may affect individuals' perceived job-desirable behavior, and possibly also the criterion-related validity of the tests.

Coaching is one way to provide information about what is job-desirable behavior. Miller (2001) demonstrates that those who receive coaching are far better equipped to perform well on personality and biodata items, with those who received coaching rising to the top of the distribution of performance scores, and those without coaching falling toward the bottom. In a top-down selection system based on test performance, coaching would have had a real effect on who was selected in this context, by filling the upper bands of performers with individuals whose performance has been enhanced by coaching, while those who have not had the opportunity to improve their performance through coaching might fall into lower performance bands, and find themselves not getting selected under a top-down system.

Faking is apparent in testing in many selection contexts, and if faking as a result of coaching affects test validity, it is essential that steps are taken to discourage faking, identify it, and correct for it. However, score improvements after coaching could be a result of an artificial increase in the score, or a real improvement as a result of task familiarity (see Sackett, et al., 1989). Our goal is to have admissions tests that have both construct validity and predictive validity.
Students are motivated to perform well to gain admission to college, and if biodata and situational judgment tests are to be used in an admissions context, we need to understand better how they perform under coached conditions.

Having noted important areas of the literature on inflation and possible related problems, I will now discuss ways in which researchers have attempted to control inflation.

Inflation Controls

As I expect that faking is possible with non-cognitive measures, and likely in a situation where there is a motivation to perform well on those measures, it is pragmatic to try to prevent faking in the first place. Three possibilities for faking control are the use of warning statements, the requirement of item elaboration, and the consideration of certain characteristics in item writing.

Warning

Warning statements may be a practical and effective way to manage inflation on biodata and situational judgment inventories. Dwight and Donovan (2003), in examining faking on noncognitive selection measures, found that warning statements were effective in lowering predictor scores. They demonstrated that a warning that included the risk of verification of dishonest responses as well as a negative consequence for dishonest responding resulted in the least faking on the California Psychological Inventory (CPI). They showed an effect size of d = .75 on the CPI scale of well-being, and d = .61 for dominance, when comparing those who were not warned with those who received an optimally effective warning. Hough et al. (1990) also emphasize the importance of warning statements, stating that their use "warrants greater attention as a method of reducing the amount and effect of intentional distortion" (p. 582). In a sample of employees, Becker and Colquitt (1992) included a warning statement that responses to biodata questions may be verified with other sources. They found significant mean differences between a test-taking group with no warning, a faking group with no warning, and an applicant group who was warned. Vasilopoulos (1999) used a warning of response verification in a study of a selection system that included personality and situational judgment measures, with a resultant mean drop in scores on three of the personality scales for warned respondents, but not on the situational judgment scale. Warning statements are likely to be most effective in limiting faking when they include both a warning about the potential identification of faking and a warning about the potential consequences of faking (Dwight & Donovan, 2003). While a warning may be effective in reducing score inflation, the effect may vary from item to item due to varying item transparency (Kluger & Colella, 1993); hence, consideration of item composition is important.

Composition of Items

Zickar and Robie (1999) conducted an item-level analysis of faking good on personality items.
They criticize the focus of the research literature on scale-level analysis, and identify the need for item-specific examination, as people "respond to (and fake) individual items, not scales" (p. 552). Whether or not a test is fakeable depends on the composition of the items. Becker and Colquitt (1992) note the discrepancies in findings with regard to biodata items being faked and posit that the differences have to do with the type of items. They examined a group of employees who were instructed either to be honest or to fake good on a biodata instrument, and job applicants, who completed the instrument as part of the application process. The authors concluded that the form was fakeable, and that those who were faking good or were real applicants did have inflated scores compared to the honest condition. Items were examined for particular characteristics that may make them more or less fakeable, and, using the framework of Mael (1991), they found that items that were more likely to be faked were less historical, objective, discrete, verifiable, and external than other items. They were also more relevant to the job. (Mael's taxonomy is shown in Appendix A.) Similarly, Elliot, Lawty-Jones, and Jackson (1996) found that responses to objective tests of personality were relatively unaffected by instructions to fake. Dwight and Alliger (1997) conducted a study of ratings of individual integrity test items, finding that the perception that an item would be easy to fake was related to the job relatedness (r = .50, p < .001) and invasiveness of the item (r = .25, p < .05). In their meta-analysis of the susceptibility of integrity tests to coaching and faking, Alliger and Dwight (2000) conclude that the overt tests were more susceptible to faking and coaching than were the covert tests. Biodata and situational judgment items should be less fakeable where the items are more objective, verifiable, and not clearly related to college performance.

Mael's (1991) taxonomy of biodata items provides further ideas for item characteristics that may be used sensibly in biodata items, including equal access and being controllable. Respondents without perceived access to the biographical experiences to which an item refers might be less likely to inflate their scores than individuals who have had such experiences. They may view such experiences as completely beyond the realm of possibility. Similarly, individuals may be unlikely to inflate responses when the issue is one over which they have little control. Graham et al. (2002) examine biodata validity as explained by item characteristics using Mael's (1991) attributes. They concluded that item attributes associated with validity are different for faking and honest respondents, leaving the authors skeptical about the possibility of writing biodata items that are valid for both fakers and honest responders. For honest responders, item attributes most highly associated with validity were being controllable and concerned with the individual's feelings or attitudes (r = .22), verifiability through hard records (r = .16), and verifiability through supervisors or coworkers (r = .12).
For fakers, items that were verifiable through friends (r = .11) and controllable and concerned with actions that the individual chooses to perform (r = .07) were associated with validity. Schmitt, Gillespie, Kim, Ramsay, Oswald, and Yoo (in press) found that biodata items that were more objective and verifiable were less correlated with the participants' BIDR self-deception and impression management scores.

Requiring elaboration within biodata items is another method that may reduce the likelihood of individuals inflating their responses. In requiring elaboration, the test item specifies that the respondent should provide examples that reflect evidence of the level of experience they indicated, rather than simply indicating the level of experience. An example of a biodata item with an elaboration requirement is shown in Table 1.

Table 1
Sample Elaborated Biodata Item

The number of high school clubs and organized activities (such as band, sports, newspapers, etc.) in which you took a leadership role was:
a. I did not take a leadership role
b. 1
c. 2
d. 3
e. 4 or more
If you answered b, c, d, or e, briefly indicate up to 4 clubs or activities and the nature of your role.

Schmitt and Kunce (2002) found that by requiring item elaboration in a biodata test, they reduced mean scores by .7 to .8 standard deviation units. They also found carry-over effects of score reduction on nonelaborated items in the same instrument. Requiring elaboration on biodata items should thus make the items less likely to be faked. This effect of elaboration on biodata items was found again in a more recent study by Schmitt, et al. (in press); however, the carryover effect was not found. Schmitt et al. (in press) showed a difference in mean scores of .8 standard deviations between elaborated and non-elaborated items, where elaborated items had lower scores. It may be valuable to require elaboration on biodata items as a means of limiting faking in elaborated items and possibly, through a carryover effect, in non-elaborated items. However, one should bear in mind one possible explanation: the lower scores as a result of elaboration could occur because the individual involved simply cannot remember the details of the biographical event, and may limit responses to examples on which enough detail can be recalled to elaborate.

While attending to specific item characteristics when generating items is an important way of discouraging inflation, it is likely that some individuals will still inflate their responses, and I will now discuss the identification of inflation.

Statistical Control Using Social Desirability or Bogus Items

While I expect some score inflation to take place, it would be useful to be able to identify those who tend to manipulate their responses. Inflation identification and control possibilities include the use of bogus items as a lie detection system, the use of scales that capture impression management, and the use of indices of improbable responding. Once faking has been identified, it is possible to statistically control for it by partialing out the
effects using the various faking indices as covariates in estimating the validity of biodata, SJI, or personality measures (see Barrick & Mount, 1996).

Bogus Items

It is apparent that individuals do inflate responses on certain tests in certain contexts, and that if one expects that faking will happen, it is practical to plan for it. Using items that capture the claim of impossible experiences is one way to identify faking (e.g., Alliger & Dwight, 2000; Anderson, Warner, & Spencer, 1984; Pannone, 1984). Anderson, et al. used such bogus items interspersed among real job task items in a test of a job applicant sample to establish whether applicants were faking. The level of their affirmations of experience on bogus items was used to adjust downward their experience score on real items. This was justified by the correlation between the inflation scale scores and examinations for the particular occupational classes. They concluded that inflated scores are pervasive. In their sample, 45% of the participants indicated that they had observed or performed at least one bogus task. Anderson, et al. also concluded that the inflation scales had high reliabilities (average alpha of .86) and that the bogus items were useful. In a secondary component of this study, a sub-group of the applicants were also examined on a typing test after being asked how many words per minute they could type. The authors found that applicants did inflate their self-report of typing skill, and that this inflation correlated with the scores on the bogus items in the more general skills measure. Correcting for inflation increased the usefulness of the test in predicting the criterion. Alliger and Dwight (2000) generated bogus items similar to those used by Anderson, et al. (1984). They state that the bogus scale is "a measure of fabrication, the demonstrated willingness of the person to create information about themselves that has no connection to actual experience, rather than a measure of subtle faking (e.g., exaggerating positive attributes)" (p. 10). They found that the optimal warning condition had the greatest impact on scores on the bogus item scale (d = .41) and suggest that bogus items are tapping a specific aspect of faking: the tendency to fabricate information.

Social Desirability and Improbable Response Indices

Social desirability scales have been used to identify those more likely to present a positive impression. Hough, Eaton, Dunnette, Kamp, and McCloy (1990) agree that people can distort their responses, having examined personality constructs and the effect of response distortion on their validity, with four response validity scales that they created: social desirability, poor impression, self-knowledge, and nonrandom response. The social desirability scale was patterned after other unlikely virtues scales, to detect those trying to appear more attractive, where the areas of experience being examined were education, training, job involvement, job proficiency, delinquency, and substance abuse.
They concluded that people can fake when instructed to do so, and that the response validity scales detected the distortion. Ellingson, Sackett, and Hough (1999) attempted a correction using a social desirability scale score and found that the corrected score was ineffective at improving validity. In their within-subjects design across faking/non-faking conditions, they used the SD scale to correct faking conditions to see whether scores approximated non-faked scores. However, Paulhus (1984) emphasizes that the situation determines whether both components of social desirability should be controlled. As self-deception is viewed as a stable trait, there may be little value in controlling for it when examining the effectiveness of a test that is being used under specific situational constraints that are affecting the inclination to provide socially desirable responses. In such a case, it may be more relevant to control for impression management to reduce the effect of conscious faking. Controlling for social desirability with a scale such as the Marlowe-Crowne may not provide useful information, as the scale is made up of both factors in social desirability: impression management and self-deception (Paulhus, 1984).

Barrick and Mount (1996) examine the effect of self-deception and impression management on the predictive validity of the Big Five when looking at turnover and supervisor ratings. They found that distortion of personality constructs occurred through self-deception and impression management, but that the distortion did not negatively affect the validity. Christiansen, Goffin, Johnston, and Rothstein (1994) attempted to correct for inflation on personality scales by partialing out the effects of the inflation scale; however, they were not able to demonstrate that such a correction affected the validity of the personality measure as a predictor of performance ratings.

Other researchers, however, have found that criterion-related validity and selection decisions can be improved by correcting for faking. Hough (1998) used results of an "Unlikely Virtues" scale either to adjust the individual's score or to remove the individual from the applicant pool, finding both techniques to be effective in reducing the impact of score inflation on hiring decisions and increasing the predictive validity. Anderson, Warner, and Spencer (1984) attempted corrections after using bogus items that are true lie scores and found that corrections improved the test's usefulness as a predictor. Becker and Colquitt (1992) corrected the scores of their motivated group (the applicants) by using an inflation index. The index was created by calculating the mean difference between scores for applicants and non-applicants. This index was then used to adjust the distribution of scores. When corrected scores were used to make hiring decisions, the correction changed the rank order of candidates to the extent that 17% of the candidates were faced with a reversal of the hiring decision made under the uncorrected scores. This correction effectively neutralized the inflation; however, one concern with such an approach is that it assumes that everyone has inflated their scores to the same extent. Christiansen et al.
(1994) used the Krug (1978) approach to correcting scores to test whether there would be different hiring decisions after correction. They found that without correction, discrepant hiring decisions were made in 16% of the cases at a selection ratio of .15, and they also note that, overall, candidates moved an average of 2.3 positions in rank (SD = 3.6) when corrections were used. Rosse, et al. (1998) also demonstrated that by inflating scores, applicants may be able to affect who is hired.

Although there are mixed findings about the effectiveness of corrections, they may be a practical way for test users to examine the effect of score inflation on criterion-related validity. Such examination will be important if biodata and situational judgment items are to be used in a college admissions context in predicting student performance.

Study Development

To summarize, traditional measures such as high school GPA and SAT/ACT scores are used to make college admissions decisions, and tests such as biodata and situational judgment could also be used (Oswald, et al., in press). However, it is apparent that biodata scores can be inflated, although they may not be as susceptible to manipulation as are personality scales (Sisco, 1999). Situational judgment scores can also be improved by faking (Nguyen, 2002). Performance on these measures is moderated by the situational factors of the conditions under which the tests are taken. Being motivated to perform well, experiencing coaching to be able to perform well, and receiving a warning statement about not exaggerating responses in an attempt to perform well all may contribute systematically to the score that an individual achieves. These factors are expected to interact, with some combinations providing a stronger likelihood of high scores than others. There are also individual differences such as conscientiousness and emotional stability that are expected to relate to performance on biodata and situational judgment. While cognitive ability may also affect faking, ability is not the focus of this study. Rather, I consider the individual differences of personality traits as they relate to performance and inflation. Item characteristics, such as biodata elaboration requirements and item response verifiability, also play an important role, where certain characteristics may make some items more vulnerable to inflation.

While there are mixed views on the effects of faking on validity, recent work by Graham, et al. (2002) demonstrates that the criterion-related validity of biodata items can be negatively affected by faking. Assuming, therefore, that inflation could be problematic, it may be helpful to control for score inflation. This could be achieved by identifying those who respond to bogus items, those who score highly on an inflation index developed for such a purpose, and those who score unusually highly on impression management, and correcting for the inflation. Inflated scores may result in a change in the rank order of respondents (e.g., Frei, 1998), and subsequently a change in who would be selected in a top-down selection system in admissions; a critical issue in a college admissions context.
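To make concrete how a correction can change who is selected under top-down selection, the following minimal Python sketch applies an assumed mean-difference correction, in the spirit of Becker and Colquitt (1992), to an entirely hypothetical applicant pool and counts how many selection decisions are reversed. The data, group labels, and cutoffs are invented for illustration only.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n_applicants, n_selected = 200, 30

    # Hypothetical applicant pool: observed scores plus a flag marking the
    # group whose mean inflation is to be removed.
    pool = pd.DataFrame({
        "applicant": range(n_applicants),
        "score": rng.normal(3.0, 0.5, n_applicants),
        "inflation_prone": rng.choice([True, False], n_applicants),
    })

    # Assumed correction: subtract the mean difference between the
    # inflation-prone and comparison groups from the inflation-prone scores.
    mean_diff = (pool.loc[pool.inflation_prone, "score"].mean()
                 - pool.loc[~pool.inflation_prone, "score"].mean())
    pool["corrected"] = pool["score"] - np.where(pool.inflation_prone, mean_diff, 0.0)

    # Top-down selection before and after correction; the set difference shows
    # how many selection decisions would be reversed.
    before = set(pool.nlargest(n_selected, "score")["applicant"])
    after = set(pool.nlargest(n_selected, "corrected")["applicant"])
    print("Decisions reversed:", len(before - after))

As the sketch makes plain, this style of correction assumes a uniform amount of inflation within the corrected group, which is exactly the concern raised above.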
By selecting based on performance on these tests, we hope to be able to choose the best candidates, i.e., those with good actual college performance (college GPA and attendance).

An individual who is motivated, has received coaching, and has not been warned against inflating responses is most likely to score highest on the tests. Next highest would be someone who has had coaching and no warning, but no motivation. Individuals in these two groups would be more cautious in inflating responses when provided with a warning, but the warning is likely to have a more powerful effect for those who are motivated, resulting in those without an incentive having a higher score than those with an incentive. Those who are highly motivated to perform well have more to lose by being caught faking than those who are not particularly concerned about the test outcome. This effect of the interaction between warning and valence was demonstrated by McFarland (2000) when looking at the Big Five factor of openness. Performance is expected to be lowest in the group that has no coaching and no motivation. The warning statement is not expected to have any effect for this group, as they have nothing to lose or gain by manipulating their scores. Table 2 shows these general expected score levels within the different situational conditions.

Table 2
Expected Score Levels

                         Coached                  Not coached
                 Warned     Not warned     Warned     Not warned
Motivated        Middle     High           Low        Middle
Not motivated    High       High           Low        Low

The following graph (Figure 2) provides a visual depiction of the expected interaction.

Figure 2
Expected Interaction of Warning, Coaching, and Motivation on Test Performance
[Line graph: estimated test score by warning condition (No Warning, Warning) for the four Coaching x Motivation groups.]

H4: Motivation, coaching, and warning will interact in affecting scores on biodata and situational judgment.

The second question of the study addresses individual differences in faking. We know that conscientiousness and emotional stability are generally related to performance, and thus expect these characteristics to relate to biodata and SJI performance.

H5: Conscientiousness and emotional stability will be related to performance on biodata and situational judgment.

Based on the findings that personality traits have been found to be associated with social desirability (Ones, et al., 1996) and faking (McFarland & Ryan, 2000), I posit that conscientiousness and emotional stability will also be important correlates of inflated scores, and self-deception and impression management are also expected to be related to inflation. Inflation will be identified in three ways: by participants showing a positive response to bogus items, showing high scores on the inflation index, or showing high performance on the BIDR impression management scale.

H6: Conscientiousness, emotional stability, and social desirability will be associated with inflation captured by performance on a bogus item scale, the inflation index, and impression management.
In this model, impression management is considered from two perspectives: first, as an individual difference that predicts biodata and SJI performance and inflation on bogus items and the inflation index, and second, as an outcome when used as an indicator of possible faking. (The empirical generation of the inflation index, in which items showing extreme score differences between two manipulation groups in the study are identified, is described in the Method section below.)

The third question of this study addresses characteristics of items that relate to whether the items are more or less likely to be susceptible to inflation. This research question is related to the work of Mael (1991), Becker and Colquitt (1992), and others, who provide insight into item characteristics that make items more vulnerable to score inflation. Items on which it is easier to provide an inflated response are items that are not objective and are difficult to verify (Becker & Colquitt, 1992). Items that are more relevant to the job (Becker & Colquitt, 1992) and overt (Alliger & Dwight, 2000) are more likely to be faked. I expect that items that are not viewed as relevant to the criterion of interest (here, college performance) are less likely to be inflated, as the link to desirable academic behavior is difficult to make. Considering the examples of invasive and non-invasive items provided in Mael's (1991) taxonomy (p. 773), invasiveness is expected to reduce inflation. Consider Mael's noninvasive item, "Were you on the tennis team in college?" Inflation is likely in an item such as this, as there is little vulnerability created by answering positively. Now consider Mael's invasive item, "How many young children do you have at home?" This item, and others that are invasive, are likely to receive muted rather than extreme responses, and so are less likely to be inflated. Items that are perceived as unequal in access are less likely to be inflated, as the individual may see inflated options as so far beyond the realm of possibility that they would not choose them. This argument is supported by the example of a nonequal-access item provided by Mael, "Were you captain of the football team?" For most women, this item would be beyond what they would regard as possible. Similarly, an item that is considered controllable is more susceptible to inflation. Mael's example, "How many tries did it take you to pass the CPA exam?" is an example of a controllable item that is likely to be faked, while an uncontrollable item such as "How many brothers and sisters do you have?" is less likely to be inflated. Logically, any item that is judged by people to be unfakeable is unlikely to prompt inflation, although this is a speculative hypothesis. The following hypotheses were tested.

H7: Biodata items that are rated as less objective, less verifiable, and more relevant to school work will show greater inflation than items that are rated as more objective, more verifiable, and less relevant to school work. Also, items that are rated as invasive, are outside the individual's control, are unequal in terms of access on the part of students, as well as items that are
rated as unfakeable, are less likely to show inflation that might be the result of some form of faking. (Greater inflation will be demonstrated by differences in the correlations between ratings of item types and mean responses to the items across study manipulation conditions.)

Another item type that has been shown to be relevant in reducing inflation is the requirement to elaborate on one's response to biodata items (Schmitt & Kunce, 2002; Schmitt, et al., in press).

H8: Elaborated items will be less likely to be inflated than non-elaborated items.

The fourth question of the study addresses the validity of biodata and SJIs as tools used in selection. As inflation may not be an adaptive behavior, but rather an artificial self-presentation, it is likely that inflated responses are less effective predictors of performance than non-inflated responses (see Graham, et al., 2002). If non-inflated responses show criterion-related validity, then the validity of inflated responses may be improved by correction. This possibility can be addressed by using regression analyses to predict performance, partialing out the effects of faking as measured by one or more faking indices (see White, et al., 2001). As mentioned earlier, I attempt to identify inflation through bogus items, an empirically constructed inflation index, and the BIDR impression management scale. Some conditions are more likely than others to permit or encourage faking, and that will affect who would appear highest in a ranking in a top-down selection system (e.g., Dwight & Donovan, 2003; Frei, 1998).

H9: Controlling for responses to the three measures of faking will lead to a suppressor effect, with faking correlating with biodata and SJI but not with performance criteria. Statistically controlling for faking should thereby increase the amount of variance in college performance outcomes (GPA and attendance) predicted by biodata and situational judgment. More potential fakers (those marking bogus items, scoring high on impression management, and scoring high on the inflation index) will be identified among the top candidates if the faking goes uncorrected in top-down selection.

H10: Correction by removing those candidates who are identified as having inflated scores, and replacing them with the next-highest scoring individuals not identified as having inflated scores, will result in a better choice of candidates, based on actual college performance (GPA and absenteeism).

METHOD

Sample

Michigan State University has a large student population (about 35,000 undergraduate students), and at the undergraduate level has relatively open admissions standards when compared with other universities. Because of this, I am provided the opportunity to sample a group of students in a situation in which I am not faced with significant range restriction in cognitive ability. ACT scores in this sample range from 8 to 35 with a mean of 23 (N = 341), and SAT scores range from 560 to 1480 with a mean of 1,100 (N = 92); national mean scores for 1999 were 21 for the ACT and 1016 for the SAT. This sample will allow us to consider more accurately the effects of general cognitive ability as they relate to our constructs of interest. Not only is there heterogeneity in ability in the student population, but also a diversity of ethnicities is represented on this campus.
The ethnic breakdown of the student population is roughly comparable to that of the United States college applicant pool, although this campus underrepresents minority groups relative to the population of the U.S. At the university, 77.3% are White, 9.8% are African American, 1.9% are Hispanic American, 5.4% are Asian American, and 5.6% are of other ethnicities. In the sample, 79.28% are White, 6.08% are African American, 2.49% are Hispanic American, 8.56% are Asian American, and 5.6% are of other ethnicities. Of the sample, 94% identified themselves as U.S. citizens. Of the 5% who were non-citizens, 1% were Canadian, and 4% were of other citizenship. English was the primary language of 96% of the sample. Women account for 58.84% of the sample, and the campus has a student population that is 55% female. To ensure that we collected data from individuals who are close to the typical age of students applying for college, we recruited only a subgroup of the student population; to be able to participate, students were required to have been in their first year of college. The age of the participants ranged from 18 to 22 years, with 71% of the sample being 18, 20% being 19, and 6% being 20 years old. The mean age of the sample was 18. Participants were recruited from the university's Psychology Department Subject Pool, had not participated in other studies using similar measures, and received extra course credit for their participation.

Study Design

This study is designed as a 2 X 2 X 2 orthogonal design, both for situational judgment and independently for biodata. The biodata part of the study has an additional factor with two levels: repeated measures on elaborated vs. non-elaborated items. The study manipulations are identified in Table 3 below, where there are eight possible variations in experience during the study, excluding elaboration, with approximately equal numbers of participants in each cell.

Table 3
Study Manipulations and Participants per Cell

                         Coached                  Not coached
                 Warned     Not warned     Warned     Not warned
Motivated          43          45            45          45
Not motivated      44          48            46          46

There were approximately the same number of participants per cell in the research design, as shown in Table 3 above, and a total of 362 participants overall, which provided sufficient statistical power to test the main effect and interactive hypotheses (Hypotheses 1-4). I am assuming a medium effect size (d = .50) and an alpha level of .05. With 40 participants per cell, the power level is .88. Samples within the cells will be too small to provide sufficient power to reject the null hypothesis for analyses of validity and reliability across conditions.

Procedure

The questionnaire for this study was split into two booklets, and four forms of the questionnaire were distributed. The four forms were essentially the same, apart from the instructional set that was used at the beginning of the second booklet. The instructional set provided the warning and motivation conditions. The first booklet of each form contained a Big Five personality measure, social desirability and impression management scales, college GPA, absenteeism, and demographic questions. After completing the first booklet, those groups receiving coaching experienced a brief coaching session relating to the questions in the upcoming booklet, while those groups not targeted to receive coaching got nothing.
There was no placebo treatment for the no-coaching group; they were moved directly to the next phase of the experiment, which for both coached and non-coached groups was the second booklet. Instructional sets at the beginning of the second booklet varied across the four forms, providing the study manipulations. A sample of the questionnaire form is shown in Appendix B, although biodata and situational judgment items have been removed due to the proprietary nature of the measures. Sample biodata and situational judgment items are shown in Appendices C and D, respectively. Apart from the instructional sets, forms were identical. The wording of all four of the instructional sets is shown in Appendix E.

This study was approved by the University Committee on Research Involving Human Subjects (UCRIHS). Samples of the two informed consent forms, one for motivated participants and one for participants who were not motivated, are shown in Appendices F and G, respectively. The forms for this project were pilot tested, and data collection was completed in the Fall 2002 and Spring 2003 semesters. The study was advertised through the web page of the Psychology Department subject pool. Participation was restricted to freshman students who had not participated in other studies we had conducted using similar measures. Participants were offered extra credit in psychology for their participation, and the data collection sessions lasted 90 minutes.

Participants, who signed up via the subject pool web site, were randomly assigned to different coaching conditions, and were randomly assigned to different forms. It is important, in attempting to understand the effects of coaching through an experimental design, that subjects be randomly assigned to the coaching condition (Messick & Jungeblut, 1981). Sessions were designed to seat up to 50 participants in a classroom setting, and were administered by research assistants working according to a written protocol (Appendix H). Informed consent forms, data release forms, questionnaire forms, and scantrons were placed in envelopes for each participant and distributed according to the protocol. Half of the sessions were provided with a ten-minute coaching component after participants had completed the first booklet, which contained all measures apart from three: biodata items, bogus items, and situational judgment questions. For the coached group, these three components were completed immediately after the coaching and the presentation of the manipulated instruction sets.
Those who scored above the 50th percentile on the tests administered were promised and have been mailed $10. Those in the non-motivated conditions received no incentive.

Warnings

To create an effect similar to warnings that may appear on college application materials, the materials for those in the warning conditions included warning statements, as shown in the instruction sets in Appendix E. An example of a warning statement is: "Note that we may verify a subset of your responses, and if you respond dishonestly, that may invalidate this test as well as your chance to receive $10 for high performance."

Coaching

A ten-minute coaching component reviewed sample biodata and situational judgment items along with definitions of the performance dimensions that the questions were designed to measure. This coaching session was designed as an orientation to these particular selection devices. This form of orientation is common in formal coaching for selection tests (Sackett, et al., 1989), and is similar in length to the coaching for biodata provided in Miller (2001). Also, it is far more comprehensive than the written coaching statements found by Cunningham, et al. (1994) to be effective. We expected that this brief coaching would have an effect because it was exercise-specific (Sackett, et al.) and was provided immediately prior to the completion of the biodata and situational judgment inventory questionnaire. By reading aloud the directions (Appendix I) and handout material (Appendix J), the proctor of the session provided coaching for the participants. To ensure uniformity in coaching, the same proctor administered all coaching sessions, reading from a prepared script. Once the coaching component was complete, the proctor led immediately into administering the second booklet, to avoid any discussion and questions during the testing session regarding the coaching.

Measures

College Grade Point Average

To be able to evaluate the predictive validity of the measures being tested in this study, two outcomes were collected. Participants were asked to release their actual college GPA. A sample of this information release form provided to the university registrar is shown in Appendix K.

Absenteeism

Participants were also asked to identify how frequently they have missed class, and the reasons for their absences.

Gender and Ethnicity

To be able to examine subgroup differences in responses, participants were asked to indicate their ethnicity and gender.

High School GPA and SAT/ACT Scores

Participants were asked to provide access to their high school GPA and SAT/ACT scores as provided by the university admissions office. A sample of the information release form is shown in Appendix L. SAT and ACT scores were converted to new variables through linear transformation based on national normative information on means and standard deviations, and then combined to create a composite cognitive ability index consisting of an average of all available test scores for each person.

Personality

A Big Five personality inventory based on the International Personality Item Pool (IPIP) was used to measure personality (Goldberg, 1999). Scale alpha levels are Conscientiousness (.81), Openness (.77), Agreeableness (.82), Emotional Stability (.89), and Extraversion (.87).
I presented hypotheses only for Conscientiousness and Emotional Stability; other measures were examined on an exploratory basis.

Social Desirability

Self-deception and impression management dimensions of social desirability were measured using the Balanced Inventory of Desirable Responding (BIDR) scale (Paulhus, 1988). Because of concerns about the intrusive nature of one item in each scale (i.e., "I have sometimes doubted my ability as a lover" and "I never read sexy books or magazines"), these two items were not used. Scale alphas are .67 for self-deception and .80 for impression management.

Biodata

Biodata items generated by Oswald, et al. (in press) were reviewed, and those that were empirically determined in their sample of first-year students to be the best predictors of college performance outcomes (GPA, absenteeism, and a self-assessment on a behaviorally anchored rating scale) were selected for this battery. While Oswald, et al. had written their items to tap 12 dimensions of student performance (see Appendix M), as the items were intercorrelated, they regarded their biodata scale as unidimensional. Nevertheless, to capture the breadth of student performance, items for this study were selected to ensure that the content of all twelve performance dimensions was addressed in the scale. Also included were all elaborated items used by Oswald et al., so that half of the items for this study were elaborated and half were not. Overall biodata scale alpha was .88. Alpha was .78 for elaborated items and .80 for non-elaborated items. Due to the proprietary nature of these items, a sample of the biodata items is provided in Appendix C.

Bogus Items

The biodata questions include four bogus items to assess faking. These items were based on bogus items used by Anderson, et al. (1984), and were interspersed with the real biodata items. The bogus items are identified in Appendix N. The bogus item scale alpha was .37. This was not unexpected, as these items produced very little variance. If respondents were paying attention and were honest, I did not expect them to indicate any activity on these four items when there is no incentive to inflate responses.

Situational Judgment

Situational judgment items generated by Oswald, et al. (in press) were reviewed, and those that were empirically determined to be the best predictors of college performance outcomes (GPA, absenteeism, and a self-assessment on a behaviorally anchored rating scale) were identified for this battery. On dimensions where the predictive validity of the SJI items was low, the best items were rationally selected based on content. Two items per dimension were selected to ensure that the content of all twelve performance dimensions was addressed in the overall scale. Scale alpha was .77. Due to the proprietary nature of these items, a sample of situational judgment items is shown in Appendix D.

Inflation Index

To be able to identify probable fakers, the mean score differences for items between the individuals in the manipulation condition where people are most apt to inflate their scores (coached, motivated, not warned) and in the manipulation condition where people would be least apt to inflate their scores (no coaching, no motivation, warned) were examined. The eight items with the largest mean difference in responses comprised the inflation index. These items are shown in Appendix O, with the difference scores in Appendix P.
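A minimal Python sketch of this index-construction step follows; it assumes a hypothetical wide-format data frame of item responses and invented condition labels, and is intended only to make the computation explicit, not to reproduce the actual items or data.

    import pandas as pd

    # responses: hypothetical DataFrame with one row per participant and one
    # column per biodata item; condition: Series of labels such as
    # "coached_motivated_notwarned" and "notcoached_notmotivated_warned".

    def build_inflation_index(responses: pd.DataFrame,
                              condition: pd.Series,
                              high: str = "coached_motivated_notwarned",
                              low: str = "notcoached_notmotivated_warned",
                              k: int = 8) -> list:
        """Return the k items with the largest mean difference between the
        condition most apt to inflate and the condition least apt to inflate."""
        diff = (responses[condition == high].mean()
                - responses[condition == low].mean())
        return diff.sort_values(ascending=False).head(k).index.tolist()

    # An individual's inflation-index score could then be the mean of his or
    # her responses to the selected items:
    # index_items = build_inflation_index(responses, condition)
    # inflation_score = responses[index_items].mean(axis=1)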
Item Type

To generate assignments of item type to biodata items, two professors, four graduate students, and three undergraduate research assistants on the project provided ratings indicating the degree to which each biodata item was: objective, verifiable, controllable, equally accessible, relevant to college, invasive, and fakeable. Ratings were made on a 5-point scale ranging from "Not at all" to "Completely," one dimension at a time. See Appendix Q for the rating form, which includes definitions of the item-type dimensions. The 126 items were distributed evenly among the raters so that each rater assessed a unique set of 56 different items. In the end, this resulted in 4 ratings per item. Additionally, four sets of 14 items were rated by the same set of 4 raters. (Items 1, 10, 19, and so on were rated by the same people; items 2, 11, and so on were rated by another set of the same 4 people, etc.) In order to index the amount of agreement between these ratings, the internal consistency of the ratings across items was calculated, treating each rater as interchangeable. This analysis was conducted for each item-type dimension by aggregating ratings for the 126 items into four groups that consisted of ratings from the four randomly assigned raters. The coefficient alpha estimates for the four groups of ratings were then examined for each dimension. Alpha coefficients were highest for college relevance, verifiability, objectivity, and controllability (see Table 4), and these dimensions were retained as relevant for further analysis. Low reliability on some dimensions was either a result of all the judges rating all the items the same way (e.g., fakeable, where all items were regarded as highly fakeable, with no variability across items) or inconsistency in how judges rated items (e.g., invasive). For objectivity, verifiability, relevance, and controllability, the ratings across the four groups of ratings were averaged to compute an overall dimension value for each item. These values were then used in item analyses, the results of which are described below.

Table 4
Coefficient Alpha and Descriptives for Ratings of Biodata Item Characteristics

Dimension          Alpha    N     Min.   Max.   Mean    SD
Objective           0.70   126    1.00   5.00   2.89   0.93
Verifiable          0.75   126    1.00   4.75   2.49   0.91
Controllable        0.67   126    1.75   5.00   3.97   0.77
Equal access        0.37   126    2.50   5.00   4.11   0.62
College relevant    0.80   126    1.25   5.00   3.28   0.94
Invasive            0.06   126    1.00   4.00   2.75   0.73
Fakeable            0.06   126    2.50   5.00   4.26   0.49

Similarly, analyses of SJI item type were conducted, and rating instructions and item-type definitions are shown in Appendix R. Two professors and five graduate students provided ratings indicating the degree to which each situational judgment item response option was: objective, verifiable, controllable, equally accessible, relevant to college, invasive, and fakeable. Ratings were made on a 5-point scale ranging from "Not at all" to "Completely," one dimension at a time. The 24 situational judgment items were distributed to all raters, with each rater rating all items. To assess agreement between these raters, we measured the internal consistency of the ratings across items for each dimension.
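This internal-consistency index is simply coefficient alpha computed over an items x raters matrix, treating each rater (or rater group) as an interchangeable element of the rating composite. The minimal sketch below, with purely hypothetical ratings, shows the computation; with uncorrelated ratings the resulting alpha can be near zero or even negative, as some dimensions in Tables 4 and 5 illustrate.

    import numpy as np

    def coefficient_alpha(ratings: np.ndarray) -> float:
        """Cronbach's alpha for an items x raters matrix."""
        ratings = np.asarray(ratings, dtype=float)
        k = ratings.shape[1]                      # number of raters
        rater_vars = ratings.var(axis=0, ddof=1)  # variance of each rater's ratings
        total_var = ratings.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - rater_vars.sum() / total_var)

    # Example with hypothetical ratings of 126 items by 4 raters on one
    # dimension, using the 1-5 scale described above.
    rng = np.random.default_rng(2)
    example = rng.integers(1, 6, size=(126, 4))
    print(round(coefficient_alpha(example), 2))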
The alpha coefficients were highest for college relevance, verifiability, objectivity, and controllability (see Table 5), and we retained the data for the dimensions that demonstrated traditionally accepted levels of internal consistency. Again, we discarded ratings on categories where the raters provided little variance across items (e.g., fakeable, equal access) or were inconsistent in rating (e.g., invasive). The judges viewed biodata items as more fakeable than SJI items. Next, we averaged ratings across the seven individual raters' values to compute an index for each dimension. These values were then used in item analyses to identify items viewed as most vulnerable to inflation.

Table 5
Coefficient Alpha and Descriptives for Ratings of SJI Item Characteristics

Dimension          Alpha    N     Min.   Max.   Mean    SD
Objective           0.81   155    1.43   4.71   3.09   0.70
Verifiable          0.83   155    1.29   4.71   2.86   0.76
Controllable        0.78   155    1.71   4.86   3.67   0.67
Equal access        0.69   155    2.14   5.00   4.30   0.49
College relevant    0.86   155    1.86   5.00   3.77   0.91
Invasive           -0.50   155    2.00   3.43   2.76   0.31
Fakeable            0.17   155    2.86   4.71   3.66   0.37

The results of the use of these measures are described in the Item Differences section of the results below.

RESULTS

Situational Differences

To address the first question of our study, and to test Hypotheses 1-4 regarding the situational factors that affect faking, I conducted a 2 (coaching vs. no coaching) X 2 (motivation vs. no motivation) X 2 (warning vs. no warning) ANOVA separately for the biodata and SJI measures. The biodata ANOVA had an additional factor (elaboration vs. no elaboration on the items), which was a within-subjects factor. Results for biodata are shown in Table 6 for an analysis that included no covariates and an analysis that included various possible covariates of biodata responses, including sex, race (minority versus white), self-deception, impression management, cognitive ability, high school GPA, and measures of the Big Five constructs. Table 7 contains the means and standard deviations of responses to the biodata for all conditions. As can be seen in Table 6, the motivational effect and the coaching effect are both statistically significant, and the means in Table 7 indicate that the effects were in the predicted direction, thus confirming our first two hypotheses. The Warning effect was nonsignificant, indicating a lack of support for Hypothesis 3. I hypothesized that these factors would interact, but did not observe the three-way interaction depicted in Figure 2. For the analysis that did not include covariates, the interaction between Motivation and Coaching was marginally significant. Examination of the means for these conditions indicated that the combination of both Motivation and Coaching without Warning produced the largest biodata scores (Mean = 3.41), as would be expected. Neither Coaching alone nor Motivation alone (Means = 3.09 and 3.06) produced as large an increment in performance over the condition in which neither Motivation nor Coaching was provided. Variance did
change across manipulation conditions, with the greatest variability in the Coached and Not Warned groups, suggesting that coaching does not necessarily standardize the way that people respond. When covariates were included in the analyses, the Motivation x Warning interaction was also statistically significant. Examination of the pattern of means for both analyses (with and without covariates) indicated that a warning that responses would be verified did appear to erase the inflation of responses that occurred when a monetary motivation to get good scores was provided. With a warning, the means of the motivated (Mean = 3.11) and nonmotivated groups (Mean = 3.01) were not very different, compared to the two conditions in which no warning was present (Means = 3.24 for motivated and 2.99 for nonmotivated). One of the covariates (Extraversion) interacted with the elaboration factor. Examination of this interaction indicated that the correlation between responses to elaborated biodata questions and Extraversion (.29) was higher than the correlation between Extraversion and the responses to nonelaborated items (.21). The impact of elaboration also appeared much smaller when all covariates were included in the analyses.

Table 6
Analysis of Variance Results for Biodata with and without the Inclusion of Covariates

                              Without Covariates     With Covariates
Source                          df        F            df        F
Between Subjects
Coaching (C)                     1     24.95**          1     26.96**
Motivation (M)                   1     16.73**          1     13.62**
Warning (W)                      1       .39            1       .54
C x M                            1      3.69            1      1.52
C x W                            1       .02            1      1.92
M x W                            1      2.11            1      6.28*
C x M x W                        1       .03            1      1.22
Error                          354      (.42)a        312      (.32)a
Within Subjects
Elaboration (E)                  1    726.11**          1      3.38
E x C                            1       .60            1       .60
E x M                            1       .07            1       .25
E x W                            1       .03            1       .07
E x C x M                        1      2.32            1      1.62
E x C x W                        1       .72            1       .01
E x M x W                        1       .08            1       .27
E x C x M x W                    1       .63            1      1.33
E x Self Deception                                      1       .89
E x Impression Management                               1       .09
E x Extraversion                                        1      4.81*
E x Agreeableness                                       1      3.80
E x Conscientiousness                                   1       .95
E x Emotional Stability                                 1       .13
E x Openness                                            1       .81
E x High School GPA                                     1      2.14
E x Cognitive Ability                                   1       .56
E x Race                                                1       .21
E x Sex                                                 1       .34
Error                          354     (0.05)a        312     (0.05)a

a Equals the Mean Square Error. *p < .05. **p < .01.

Table 7
Means and Standard Deviations of Biodata Responses for the Study Conditions
[Table values for elaborated and nonelaborated biodata responses in each manipulation condition are illegible in the source scan.]

In Table 8, I present the ANOVA results for the SJI measure both with and without the inclusion of the covariates. Table 9 provides the corresponding means and standard deviations for the conditions in our study, and the variance of the SJI responses was similar across all manipulation conditions. As was true for biodata responses, the effects of Motivation and Coaching were statistically significant, and the means (see Table 9) were in the expected direction.
The Warning effect was not statistically significant. One interaction, that of Motivation and Coaching, was statistically significant, but the results were not consistent with expectations in that the presence of warning produced larger SJI scores than no warning, the difference being larger in the two conditions that did not receive the motivational manipulation. The means presented in Table 9 are consistent with our expectations in that the Motivated, Coached, No Warning condition produced the best SJI responses, over a standard deviation higher than the warned condition that was not coached or motivated. However, several other conditions did not fit our expected pattern of results. The Coached, Motivated, and Warned group performed as well as did a similar group with no warning. Results that included the covariates did not change the impact of the manipulations in any substantial way, even though a number of covariates (self-deception, impression management, agreeableness, conscientiousness, high school GPA, sex, and race) were related significantly to SJI responses. I will address these relationships in more detail later in the paper.

Table 8
Analyses of Variance for SJI with and without Covariates

                              Without Covariates        With Covariates
Between Subjects              df        F               df        F
Coaching (C)                  1         40.60**         1         56.39**
Motivation (M)                1         14.65**         1         10.06*
Warning (W)                   1         1.41            1         2.41
C x M                         1         9.88*           1         6.25*
C x W                         1         .22             1         1.36
M x W                         1         .09             1         .91
C x M x W                     1         .10             1         1.20
Error                         352       (0.12)a

Covariates (included analysis only)
Self Deception                                          1         5.45*
Impression Management                                   1         4.00*
Extraversion                                            1         2.41
Agreeableness                                           1         4.19*
Conscientiousness                                       1         4.71*
Emotional Stability                                     1         1.29
Openness                                                1         1.52
High School GPA                                         1         4.79*
Cognitive Ability                                       1         .32
Race                                                    1         3.96*
Sex                                                     1         18.44**
Error                                                   310       (0.09)a

aEquals the Mean Square Error. *p < .05. **p < .01.

Table 9
Means and Standard Deviations of SJI Responses for Various Study Conditions

Condition                          Mean    SD     N
Mot (0), Warn (0), Coach (0)a      .50     .32    46
Mot (0), Warn (0), Coach (1)       .62     .37    48
Mot (0), Warn (1), Coach (0)       .56     .32    46
Mot (0), Warn (1), Coach (1)       .67     .37    44
Mot (1), Warn (0), Coach (0)       .52     .33    44
Mot (1), Warn (0), Coach (1)       .89     .32    45
Mot (1), Warn (1), Coach (0)       .58     .35    44
Mot (1), Warn (1), Coach (1)       .89     .33    43

aMot, Warn, Coach indicate Motivation, Warning, and Coaching. One indicates the manipulation occurred; 0 indicates there was no manipulation.

Hypothesis 1, that motivation will increase scores, was supported for both biodata and situational judgment. Hypothesis 2, that coaching would increase scores, was also supported for biodata and situational judgment. Support for Hypothesis 3 was not found, with warning being ineffective at reducing scores. Hypothesis 4 was partially supported in the case of biodata, but not for situational judgment responses. It is apparent from these findings that both biodata and situational judgment scores are susceptible to coaching, as well as to motivation. It is important to consider that in this study very brief coaching was provided, and if such tests were to be used in college admissions decision-making, it is reasonable to expect that more comprehensive coaching would become available as a result of the high-stakes nature of college admissions decisions. The results do indicate, though, that a warning statement can have some impact in reducing inflation, at least for the biodata.
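As a small companion to Tables 7 and 9, condition-level means and standard deviations of this kind can be tabulated directly from the raw data. The sketch below assumes the same hypothetical data file and column names used in the earlier example.

```python
# Condition-level means, SDs, and cell sizes for the SJI (Table 9 style).
# "study_data.csv" and its column names are hypothetical.
import pandas as pd

df = pd.read_csv("study_data.csv")
print(
    df.groupby(["motivation", "warning", "coaching"])["sji"]
      .agg(["mean", "std", "count"])
      .round(2)
)
```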
An examination of the pattern of results for biodata (Figure 3) shows that the suppressor effect of a warning statement is most powerful for those who are motivated, lending some support to the idea that those who have something to lose are more likely to take a warning statement seriously, although this would need to be verified with further research. A limitation of the warning manipulation is the slight difference between the wording for the motivated and not motivated groups: the motivated groups risked losing the opportunity to earn extra cash if caught responding dishonestly, while the group without a cash incentive did not face this risk. According to the recent work of Dwight and Donovan (2003), potential consequences of faking are an important factor in the effectiveness of warning statements.

[Figure 3 (not legible in this copy): Interactions between Coaching, Motivation, and Warning for Biodata Performance. Mean biodata scores (roughly 2.4 to 3.2) are plotted under No Warning versus Warning conditions for four groups: Coached/Motivated, Coached/Not Motivated, Not Coached/Motivated, and Not Coached/Not Motivated.]

It is also possible that for someone who had not received a warning, their own suspicions about a bogus item may have raised the possibility that it was a lie scale, and this may have effectively operated as a warning. It would be helpful in future research to include manipulation checks so that it is clear whether participants registered that they were being warned. In this study, motivation was generated by an offer of a small amount of cash, and this incentive was sufficient to influence score inflation. If biodata and SJI performance were used to contribute to admissions ratings, the importance of admission to the college of one's preference would probably be a more powerful motivator, resulting in significant score inflation. The risk of not gaining admission would be an important one. The artificial context of the warnings in the study, combined with the monetary motivation, may have limited the effectiveness of warnings, but the effect was in the expected direction when the interaction of Motivation and Warning was examined for biodata.

Individual Differences

To address the second question of the study, test Hypotheses 5 and 6, and examine the impact of individual difference correlates of faking, correlational analyses were conducted with race (white vs. minority), gender, ability (measured by ACT/SAT scores), and personality characteristics (specifically conscientiousness, emotional stability, and social desirability). Faking may be captured in three ways: by an individual showing 1) a positive response to the bogus items, 2) a high score on an inflation index, or 3) a high score on the BIDR impression management scale. Table 10 shows the correlation matrix for the variables of interest. Of the faking identification methods, scoring on the bogus items was associated with scoring on the inflation index (r=.38), but neither of these was strongly related to impression management. The bogus items and inflation index were related to coaching (r=.23 and r=.36, respectively), and the inflation index was also related to motivation (r=.24). Personality was not highly associated with inflation as indexed by bogus items, with only openness having a significant relationship (r=.16).
For the inflation index, extraversion (r=.22), agreeableness (r=.21), openness (r=.21), and self-deception (r=.22) showed significant relationships. However, personality was more strongly associated with impression management; conscientiousness (r=.36), agreeableness (r=.33), emotional stability (r=.22), and self-deception (r=.39) were correlated significantly with impression management. Methods of identifying inflation appeared to be unrelated to race, cognitive ability, and age, apart from impression management, for which age was negatively related (r=-.12). The only relationship between inflation identification scales and outcomes was for impression management, which was negatively related to absenteeism (r=-.23). Biodata scores were associated with situational judgment scores (r=.47), and both biodata and SJI scores were related to multiple personality traits; however, the patterns of relationships were slightly different. Biodata scores were most highly related to openness (r=.33), self-deception (r=.27), extraversion (r=.26), and agreeableness (r=.25). Also related to biodata were impression management (r=.16), conscientiousness (r=.15), and emotional stability (r=.13). Situational judgment, on the other hand, was most closely related to impression management (r=.29), conscientiousness (r=.24), and self-deception (r=.22). Also related to SJI performance were agreeableness (r=.21), extraversion (r=.16), and openness (r=.12). Cognitive ability was related to biodata performance (r=.18), but not SJI performance. Race was unrelated to either biodata or SJI performance. Both race (r=.20) and ability (r=.37) were related to GPA. Biodata was unrelated to the outcomes of absenteeism and GPA, and SJI performance was negatively related to absenteeism (r=-.20). Age was positively related to absenteeism (r=.17) and negatively related to GPA (r=-.16). Conscientiousness was positively related to GPA (r=.13) and negatively related to absenteeism (r=-.37). Emotional stability (r=-.11) and self-deception (r=-.16) were also negatively related to absenteeism.

[Table 10 (not legible in this copy): Intercorrelations among the study variables (bogus items, inflation index, impression management, self-deception, Big Five traits, cognitive ability, age, race, sex, biodata, SJI, GPA, and absenteeism); the table was presented rotated across two pages in the original and its values could not be recovered. *p < .05. **p < .01.]
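A correlation matrix with significance flags, in the spirit of Table 10, can be assembled as sketched below; the file and variable names are hypothetical placeholders for the study measures, not the authors' data.

```python
# Pairwise correlations with significance stars (Table 10 style).
# "study_data.csv" and its column names are hypothetical.
import pandas as pd
from scipy import stats

df = pd.read_csv("study_data.csv")
cols = ["bogus", "inflation_index", "impression_mgmt", "self_deception",
        "conscientiousness", "emotional_stability", "biodata", "sji",
        "gpa", "absenteeism"]

def corr_table(data, columns):
    out = pd.DataFrame(index=columns, columns=columns, dtype=object)
    for i in columns:
        for j in columns:
            r, p = stats.pearsonr(data[i], data[j])
            stars = "**" if p < .01 else "*" if p < .05 else ""
            out.loc[i, j] = f"{r:.2f}{stars}"
    return out

print(corr_table(df.dropna(subset=cols), cols))
```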
Hypothesis 5 was supported for biodata and partially supported for SJI. Both conscientiousness and emotional stability were associated with performance on biodata, while only conscientiousness was related to SJI performance. Hypothesis 6 was also partially supported. Conscientiousness and emotional stability were related to impression management, yet performance on the bogus items and the inflation index were not related to either of these personality traits. Self-deception was related to the inflation index and impression management. These results provide support for the idea that positive responses to bogus items and inflating one's score on other items are less personality-related and more situation-driven, whereas the tendency toward impression management is more personality-related. It is apparent from this study that individuals do claim experience with non-existent things, and that this scoring on bogus items is weakly related to openness. It is possible that people are being more "open" about how they interpret these particular bogus items, and may be drawing parallels between an experience that they have had and that which is captured by the bogus item, generously deciding to claim that they have suitable enough experience to identify it as such. However, as bogus item scores were more strongly associated with coaching, I expect that situational determinants have a more powerful effect on marking bogus items: individuals who have had coaching on what dimensions are desirable may be overly enthusiastic about demonstrating experience on items that they see as matching those dimensions, rather than responding in an honest fashion. While it may be ethically questionable to include bogus items in a college admissions application, this does provide useful information about the ways that people may fake. As the inflation index used here was created empirically, based on the difference in performance between the optimal performance group and the reference group, it is not surprising that the index is very highly correlated with biodata and SJI performance. Nevertheless, once overlap items are removed from the biodata and SJI scales, the index remains highly correlated with performance. It is interesting to note that conscientiousness, although significantly related to biodata and SJI performance, is unrelated to the index. As the inflation index is related to coaching, this reinforces the notion that situational factors are important in explaining faking. It should be noted that the inflation index created empirically for this study would require cross-validation for use elsewhere. (The items that made up the index can be viewed in Appendix O.) That the personality traits of conscientiousness and emotional stability were related to self-deception is consistent with research on social desirability, and it is not surprising to note that agreeableness is also related to self-deception; it is comforting to think of oneself as agreeable. These three personality traits appear to be the most important in explaining moderately inflated performance; however, they do not appear to be as useful in cases of extreme faking.
This suggests that presenting oneself in a positive light is adaptive, and that items to which individuals respond probably include a level of social desirability and/or job desirability. That cognitive ability was related to biodata but not SJI performance was surprising, but may be explained by the breadth of dimensionality captured by the SJI, which moves beyond typical college academic issues to include issues of social consciousness, multicultural tolerance, artistic appreciation, and so on. That cognitive ability was weakly related (r = .18) to biodata could be a result of socioeconomic status, where those with ability were also exposed to greater opportunity for suitable experiences.

Item Differences

To address the third question of the study, and examine the effects of certain item characteristics as they affect inflation, the following analyses were conducted. The item characteristic ratings, made up of the mean rating provided by experts (described earlier), were correlated with the biodata item responses for each participant. Each r was then transformed to a Fisher z. ANOVA was used to test for differences in Fisher z correlations across the study manipulation conditions. Overall mean levels are shown in Table 11, ANOVA results for the four item characteristics are shown in Table 12, and levels across conditions are shown in Table 13.

Table 11
Overall Means and Standard Deviations of Four Item Characteristic Fisher z for Biodata

Dimension         N     Mean    SD
Objectivity       362   -0.22   0.21
Verifiability     362   -0.18   0.21
Controllability   362   -0.15   0.16
Relevance         362    0.10   0.16

[Table 12 (not legible in this copy): Analysis of Variance of Four Item Characteristic Fisher z for Biodata across Study Conditions; the table was presented rotated in the original and its values could not be recovered.]

[Table 13 (not legible in this copy): Means and Standard Deviations of Item Characteristic Fisher z for Biodata across Study Conditions; presented rotated in the original, values not recoverable.]

The correlation between item objectivity and item performance does vary significantly as a result of the manipulation conditions of Coaching and Motivation (see Table 12). Mean levels of the Fisher z for objectivity across conditions are shown in Table 13. It should be noted that all the correlations are negative, indicating that the more objective the item was judged to be, the lower were students' scores on the biodata items. As would be expected, the strongest negative correlations were in the two manipulation groups without motivation or warning, and the weakest were in the groups with both motivation and coaching.
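For reference, the Fisher z (r-to-z) transformation applied to each of these within-person correlations is the standard one:

\[
z = \tfrac{1}{2}\,\ln\!\left(\frac{1 + r}{1 - r}\right) = \operatorname{artanh}(r), \qquad r = \tanh(z).
\]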
The correlation between item verifiability and item performance does not vary significantly across manipulation conditions, and mean levels across conditions are not very different, as shown in Table 13. Again, all correlations are negative, indicating that items that are more verifiable produce lower biodata scores. For controllability, the correlation with item performance does vary significantly as a result of the manipulation conditions of coaching and motivation. Mean levels of the Fisher z for controllability across conditions are shown in Table 13, where the largest gap in means is between the groups without coaching or motivation and the group that has both coaching and motivation but no warning. Correlations between controllability and biodata scores are once again negative, but slightly lower than the correlations between objectivity and biodata performance. This is contrary to the direction expected for controllability. College relevance did not show significant differences in correlation with item performance across manipulation conditions; however, there is a significant Coaching by Motivation interaction. This suggests that those who are both motivated and coached have a greater likelihood of scoring highly on biodata items, when those items appear relevant to academic performance, than those who only experience one or the other of those conditions. This also suggests that the coaching on the dimensionality of biodata items is effective in helping individuals who are motivated to identify dimensions that are relevant to college performance. All correlations between item relevance and biodata item responses were positive, but lower than those with the other item characteristics. These correlations indicate that the higher the perceived relevance of the item, the higher the response to biodata items. Similar analyses were conducted for SJI correct response options, where the objectivity, verifiability, controllability, and college relevance of each correct response option for the question, "What would you be most likely to do?" were correlated with SJI performance and then converted to Fisher z. Overall mean levels are shown in Table 14. ANOVA results for the four item characteristics are shown in Table 15, and mean levels across manipulation conditions are shown in Table 16.

Table 14
Overall Means and Standard Deviations of Four Item Characteristic Fisher z for SJI

Dimension         N     Mean    SD
Objectivity       362    0.02   0.22
Verifiability     362   -0.01   0.21
Controllability   362    0.13   0.18
Relevance         362    0.10   0.13

[Table 15 (not legible in this copy): Analysis of Variance of Four Item Characteristic Fisher z for SJI across Study Conditions; the table was presented rotated in the original and its values could not be recovered.]

[Table 16 (not legible in this copy): Means and Standard Deviations of Item Characteristic Fisher z for SJI across Study Conditions; presented rotated in the original, values not recoverable.]
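A minimal sketch of the per-participant computation described in this section follows: each participant's vector of item responses is correlated with the experts' mean characteristic ratings for the same items, and the resulting r is converted to Fisher z. The array names and simulated values are illustrative assumptions only; the real expert ratings and responses would be substituted.

```python
# Per-participant profile correlation with expert item ratings, then Fisher z.
# Simulated arrays stand in for the real ratings and responses.
import numpy as np

rng = np.random.default_rng(0)
n_items, n_people = 40, 200
objectivity = rng.uniform(1, 5, n_items)             # mean expert rating per item
responses = rng.integers(1, 6, (n_people, n_items))  # participant x item responses

def fisher_z(item_ratings, person_responses):
    r = np.corrcoef(item_ratings, person_responses)[0, 1]
    return np.arctanh(r)  # Fisher z = artanh(r)

z = np.array([fisher_z(objectivity, row) for row in responses])
print(round(z.mean(), 2), round(z.std(ddof=1), 2))  # Table 11 / 14 style summary
```

The resulting z scores, one per participant and item characteristic, could then be submitted to the same 2 x 2 x 2 ANOVA used for the scale scores.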
The correlation between item objectivity and item performance does vary significantly as a result of the manipulation conditions of Coaching and Motivation (see Table 15). Mean levels of the Fisher z for objectivity across conditions are shown in Table 16. Where the correlations are negative, this indicates that the more objective the item was judged to be, the lower were students' scores on the SJI items. As would be expected, the strongest negative correlations were in the two manipulation groups without motivation or warning, and the positive correlations were in the groups with coaching. The largest mean correlations were for the coached and motivated groups. The correlation between item verifiability and SJI item performance varies significantly across the manipulation conditions of coaching and motivation, and mean levels across conditions vary, with negative correlations for the groups without coaching, as well as for the group with coaching and a warning statement but no motivation. Again, negative correlations indicate that items that are more verifiable produce lower SJI scores under conditions that do not precipitate high performance. Coaching and motivation result in high performance, as well as positive correlations, suggesting that under certain conditions verifiability may no longer suppress scores on SJIs. For controllability, the correlation with item performance does not vary significantly as a result of the manipulation conditions of coaching, motivation, or warning. Mean levels of the Fisher z for controllability across conditions are fairly stable. Correlations between controllability and SJI scores are positive, as expected for controllability. College relevance did not show significant differences in correlation with item performance across manipulation conditions; however, there is a significant Coaching by Motivation by Warning interaction. All correlations between item relevance and SJI item responses were positive. These correlations indicate that the higher the perceived relevance of the item, the higher the response to SJI items. One limitation of this analysis is the dichotomization of scores for SJI performance, where I have considered a correct response a score of 1 and an incorrect response a score of 0. These results provide support for Hypothesis 7 on two dimensions for biodata, objectivity and college relevance, with significant findings for controllability as well, but in a direction opposite to that hypothesized. This suggests that these item characteristics are related to the likelihood that a biodata item score may be inflated. Future work on biodata should take into account the characteristics of the items, as
they may be able to be formulated in a fashion that limits faking. A closer examination of the issue of controllability shows mixed results for the effect of controllability on biodata validity, depending on the type of controllability (see Graham, et al., 2002). This suggests that hypotheses about controllability must consider whether the individual can choose to perform an action, has no control over the action, has shared control, or whether the individual's feelings or attitudes are the issue of interest. Each of these categories may provide conflicting results, and such specifications were not taken into account by the expert raters in this study. The results also provide support for Hypothesis 7 on the dimensions of objectivity, verifiability, and college relevance for the SJI. It appears from the change in correlation direction from negative to positive under high performance conditions that scores on SJIs are less likely to be suppressed as a result of item characteristics than are scores on biodata items under similar conditions. The issue of an elaboration requirement as a means of reducing inflation on biodata items was tested, with results shown earlier in Table 6. While there was a significant main effect for elaboration, the impact of elaboration appeared much smaller when all covariates were included in the analyses. Hypothesis 8 was supported. It is possible that there is an item type confound, in that the items that were chosen by Oswald, et al. (in press) for elaboration may have also been those that were more verifiable. Since they were verifiable, respondents were less likely to inflate responses.

Validity and Selection Decisions

The fourth question of the study addresses validity and the effects of selection decisions. Table 17 shows correlations between the predictors and outcomes, as well as their descriptive statistics.

Table 17
Correlations and Descriptive Statistics for Predictors and Criteria

                Year 1 GPA   Absenteeism   SJI total   N     Mean   SD
Year 1 GPA                                             353   2.88   0.73
Absenteeism     -0.20**                                362   3.11   1.11
SJI total        0.06        -0.20**                   360   0.65   0.37
Biodata total    0.09        -0.05          0.47**     362   3.09   0.48

**p < .01.

To test the validity of these predictors of student performance, and to test Hypothesis 9, the differences between the zero-order validity coefficients and the validity coefficients with the faking identification scales partialed out were calculated. Results are shown in Table 18.

Table 18
Zero Order and Partial Correlations between Situational Judgment and Biodata and Two Criteria, Controlling for Measures of Faking

Situational Judgment
                Zero order r   Bogus partial   Inflation partial   Impr. Mgt. partial   All three partialled
GPA              0.06           0.05            0.02                0.03                 0.01
Absenteeism     -0.20*         -0.20*          -0.23*              -0.14*               -0.19*

Biodata
                Zero order r   Bogus partial   Inflation partial   Impr. Mgt. partial   All three partialled
GPA              0.09           0.07            0.05                0.08                 0.03
Absenteeism     -0.05          -0.06           -0.05               -0.01                -0.05

*p < .05.

For the GPA outcome, neither biodata nor SJI proved to be a useful predictor of academic performance, whether or not they were statistically adjusted using the faking measures. SJI did predict absenteeism, but the zero-order correlation was comparable to all four partial correlations. The partial correlation when Impression Management was controlled was somewhat lower, though not statistically different from the other correlations.
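The partial correlations in Table 18 remove the variance a faking index shares with both the predictor and the criterion. A minimal sketch of that computation, correlating regression residuals, appears below; the file and column names are hypothetical.

```python
# Partial correlation of a predictor and a criterion, controlling for one or
# more faking indices by correlating regression residuals.
# "study_data.csv" and its column names are hypothetical.
import numpy as np
import pandas as pd

def partial_corr(data, x, y, covars):
    X = np.column_stack([np.ones(len(data))] +
                        [data[c].to_numpy(float) for c in covars])
    def resid(col):
        v = data[col].to_numpy(float)
        beta, *_ = np.linalg.lstsq(X, v, rcond=None)
        return v - X @ beta
    return np.corrcoef(resid(x), resid(y))[0, 1]

df = pd.read_csv("study_data.csv").dropna()
print(partial_corr(df, "sji", "absenteeism", ["impression_mgmt"]))
print(partial_corr(df, "sji", "absenteeism",
                   ["bogus", "inflation_index", "impression_mgmt"]))
```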
In any event, there is no evidence that inflation, as measured by our three indices, is attenuating the validity of these measures to predict grade point average and absenteeism. Because McFarland (2000) noted that faking may affect reliability, the alpha coefficients across manipulation conditions were examined for biodata and SJI, and are shown in Table 19. While they do not differ dramatically across conditions, it is interesting to note that the manipulation condition that would be expected to display the greatest degree of inflation (Motivated, Coached, not Warned) does have the highest reliability for biodata. However, the same group has the lowest reliability for SJI; none of the reliabilities are very different across conditions, and all are relatively high.

Table 19
Coefficient Alpha of Biodata and SJI Scales for Various Study Conditions

Condition                        Biodata   SJI    N
Mot (0), Warn (0), Coach (0)a    .80       .70    46
Mot (0), Warn (0), Coach (1)     .89       .74    48
Mot (0), Warn (1), Coach (0)     .87       .71    46
Mot (0), Warn (1), Coach (1)     .78       .77    44
Mot (1), Warn (0), Coach (0)     .82       .70    44
Mot (1), Warn (0), Coach (1)     .94       .69    45
Mot (1), Warn (1), Coach (0)     .84       .77    44
Mot (1), Warn (1), Coach (1)     .90       .66    43

aMot, Warn, Coach indicate Motivation, Warning, and Coaching. One indicates the manipulation occurred; 0 indicates there was no manipulation.

The amount of variance in GPA and absenteeism predicted by biodata and SJI across conditions is shown in Table 20.

[Table 20 (not legible in this copy): Variance in GPA and Absenteeism Predicted by Biodata and SJI across Study Conditions; the table was presented rotated in the original and its values could not be recovered. *p < .05. **p < .01.]

There are no distinct patterns to validity across conditions, and validity comparisons across groups cannot be clearly interpreted, as they are limited by the small sample size and the low overall criterion-related validity of the biodata and SJI measures. To test Hypothesis 10, I examined the responses of two groups of respondents. Respondents in the group that should be most apt to inflate their scores (the Coaching, Motivation, no Warning group, which I refer to as the high performance group) were compared with those in the group in which faking should be minimized because the individuals would be least apt to inflate their scores (the no Coaching, no Motivation, Warning group, which I refer to as the reference group). Table 21 below presents descriptive data for each of the two groups. The high performance group shows a higher mean score on all three of the faking identification scales, although the largest gaps between the two are on the scales comprising the bogus items and the inflation index, on which the mean differences are over .5 standard deviation units.
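The comparison just described amounts to a standardized mean difference between the high performance and reference groups on each faking scale; a minimal sketch, again with hypothetical column names, is shown below.

```python
# Standardized mean differences (d) between the high performance group
# (coached, motivated, not warned) and the reference group (not coached,
# not motivated, warned) on the faking identification scales.
# "study_data.csv" and its column names are hypothetical.
import numpy as np
import pandas as pd

def cohens_d(a, b):
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

df = pd.read_csv("study_data.csv")
high = df[(df.coaching == 1) & (df.motivation == 1) & (df.warning == 0)]
ref = df[(df.coaching == 0) & (df.motivation == 0) & (df.warning == 1)]
for scale in ["bogus", "inflation_index", "impression_mgmt"]:
    print(scale, round(cohens_d(high[scale], ref[scale]), 2))
```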
Recall that the Inflation Index was generated using these two conditions, so this large difference is artificially inflated, and the index would need to be cross-validated. A much smaller mean difference was obtained on the Impression Management scale, suggesting that this scale is more stable and personality-related, rather than being prompted by situational factors, which seems to be the case for the Bogus items and the items making up the Inflation Index.

[Table 21 (not legible in this copy): Descriptive Statistics for the High Performance and Reference Groups on the Faking Identification Scales, Predictors, and Criteria; the table was presented rotated in the original and its values could not be recovered.]

Given these data, a large number of positive responses to the Bogus items definitely seems to indicate that the respondent's answers are generally suspect. The same might be true of the Inflation Index, but, as I noted above, results for that scale should be cross-validated. To examine how the use of these scales to correct for faking might affect who is selected or admitted to college, I examined possible admissions decisions based on a selection of the top 10%, 25%, and then 50% of the participants, based on their biodata and SJI scores. To control for faking, I identified those who scored above a cut-point on the faking scales. The Bogus scale is presented as an example; similar analyses were conducted with the inflation index and impression management, with similar results. A cut-point of 4 on the Bogus items scale was chosen, as this reflected any positive response on the Bogus scale items. As the Bogus items are made up of experiences that are impossible, I can presume that anyone who claims experience on these items has not been accurate about their experience, and it may be reasonable to exclude them. Having excluded these individuals, I then proceeded down the list, selecting the next best candidates in terms of biodata and SJI performance who did not meet the cut score on the Bogus items scale. The actual college performance (GPA and absenteeism) of those who were excluded was then compared to the actual college performance of those who were selected as alternatives. Looking first at the .10 selection ratio, those chosen for best performance included 26 individuals with scores above 4 on the Bogus scale. These individuals were removed and replaced with the next-highest scoring individuals who did not have scores above the cut on the Bogus scale. The differences in college performance for these two groups are shown in Table 22 below.

Table 22
Descriptive Statistics for Selection Ratio of .10

               Removed from top 10% selection    Replacements for top 10% selection
               N     Mean    SD                  N     Mean    SD       d
Absenteeism    26    3.08    1.06                26    3.08    0.84     0.00
Year 1 GPA     26    2.77    0.90                26    2.85    0.82    -0.09

The same process was followed for a selection ratio of .25, and the results are shown below, demonstrating little difference between those excluded and those pulled in as replacements.

Table 23
Descriptive Statistics for Selection Ratio of .25

               Removed from top 25% selection    Replacements for top 25% selection
               N     Mean    SD                  N     Mean    SD       d
Absenteeism    58    3.02    1.03                58    2.95    0.96     0.07
Year 1 GPA     58    2.89    0.78                55    2.87    0.73     0.03

The same process was followed again for a selection ratio of .50, with similar findings; the replacement procedure is sketched below, and the .50 results appear in Table 24.
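A minimal sketch of the screen-and-replace procedure used for Tables 22 through 24 follows. The composite, the handling of the cut score, and the column names are illustrative assumptions rather than the exact scoring used in the study.

```python
# Top-k selection on a biodata/SJI composite, removing candidates at or above
# the Bogus-scale cut and pulling in the next-best candidates below the cut.
# "study_data.csv", the composite, and the column names are hypothetical.
import pandas as pd

def screen_and_replace(data, k, cut=4):
    ranked = data.sort_values("composite", ascending=False).reset_index(drop=True)
    initial = ranked.head(k)                       # initial top-k selection
    removed = initial[initial["bogus"] >= cut]     # flagged by the Bogus scale
    pool = ranked.iloc[k:]                         # remaining candidates
    replacements = pool[pool["bogus"] < cut].head(len(removed))
    return removed, replacements

df = pd.read_csv("study_data.csv")
df["composite"] = df[["biodata", "sji"]].mean(axis=1)
for ratio in (0.10, 0.25, 0.50):
    removed, replacements = screen_and_replace(df, k=int(ratio * len(df)))
    print(ratio,
          removed[["gpa", "absenteeism"]].mean().round(2).to_dict(),
          replacements[["gpa", "absenteeism"]].mean().round(2).to_dict())
```

Comparing the removed group with its replacements on GPA and absenteeism is the comparison summarized in Tables 22 through 24.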
Table 24
Descriptive Statistics for Selection Ratio of .50

               Removed from top 50% selection    Replacements for top 50% selection
               N      Mean    SD                 N      Mean    SD       d
Absenteeism    109    3.02    1.02               109    3.05    1.15    -0.03
Year 1 GPA     108    2.93    0.70               103    2.89    0.79     0.06

It appears that, regardless of the level of selectivity, using Bogus items as a basis for removals does not change the quality of the student population when considering actual college performance. This study demonstrates that those who are coached and motivated to achieve can produce high scores on biodata and situational judgment, and those scores may be a result of inflation due to faking. Inflation scales can be a useful method of identifying response manipulation but, in this case, are less useful in correcting for it. Further research into inflation scales and corrections would be necessary to develop protocols for their use. It may be helpful to conduct further analyses with a broader criterion measure that is conceptually more closely related to the biodata and situational judgment dimensions. Results here are probably affected by the very low criterion-related validity of the biodata and SJI measures, and Graham, et al. (2002) have linked faking in biodata to reduced criterion-related validity.

CONCLUSIONS

Discussion

This study has addressed four major questions. First, how do situational factors such as testing conditions influence how an individual performs on biodata and SJIs? Second, what are the individual difference characteristics of people who are most effective in inflating their responses? Third, what are the characteristics of the test items that seem most susceptible to inflation? Fourth, does the susceptibility of biodata and SJIs to faking affect their predictive validity if they were to be used in college admissions decision-making?

Situational Factors

Situational factors can be very important determinants of performance on biodata and situational judgment items. Brief coaching was shown to improve scores on both of these non-cognitive measures, an important issue in the context of test application. According to Sackett, et al. (1989), "In the typical performance domain, the examinee may adopt an explicit self-presentation strategy in responding to the selection device based on a hypothesis about what the employer is looking for in an applicant" (p. 148). Coaching appears effective in aiding examinees in generating hypotheses and presenting themselves appropriately. Were biodata and SJIs to be used in a college admissions selection process, those who have access to coaching would be better able to improve their scores. Such susceptibility to coaching is of great practical significance to the College Board, the owners of these measures, as they would need to be able to defend the reliability and validity of the measures, or ensure that examinees have received equal access to coaching (see Standards for Educational and Psychological Tests and Manuals). Considering the effectiveness of the written coaching provided in Cunningham, et al. (1994), it may be appropriate for researchers to consider future studies that examine the difference in effectiveness between face-to-face coaching programs and materials that can be provided in hard copy or via electronic media. Also, it may be useful to conduct an examination of the relationship between warning statements and coaching, as coaching may effectively negate the power of the warning statement.
If these tests were to be used to contribute to college admissions decisions, it is realistic to assume that college applicants would be highly motivated to perform well. Motivation was shown to have a significant effect on performance on these measures, and the interaction of coaching and motivation for SJ Is in this study suggests that maximal performance in these tests in an applied setting will be best facilitated by the combination of personal motivation and coaching. Warning statements, although not found in this study to be a significant factor in suppressing inflation overall, did operate in the direction expected when individuals are motivated to perform well. Warning statements are not equally relevant for biodata and SJ I items. While objective and verifiable biodata items, such as the number of leadership positions that a person held in high school, could be verified by contacting the high school, verification would be more difficult to conduct for SJ Is. While peers or teachers could be contacted and asked whether an individual is actually likely to behave in a certain way in a certain situation, the possibility of this verification is clearly more remote, suggesting that warning statements about verification of responses are less useful 92 for SJ Is, and may even make matters worse by actually planting the idea that one could answer dishonestly. To be able to implement the biodata and SJ I in an admissions context, these measures would need further examination, ideally with a sample of individuals who really have a personal desire to score well on the tests and a great deal to lose if they do not, to fully understand the power of a warning statement and the interaction effects of motivation and warning. To ensure the most powerful warning effect it may be best to include both the warning that faking could be identified, as well as the warning that there would be potential negative consequences for faking (see Dwight & Donovan, 2003). According to Dwight and Donovan, both of these characteristics of a warning statement play a part in limiting faking. A study using a sample of real college applicants who believe that these tests are being used in their admissions decision-making process would provide an excellent framework for building our understanding of the effects of motivation and warning on biodata and SJ I score inflation. Identification of Inflation Three inflation identification methods were used, providing valuable information about individual differences and inflation. Inflation captured by these scales was unrelated to race, gender, or cognitive ability. First, Bogus items were considered. Bogus items were successfully used to identify those who had claimed experiences that were impossible, and scoring on Bogus items was more situational than personality-related. Bogus item scoring was weakly related to openness to experience, which might suggest that those who are more creative in framing their own personal experiences are able to warrant marking experience on 93 ' it!“ 2.,__, _. g Bogus items, perhaps viewing the experience being addressed as in some way related to an experience they have had. Future research with bogus items should include a broader selection of bogus items, with varying degrees of obviousness. There may be particular bogus item characteristics that make some more effective as flags of inflation than others. Second, an Inflation Index was created. 
Inflation captured with an empirically-derived Inflation Index was more personality-driven than the Bogus items, and was weakly related to extraversion, agreeableness, openness, and self-deception. The Inflation Index proved to be a useful technique in identifying those with unusually high scores. Nevertheless, using such an index as a way to flag problematic responders could cause headaches for the College Board, who would need to be able to defend their practices against those arguing that they did not cheat but were responding honestly, and simply have unusual experiences that result in their being flagged by the index. It should be noted that while the empirical Inflation Index proved useful, results should be cross-validated. Third, the BIDR Impression Management scale was used as an indicator of inflation. Impression management appeared to be the most personality-related of the identification scales. Impression management was more strongly related to conscientiousness, agreeableness, emotional stability, and self-deception. This suggests that inflation related to impression management may be more adaptive than inflation related to Bogus items or the Inflation Index. That impression management appears to be trait-related lends support to the argument that it should not be used for controlling for socially desirable responding. It is apparent from this study that inflation is largely a function of the situation in which the test is taken, rather than the characteristics of the individual, and this study lends support to the view of impression management as a trait.

Item Characteristics

Item characteristics can play an important role in creating a test that is less vulnerable to inflation. This study found that biodata items that were judged as being less objective, less controllable, and more college relevant were more susceptible to inflation under different manipulation conditions. SJI items that were objective or verifiable tended to have lower scores, unless the individual was coached and motivated to do well, in which case the suppressor effect faded. The suppressor effect of item characteristics was not weakened for biodata to the extent it was for SJI items when coaching and motivation to perform were provided. To limit inflation, it may be important for biodata test generators to choose items that are objective and verifiable where possible. However, one disadvantage of trying to ensure that items are objective and verifiable is that biodata inventory builders may lose the valuable multidimensionality captured from a broader pool of items. Also, there may be resistance to removing items that are fakeable when those items are perceived as the most valid. While college relevance was shown to increase inflation, there is an inherent tradeoff in seeking to minimize faking by trying to disguise college relevance: tests that are viewed as relevant are more likely to be perceived as fair (see Schmitt, Oswald, Kim, Gillespie, & Ramsay, in press), and trying to reduce college relevance would not be recommended. That less controllable items were more susceptible to inflation suggests that further research will be necessary to resolve the different categories of control and the consequences of having items that tap those categories. Graham, et al. (2002) indicate four categories of controllable items, and these may all produce differing results in terms of inflation and validity.
Judges in this study were not given multiple categories for rating controllability of items, and the work of Graham, et al., provides guidance for improved evaluation of controllability in future research. An elaboration requirement in biodata items appears to be effective in suppressing scores, adding to existing literature in this area (e. g., Schmitt & Kunce, 2002). However, not all items are equally suitable for an elaboration requirement. It may be that more verifiable items are more easily written with an elaboration requirement. Also, there is an issue of memory that may be affecting responses to elaborated biodata items. Respondents may limit their responses when elaboration is required only because they can’t remember the details and so would be unable to record them. Ramsay, Kim, Gillespie, and Friede (2003) have considered memory as a factor in a biodata elaboration study, but used a memory test that has questionable construct validity, and may be more a test of general knowledge. Further research in this area may provide answers to the questions surrounding elaboration requirements and the factors that contribute to their effectiveness. These are issues that test developers will need to balance as they attempt to create inventories that are reliable and useful. Considering the relationship of item characteristics to inflation, and that validity may also be affected by vulnerability to inflation linked to item characteristics (Graham, et al., 2002) item characteristics as a means of understanding inflation may be a fruitful area for future research. 96 Validity The issue of validity is of critical importance if the tests used here are put into practice for selection in an applied setting. It appears from this study that inflation on biodata and SJ I does not attenuate the criterion-related validity, however, we are examining a set of tests that show limited criterion-related validity without inflation. To reach a definitive conclusion about the relationship of inflation and validity, it may be useful to conduct further research with performance measures that are multidimensional and more closely linked to the dimensions captured by the biodata and SJ I questions. Identifying participants who appeared suspicious, based on the three inflation identification methods, and then replacing them with participants who were not suspected of inflation did not prove effective in selecting a higher performing group in terms of college performance. However, criterion-related validity was low, and such corrections may have different effects if one were examining a broader criterion space. Limitations One limitation of this study is the slightly different warning manipulations for the motivated and not motivated groups. While both groups received the same statement about the possibility that responses would be verified, the consequences of being found providing dishonest answers had a greater negative consequence for the motivated group, who would lose a cash payout for high performers. These differences mean that the manipulation was not equivalent across groups. It would have been helpful to have included a manipulation check regarding the warning statements to establish that they had been attended to. This sample was not completing these measures under the real expectation that responses would contribute to college admission decisions, limiting the 97 power of the warning statement that there would be negative consequences for responding dishonestly. 
In a real world scenario, the risk that one’s college application may be rejected based on dishonest responding would be of great consequence. Therefore, future research will need to examine warning statements in a sample of real applicants. As the inflation index was generated in the study sample, it would need to be cross-validated as similar results may not be found in a different sample. Another possible limitation is that elaborated items in the study may have been items that were more verifiable. This verifiability, rather than the elaboration requirement itself, may have limited the respondents’ inclination to inflate responses. In addition, it is difficult to draw conclusions about any changes in validity, or the effectiveness of correcting for inflation, when there is very limited criterion-related validity for the biodata and SJ I measures in this sample. Practical Implications The use of biodata and situational judgment measures have shown promise as predictors of student performance (Oswald, et al., in press). However, in this sample, these measures did not prove effective as performance predictors. For these tests to be implemented in a college admissions context, greater criterion-related validity would need to be demonstrated. Considering the vulnerability of these measures to inflation prompted by motivation and coaching, it may be most practical to reserve measures such as these for informational or developmental purposes, rather than for admissions decision-making. Bearing in mind that respondents can improve their scores when coached very briefly on how to do so, it is problematic to promote this test as one that could be used in 98 admissions decision-making. If the tests were to be used in such a way, test owners would need to provide access to coaching materials, so that all examinees have an equal opportunity to improve their scores through knowledge about the test. While written or web-based materials could easily be generated by the test owners, it is reasonable to assume that the implementation of tests such as these would prompt market forces to produce tests preparation training classes that could provide greater advantages to those with the resources to attend such training. This could precipitate sub-group differences in test performance, one of the issues that these tests are designed to avoid. Test owners would be well-advised to implement a warning statement in the test administration, once further research has been conducted that clarifies the most effective type of warning statement in a college admissions context. It is expected that a warning statement, claiming that dishonest responses may be verified and that there may be real negative consequences for inflating responses, may be effective. However, the power of the warning statement is expected to vary based on the purpose for which the testing tools are used. High-stakes decisions based on test performance may result in greater responsiveness to a warning statement that includes negative consequences for dishonest responding. In compiling an inventory of biodata items, test developers should consider items that are less likely to be inflated. This may be achieved by focusing on items that are written in a way that maximizes their content verifiability and objectivity. Controllability of item content should also be considered, but further research is necessary regarding the dimensions of controllability. 99 There may be items that have high criterion-related validity, yet are vulnerable to inflation. 
Test developers will need to balance the issue of transparency and possible inflation with prediction of performance. It would also be advisable to include an elaboration requirement for biodata items, where possible, to limit the item susceptibility to inflation. While high scores on the BIDR’s impression management scale are personality- related, and may be adaptive, scores on bogus items are not. The inclusion of bogus items is one method of identifying individuals who may be responding in a dishonest manner. However, some members of the public may find it inappropriate to ask “trick I questions”, and could generate publicity that makes the items themselves less useful. An : QM. inflation index is another suitable means of identifying responses that are suspicious, and the index used here will need to be validated with another sample. While use of one or more of these scales to identify inflation is possible in an applied setting, their implementation may be difficult. While conceptually, bogus items identifying those who have lied, and the empirical soundness of an inflation index, should be defensible reasons to exclude the scores of a respondent, the public relations problems that could ensue with implementation of such measures could make use of these scales infeasible. One problem that may arise with the use of corrections using inflation identification scales is the partialing out of variance that may be an important predictor of performance. Also, people do not necessarily inflate their responses in the same way, and standard correction method may not account for this (see discussion in Christiansen, et al., 1994). Also, some individuals may honestly have marked responses that indicate 100 their real but very unusual levels of experience, and such candidates may find themselves discriminated against, if such methods were put into practice. While we can use these techniques to identify inflation, what to do about respondents suspected of faking in a college admissions context is debatable. Decisions to dismiss dishonest applicants in such a high stakes setting will need to be based on far more extensive research than has been provided in this study, to withstand public scrutiny and possible litigation. If these tests were simply to be used for developmental purposes, bogus items and the inflation index would be useful markers for a counselor reviewing test results. Further research on these biodata and situational judgment measures would allow : in“. them to be refined so that they effectively capture the dimensionality of the criterion space that they are designed to predict. To be of practical value they should have construct and criterion-related validity, and this will need to be examined in different samples, while considering different item types. Research should also seek to establish the feasibility of using these measures in an admissions context and should investigate other possible uses for these tests. 101 APPENDICES APPENDIX A Mael’s Taxonomy 103 ; ’lili" Mael 's Taxonomy of Biodata Items Historical How old were you when you got your first paying job? External Have you ever been fired from a job? Objective How many hours did you study for your real-estate license test? First-hand How punctual are you about coming to work? Discrete At what age did you get your driver’s license? Verifiable What was your grade point average in college? Were you ever suspended from Little League? Controllable How many tries did it take you to pass the CPA exam? 
Equal access Were you ever class president? Job relevant How many units of cereal did you sell during the last calendar year? Noninvasive Were you on the tennis team in college? Future or hypothetical What position to you think you will be holding in 10 years? What would you do if another person screamed at you in public? Internal What is your attitude towards friends who smoke marijuana? Subjective Would you describe yourself as shy? How adventurous are you compared to your coworkers? Second-hand How would your teachers describe your punctuality? Summative How many hours do you study during an average week? Nonverifiable How many servings of fresh vegetables do you eat every day? will“ Noncontrollable How many brothers and sisters do you have? Nonequal access Were you captain of the football team in high school? Not Eb relevant Are you proficient at crossword puzzles? Invasive How many young children do you have at home? Note. Adapted from “A Conceptual Rationale for the Domain and Attributes of Biodata Items,” by F. A. Mael, 1991, Personnel Psychology, p. 773. 104 APPENDIX B Sample Questionnaire PID:A FORMl The first booklet asks questions pertaining to how you approach other people or life in general. You will also complete some demographic questions. You will use the first scantron for these questions. The second booklet contains questions that ask you about your history and life experiences. You will also be presented with descriptions of problem situations, and you will indicate which action you would be most likely to take and which action you would be least likely to take. These are situational judgment tasks. You will use the second scantron for these questions. As you are answering these questions, please record your answers on the scantron form. For each question, please fill in completely the circle you choose. Where you are asked to elaborate on your answer, please write your response on the lines provided in your exercise booklet. First, please take a moment to complete the following areas of your first scantron: PID — Please write in your PID, and then fill in the corresponding circles. Form — Please indicate Form 1 A Also, please indicate your PID on the cover of this booklet, at the top right hand comer. You will have 90 minutes to complete this study. 106 The following pages contain phrases describing people's behaviors. Use the rating scale below to describe how accurately each statement describes you and please provide answers that describe yourself as you generally are now, not how you wish to be in the future. Describe yourself as you honestly see yourself, in relation to other people you know of the same sex as you are, and roughly your same age. So that you can describe yourself in an honest manner, your responses will be kept in absolute confidence. Please read each statement carefully. Please use the five-point scale below: 1 = Very Accurate 2 = Moderately Accurate 3 = Neither Accurate nor Inaccurate 4 = Moderately Inaccurate 5 = Very Inaccurate 90. Make people feel at ease. : «m» 91. Am not interested in abstract ideas. 92. Change my mood a lot. 93. Don't like to draw attention to myself. 94. Talk to a lot of different people at parties. 95. Have excellent ideas. 96. Insult people. 97. Follow a schedule. 98. Am exacting in my work. 99. Get stressed out easily. 100. Seldom feel blue. 101. Don't mind being the center of attention. 102. Worry about things. 
Please use the five-point scale below:

1 = Very Accurate
2 = Moderately Accurate
3 = Neither Accurate nor Inaccurate
4 = Moderately Inaccurate
5 = Very Inaccurate

103. Have little to say.
104. Don't talk a lot.
105. Use difficult words.
106. Keep in the background.
107. Have difficulty understanding abstract ideas.
108. Make a mess of things.
109. Pay attention to details.
110. Am always prepared.
111. Feel little concern for others.
112. Have a rich vocabulary.
113. Like order.
114. Often feel blue.
115. Am full of ideas.
116. Spend time reflecting on things.
117. Take time out for others.
118. Have frequent mood swings.
119. Have a soft heart.
120. Am quick to understand things.

Please use the five-point scale below:

1 = Very Accurate
2 = Moderately Accurate
3 = Neither Accurate nor Inaccurate
4 = Moderately Inaccurate
5 = Very Inaccurate

121. Am interested in people.
122. Start conversations.
123. Am the life of the party.
124. Neglect my duties.
125. Am relaxed most of the time.
126. Am not interested in other people's problems.
127. Often forget to put things back in their proper place.
128. Feel others' emotions.
129. Sympathize with others' feelings.
130. Do not have a good imagination.
131. Get irritated easily.
132. Am easily disturbed.
133. Get chores done right away.
134. Am not really interested in others.
135. Am quiet around strangers.
136. Feel comfortable around people.
137. Leave my belongings around.
138. Have a vivid imagination.
139. Get upset easily.

Please answer this next set of questions using the five-point scale below.

1 = Very true
2 = Mostly true
3 = Somewhat true
4 = Mostly untrue
5 = Very untrue

140. My first impressions of people usually turn out to be right.
141. It would be hard for me to break any of my bad habits.
142. I don't care to know what other people really think of me.
143. I have not always been honest with myself.
144. I always know why I like things.
145. When my emotions are aroused, it biases my thinking.
146. Once I've made up my mind, other people can seldom change my opinion.
147. I am not a safe driver when I exceed the speed limit.
148. I am fully in control of my own fate.
149. It's hard for me to shut off a disturbing thought.
150. I never regret my decisions.
151. I sometimes lose out on things because I can't make up my mind soon enough.
152. The reason I vote is because my vote can make a difference.
153. My parents were not always fair when they punished me.
154. I am a completely rational person.
155. I rarely appreciate criticism.

Please stop and check that you have just completed the first side of the first scantron.

156. I am very confident of my judgments.
157. It's all right with me if some people happen to dislike me.

Please continue using the five-point scale below.

1 = Very true
2 = Mostly true
3 = Somewhat true
4 = Mostly untrue
5 = Very untrue

158. I don't always know the reasons why I do the things I do.
159. I sometimes tell lies if I have to.
160. I never cover up my mistakes.
161. There have been occasions when I have taken advantage of someone.
162. I never swear.
163. I sometimes try to get even rather than forgive and forget.
164. I always obey laws, even if I'm unlikely to get caught.
165. I have said something bad about a friend behind his or her back.
166. When I hear people talking privately, I avoid listening.
167. I have received too much change from a salesperson without telling him or her.
168. I always declare everything at customs.
169. When I was young I sometimes stole things.
170. I have never dropped litter on the street.
171. I sometimes drive faster than the speed limit.
172. I have done things that I don't tell other people about.
173. I never take things that don't belong to me.
174. I have taken sick-leave from work or school even though I wasn't really sick.
175. I have never damaged a library book or store merchandise without reporting it.
176. I have some pretty awful habits.
177. I don't gossip about other people's business.

178. Please enter 8 on your scantron for this item.

Please fill in the appropriate answer on your form according to the responses provided. For questions 179 and 180, use the following scale:

a = less than 1.00
b = 1.00 to 1.49
c = 1.50 to 1.79
d = 1.80 to 2.09
e = 2.10 to 2.39
f = 2.40 to 2.69
g = 2.70 to 2.99
h = 3.00 to 3.39
i = 3.40 to 3.59
j = 3.60 or greater

179. Cumulative college GPA:
180. GPA for this past semester:

181. Indicate the extent to which you have missed regularly scheduled class(es) in the past six months.
a. I have never missed class.
b. I missed 1-3 classes.
c. I missed 4-8 classes.
d. I missed 9-15 classes.
e. I missed more than 15 classes.

If you have missed class in the past six months, indicate the reasons you missed class. Please mark:
a = Yes
b = No

182. You were faced with an emergency.
183. You were sick.
184. You partied too much the night before.
185. You were tired or you failed to get up in time.
186. You were talking or socializing with friends.
187. You were involved with another university event and couldn't go.
188. You found the class boring.
189. You did not believe the instructor would cover anything new or important.

Demographics — Please respond to the following questions to the best of your ability.

190. What is your age?
a. 18
b. 19
c. 20
d. 21
e. 22
f. 23
g. 24
h. 25
i. 26
j. 27+

191. What is your gender?
a. male
b. female

192. What is your year in school?
a. freshman
b. sophomore
c. junior
d. senior
e. 5th year +

193. Which of the following best characterizes you?
a. U.S. Citizen
b. Non-citizen — Canadian
c. Non-citizen — other

194. Is English your primary language?
a. yes
b. no

195. What ethnicity do you consider yourself to be?
a. Mexican American
b. Puerto Rican
c. Other Hispanic
d. American Indian or Alaskan native
e. Asian
f. Black/African American
g. Caucasian/White/Not of Hispanic origin
h. Native Hawaiian or other Pacific Islander
i. Other

You have now completed the first section of your form. Please raise your hand to let the proctor know that you have finished this section. Now please wait quietly until everyone has completed this section.

PID: A

You are now beginning the second section. Please take a moment now to prepare the second scantron:

PID — Please write in your PID, and then fill in the corresponding circles.
Form — Please indicate Form 1 B

Also, please indicate your PID on the cover of this booklet, at the top right-hand corner.

Please read these instructions very carefully:

Imagine that you are applying for admission to Michigan State University, and your responses to these questions could influence the decision on whether or not to accept you for admission. In other words, imagine that this questionnaire is part of the test requirements for college admissions, and admission here is very important to you. Complete this questionnaire in a way that presents yourself honestly but in the best light possible so that you are most likely to get admitted to the university.
As an added incentive to do well, participants in this study who score above the 50th percentile on this questionnaire will receive $10.

Now please proceed, answering the remainder of the questions in this study.

Request for Payment Information

Please provide the following information to allow us to mail a check to you if you meet the required score. Those participants who score above the 50th percentile on this questionnaire will be mailed $10.

Name:
Address line 1:
Address line 2:
City:
State:
Zip:

Following are some biographical data questions:

(Biodata items are inserted here)

You will be presented with descriptions of problem situations. Each problem has between four and seven alternative actions that might be taken to deal with the problem. You are to make two judgments for each problem. First, decide which alternative you would be MOST LIKELY to take in response to the problem. It might not be exactly what you would do in the situation, but it should be the alternative that comes closest to what you would actually do. Record your answer on the scantron form. Second, decide which alternative you would be LEAST LIKELY to take in the situation, and record your answer on the scantron. Please read all of the alternatives before deciding.

(Situational judgment items are inserted here)

You have now completed the study. Please wait quietly until everyone has finished their work. Thank you for your participation.

APPENDIX C

Sample Biodata Items

1. During the past year, how many times out of self-interest have you searched for information about other regions, countries, or cultures (at the library or on the Internet)?
a. 0
b. 1-3
c. 4-7
d. 8-12
e. more than 12
If you answered b, c, d, or e, briefly describe up to 5 countries or cultures and the topic that you investigated.

2. How many times in the past year have you tried to get someone to join an activity in which you were involved or leading?
a. never
b. once
c. twice
d. three or four times
e. five times or more

3. In the past six months, how often did you read a book just to learn something?
a. never
b. once
c. twice
d. three or four times
e. five times or more
If you answered b, c, d, or e, briefly describe up to 4 books you read and what you wanted to learn.

Note. For further information on the complete set of items, please contact the College Board, New York, New York.

APPENDIX D

Sample Situational Judgment Items

You will be presented with descriptions of problem situations. Each problem has between four and seven alternative actions that might be taken to deal with the problem. You are to make two judgments for each problem. First, decide which alternative you would be MOST LIKELY to take in response to the problem. It might not be exactly what you would do in the situation, but it should be the alternative that comes closest to what you would actually do. Record your answer on the scantron form. Second, decide which alternative you would be LEAST LIKELY to take in the situation, and record your answer on the scantron. Please read all of the alternatives before deciding.

Your grade for a particular class is based on three exams, with no class attendance requirement. All of the homework requirements for the class are posted on the professor's web site. What would you do?
a. Attend class for as long as you feel that it is helping your grades.
b. Do all the homework but only go to some of the lectures. It's the exams that count.
c. Go to all the classes anyway.
The professor may say something important.
d. Skip classes, but if you did poorly on the first exam, start going to classes.
e. There is no need to go to classes. Just get the homework done, and pass the exams.

4. What would you be most likely to do?
5. What would you be least likely to do?

You are finding your freshman year very difficult. The courses are hard, and you feel your grades are not satisfactory. Material in class seems to be covered very quickly. What would you do?
a. Talk with the professors and TAs to get help on how to study.
b. Find a study partner and work on homework and class material together.
c. Talk to your parents and an advisor.
d. Study hard, try your best, and don't worry about it.
e. Talk to my advisor and teachers; see if there are study groups or review sessions I can attend.
f. Hire a tutor for the difficult classes.

6. What would you be most likely to do?
7. What would you be least likely to do?

Note. For further information on the complete set of items, please contact the College Board, New York, New York.

APPENDIX E

Wording of Instruction Sets

Motivated Warned Group:

"Please read these instructions very carefully:

Imagine that you are applying for admission to Michigan State University, and your responses to these questions could influence the decision on whether or not to accept you for admission. In other words, imagine that this questionnaire is part of the test requirements for college admissions, and admission here is very important to you. Complete this questionnaire in a way that presents yourself honestly but in the best light possible so that you are most likely to get admitted to the university.

As an added incentive to do well, participants in this study who score above the 50th percentile on this questionnaire will receive $10.

Note that we may verify a subset of your responses, and if you respond dishonestly, that may invalidate this test as well as your chance to receive $10 for high performance.

Now please proceed, answering the remainder of the questions in this study."

Motivated Not Warned Group:

"Please read these instructions very carefully:

Imagine that you are applying for admission to Michigan State University, and your responses to these questions could influence the decision on whether or not to accept you for admission. In other words, imagine that this questionnaire is part of the test requirements for college admissions, and admission here is very important to you. Complete this questionnaire in a way that presents yourself honestly but in the best light possible so that you are most likely to get admitted to the university.

As an added incentive to do well, participants in this study who score above the 50th percentile on this questionnaire will receive $10.

Now please proceed, answering the remainder of the questions in this study."

Not Motivated Warned Group:

"Please read these instructions very carefully:

The following questionnaire is being tested as a way to collect information about high-school students who are applying to go to college. We would like your straightforward, honest answers to these questions. Your responses are strictly confidential, and they will not be used to evaluate you in any way, so please provide answers that are as honest and accurate as possible.

Note that we may verify a subset of your responses, and if you respond dishonestly, that may invalidate this test.
Now please proceed, answering the remainder of the questions in this study."

Not Motivated Not Warned Group:

"Please read these instructions very carefully:

The following questionnaire is being tested as a way to collect information about high-school students who are applying to go to college. We would like your straightforward, honest answers to these questions. Your responses are strictly confidential, and they will not be used to evaluate you in any way, so please provide answers that are as honest and accurate as possible.

Now please proceed, answering the remainder of the questions in this study."

APPENDIX F

Informed Consent Form — Motivated Group

Predictors of Student Success - Informed Consent

Please read and sign below:

In the project in which you are participating, we will be asking you to respond to a series of questions. The first two sets of questions are measures of judgment and of background experiences and preferences; they are experimental measures designed to be related to outcomes of students attending a college or university. The major purpose of this project is to investigate how well students do on these measures, given the instructions to take them, and whether the measures of judgment and background are related to your MSU grades. We are also asking you to respond to some commonly used personality measures which will help us interpret the meaning of your responses to the judgment and background measures.

Because a major purpose of our study is to determine if your responses to the judgment and background measures are related to your performance as a student at MSU, we will also be asking your permission to allow the registrar to give us access to your grades and to the Office of Admissions to allow access to your high school grades and ACT/SAT scores. In order to link your responses to the measures with your college and high school grades, and ACT/SAT scores, we will be asking you to provide your PID. All information you provide will be completely confidential. Only the project team (two faculty members and three graduate students) will have access to the password-protected data file with the original PID attached, and all data will be reported at the group level so that no one will be able to identify a particular person. As soon as we link your responses to the data from Admissions and the Registrar's Office, your PID will be deleted from our data file. Your privacy will be protected to the maximum extent allowable by law.

We expect that it will take 90 minutes for you to complete this study, for which you will earn extra credit in Psychology 101. Participation in this study is voluntary. As an alternative to participation in this study, you may do other work, such as a paper, that is coordinated with your instructor. As an incentive to do well on this questionnaire, those participants who score above the 50th percentile on this questionnaire will be given $10.

By signing below you indicate that you are free to refuse to participate in this project or any part of the project. You may refuse to answer some of the questions and may discontinue your participation at any time without penalty.

If you have any questions or concerns about your participation in this project, you can call Neal Schmitt (517-355-8305) or send an email message to Schmitt@msu.edu.
If you have questions or concerns regarding your rights as a study participant, or are dissatisfied at any time with any aspect of this study, you may contact - anonymously if you wish - Ashir Kumar, M.D., Chair of the University Committee on Research Involving Human Subjects (UCRIHS) by phone: (517) 355-2180, fax: (517) 432-4503, e-mail: ucrihs@msu.edu, or regular mail: 202 Olds Hall, East Lansing, MI 48824.

Your signature below indicates your voluntary agreement to participate in this study.

Signature                              Date

APPENDIX G

Informed Consent Form — Not Motivated Group

Predictors of Student Success - Informed Consent

Please read and sign below:

In the project in which you are participating, we will be asking you to respond to a series of questions. These questions are measures of background experiences and preferences, experimental measures designed to be related to outcomes of students attending a college or university. The major purpose of this project is to investigate how well students do on these measures, given the instructions to take them, and whether the measures of judgment and background are related to your MSU grades. Because a major purpose of our study is to determine if your responses to the judgment and background measures are related to your performance as a student at MSU, we will also be asking your permission to allow the registrar to give us access to your grades and to the Office of Admissions to allow access to your high school grades and ACT/SAT scores. In order to link your responses to the measures with your college and high school grades, and ACT/SAT scores, we will be asking you to provide your PID. All information you provide will be completely confidential. Only the project team (two faculty members and three graduate students) will have access to the password-protected data file with the original PID attached, and all data will be reported at the group level so that no one will be able to identify a particular person. As soon as we link your responses to the data from Admissions and the Registrar's Office, your PID will be deleted from our data file. Your privacy will be protected to the maximum extent allowable by law.

We expect that it will take 90 minutes for you to complete this study, for which you will earn extra credit in Psychology 101. Participation in this study is voluntary. As an alternative to participation in this study, you may do other work, such as a paper, that is coordinated with your instructor.

By signing below you indicate that you are free to refuse to participate in this project or any part of the project. You may refuse to answer some of the questions and may discontinue your participation at any time without penalty.

If you have any questions or concerns about your participation in this project, you can call Neal Schmitt (517-355-8305) or send an email message to Schmitt@msu.edu.

If you have questions or concerns regarding your rights as a study participant, or are dissatisfied at any time with any aspect of this study, you may contact - anonymously if you wish - Ashir Kumar, M.D., Chair of the University Committee on Research Involving Human Subjects (UCRIHS) by phone: (517) 355-2180, fax: (517) 432-4503, e-mail: ucrihs@msu.edu, or regular mail: 202 Olds Hall, East Lansing, MI 48824.

Your signature below indicates your voluntary agreement to participate in this study.
Signature                              Date

APPENDIX H

Faking Study Protocol

PROTOCOL — Faking Study
College Board — Fall 2002 data collection

Before test administration:

Subjects:
1.) Find out how many subjects will be participating and bring enough of all supplies for each subject plus a few extras. (Check the subject pool for the number of subjects signed up 30 minutes prior to proctoring.)
How to check for number of subjects: http://psychology.msu.edu/SubjectPool/Welcome.asp — sign in, view/modify experiment sessions on the left side, then choose "Predictors of Student Success", and look at "subjects signed up".
2.) Print off the list of names.

Materials:
There are four slightly different forms for this study. Materials have been prepared and placed in envelopes numbered one through four. As forms have slightly different consent and personal information collection forms, all of these are already in the envelopes. You will need to take enough materials so that you have (the number of participants divided by four) of each packet, plus a few extras in case of problems. Please have these materials organized so that materials can be distributed fairly quickly.

• Sign-in sheet (fill in date, location, and proctor name). Take a new sign-in sheet each time.
• Sufficient copies of questionnaire envelopes
• Extra 10-option scantrons
• Debriefing forms
• Pencils -- please sharpen them all beforehand.
• Make sure you have a watch or some way to tell time with you.
• Stamp and stamp pad
• Green Sheets for giving credit

During the test administration:

Procedure:
• Arrive at least 15 minutes before the start of the session.
• Place the envelopes on the desks, every other seat, or spaced apart further than that if there are not many people signed up. Distribute the four different forms systematically, one through four, as you lay them out, so that the different forms are evenly distributed around the room.
• As people enter the room, ask them to sign the sign-in sheet. Check ID to make sure that face, name, and what they write on the sign-in sheet all match.
• If an individual shows up who has not signed up through the website, thank them for coming and ask them to sign up online for a different session. If subjects forget their ID but they are signed up, let them participate.
• As they come in, tell them that they may open their envelopes, and review the informed consent and data release forms while they wait for the session to begin. It is okay for subjects to sign forms in pencil.
• Start about 5 minutes after the designated start time, or earlier if all participants have arrived. Those who are late may stay late.
• Read the script verbatim.
• Collect Informed Consent and Data Release.
• Time for test: 90 minutes.
• Participants will stop working at the end of the first section, and raise their hands to let you know they are done. When everyone has completed section one, you will tell everyone to begin section two at the same time. They should leave their envelopes containing section 2 under their seats until it is time to move on to section 2. (This is done because the room having a coaching session will also be stopped at this point to begin coaching before moving on.)
• As you administer the test, (a) at 7:00 PM post the time remaining as 30 minutes, and at 7:15 PM as 15 minutes; (b) announce the time aloud as you post the time.
• If they finish early, you may start to collect their forms ~10 minutes before the end of the session.
• As you collect the materials, check for the following with the student:
  - PID (on both scantron sheets)
  - Form # and A or B (on both scantron sheets)
  - Completed scantron sheets (should be completed through #106 on the first scantron sheet and #97 on the second sheet)
  - Booklets with elaborations, with PID and name on the booklets
  If a participant has incomplete information, ask them to stay to complete the form.
• When you collect materials, give the student the debriefing form.
• Stamp the white cards that the students brought and fill in 3 half-hour credits. If they forgot their card, fill out a green sheet for them. Name of experiment is Predictors of Student Success. Experiment stamp # is MSU/PSYCH/101. Experimenter is Neal Schmitt.
• If a participant decides not to participate, just let them leave and thank them for their time.
• Do not answer any content-related questions.

Script

"Hi. Please sign in, and may I check your ID? You may take a seat, open your packet and look at the consent forms, but do not begin."

"Hello. Thanks for participating in our study about the characteristics and experiences of college students. In order to ensure that the instructions are identical each time we run this study, I'm required to read these instructions verbatim. My name is ______ and I will be proctoring this data-collection session ("along with ______" — add this if you have two people proctoring). This session lasts about an hour and a half, and so let me say in advance that we appreciate your time and help with this project.

It is important that this is your first year of college, and that you have not previously participated in this study. If either of these criteria render you ineligible, please let me know now.

What I (we) have provided is an envelope containing two booklets containing questions for you to read over and complete. You'll fill out the scantrons, and record some answers on your booklets. I (we) will post and announce the time when you have only 30 minutes left to complete it. The time is posted just to make sure everything runs smoothly — you should have plenty of time, so don't worry about having to rush. Please answer everything thoughtfully. It is important that you answer these questions seriously and follow the instructions closely.

When you have completed the first section, please stop, and raise your hand. Do not move on to the second section until I ask you to do so. When you are through with the entire study, please wait quietly until everyone else is finished with theirs.

If you know now that you will not be able to stay for the entire hour and a half, please sign up for a different session of the study where you will be able to stay for the entire duration.

Now, please read and sign the Informed Consent form, which provides more information on the nature of the task. With the Informed Consent form are two Data Release forms we also need you to sign. When you are finished we will collect those forms."

"Please bubble in your PID and form number and letter A on the first scantron immediately. Please write your PID at the top right corner of the question booklet. Keep the questionnaire for the first section on your desk, and place the envelope containing the second questionnaire packet under your seat."

"Please raise your hand any time if you have any questions or problems, for example, if there is a page missing in your packet or you don't understand something you're reading.
Do you have any questions?"

"You may now begin."

"Please bubble in your PID and form number and letter B on the second scantron. Please write your PID at the top right corner of the second section question booklet. It is very important that you follow all instructions in this section carefully. You may now begin the second section."