SOURCE OF EXPERTISE IN SCORING KEY DEVELOPMENT AS A DETERMINANT OF THE NATURE OF THE CONSTRUCTS MEASURED

By

Abigail K. Quinn

A THESIS

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

MASTER OF ARTS

Psychology

2009

ABSTRACT

SOURCE OF EXPERTISE IN SCORING KEY DEVELOPMENT AS A DETERMINANT OF THE NATURE OF THE CONSTRUCTS MEASURED

By

Abigail K. Quinn

The purpose of this study was to examine whether the selection of subject matter experts (SMEs) who provide scoring judgments during the development of a situational judgment test (SJT) has the potential to affect which constructs the SJT ultimately measures. It was hypothesized that different groups of SMEs would have different implicit theories of performance because of shared traits and experiences. Three groups of SMEs (graduate students, resident advisors, and undergraduate students) provided scoring judgments for an SJT designed to measure college student performance. The scoring judgments made by the three groups were fairly similar. Correlations of scores based on the scoring keys developed by each group with a variety of performance criterion measures demonstrated that the scoring keys predicted performance equally well. These results suggest that SME choice did not affect the constructs measured by the SJT. Verbal protocol analyses conducted with a sub-group of each SME group indicated that group members did provide different reasoning for their responses, but the differences did not affect construct measurement. Manipulation checks, however, demonstrated that the three groups chosen as SMEs did not differ in the predicted ways. For this reason, the theory cannot be fully discounted. Limitations of the methodology are discussed thoroughly.

TABLE OF CONTENTS

List of Tables
List of Figures
Introduction
Review of Literature on SJT Validity and Subgroup Differences
Development of SJTs
Empirical versus Rational Keying
From Where Do the Criteria SMEs Use Come?
Hypotheses
Method
Participants and Procedures
Measures
Hypothesis 1
Hypothesis 2
Hypothesis 3
Discussion
Limitations and Future Research
Practical Implications
Appendix A: Expected Relationships between Scores Based on the Scoring Keys Developed by Each SME Group and External Measures and Manipulation Check
Appendix B: Coding Exercise for I/O Graduate Students
Appendix C: Situational Judgment Test
Appendix D: Behaviorally Anchored Rating Scales
Appendix E: Behavioral Observation Scales
Appendix F: IPIP Sub-scales
Appendix G: Rating Task and Demographics for SMEs
Appendix H: Instructions for Verbal Protocol Task
Appendix I: Coding Scheme
References

LIST OF TABLES

Table 1. Coding of response options in Phase 1
Table 2. Manipulation checks
Table 3. Comparison of effectiveness ratings of the SJT response options across SME groups
Table 4. Comparison of differences across groups for ratings of effectiveness of options in the academic dimension (k = 34)
Table 5. Comparison of differences across groups for ratings of effectiveness of options in the leadership/service dimension (k = 14)
Table 6. Comparison of differences across groups for ratings of effectiveness of options in the social dimension (k = 21)
Table 7. Comparisons of correlations for Hypothesis 2
Table 8. Results from verbal protocol analysis

LIST OF FIGURES

Figure 1. SJT items and response options from the verbal protocol and the most common codes

Source of Expertise in Scoring Key Development as a Determinant of the Nature of the Constructs Measured

Introduction

My purpose in writing this paper is to challenge the practice of selecting subject matter experts based on convenience to help with test development and scoring. If our tests are designed to distinguish between individuals who will truly perform desirably and individuals who will not, we need to ensure that the subject matter experts we select to provide information are actually experts in the occupation or discipline of interest. If subject matter experts (SMEs) are selected fairly arbitrarily (for example, graduate students high in cognitive ability but without appropriate expertise), then do the measures they help us develop truly tap the constructs we are targeting? In other words, if a graduate student without expertise in an area develops a scoring key for a selection instrument designed for a specific occupation, then won't the individuals who score highly on that instrument be more similar to the graduate student than to the experienced worker in that occupation?

One area in which SMEs play a critical role is the development of scoring rubrics for exercises in which several alternative courses of action are available. Concern about the level of expertise necessary to play this important role has surfaced most recently in the literature on situational judgment tests (SJTs). I suggest that different SMEs utilize different criteria for scoring SJTs. I will present research suggesting that SMEs drawn from the same groups (either broadly or specifically defined; for example, all employees within one organization or the members of a specific team within an organization) should reference similar criteria because, over time, these groups tend to become fairly homogeneous with respect to past experiences and traits, both of which contribute to decision making in a test-taking situation. Similarly, they should reference different criteria than SMEs in different groups. If different groups of SMEs are referencing different criteria when they develop scoring keys for SJTs, then we need to be precise about whom we select as our group of SMEs to ensure that they are referencing the criteria we purport to measure.

I will begin by reviewing literature demonstrating that SJTs are valid and practical predictors of performance. Although researchers have yet to discover which variables affect the constructs measured by an SJT, trying to do so is a worthy goal because of how widely SJTs are used in practice and because understanding the constructs measured will allow users to better tailor SJT items to the knowledge, skills, and abilities required on the job. In order to better understand, predict, and control applicant responses, we need to know what the tests measure.
For example, knowledge of what a test measures can help us understand why subgroup differences occur and can help us make informed decisions about how to combine measures in a single selection system without creating redundancy. I will then describe the process by which an SJT is typically developed and review how different aspects of the development process may influence ultimate construct measurement, focusing specifically on the development of a scoring key. I argue that in order to fully understand the constructs measured by an SJT, we have to understand the nature of the scoring for that SJT. I will present evidence suggesting that individuals drawn from different groups reference different criteria when responding to test items (which, in the case of SJTs, is how we most commonly develop scoring keys) and that individuals within groups are more likely to reference similar criteria than individuals between groups.

Review of Literature on SJT Validity and Subgroup Differences

A situational judgment test (SJT) is a low-fidelity simulation in which a test-taker is presented with a work-related situation accompanied by a list of responses to the situation and is asked to select either the best and worst options or those he or she would be most and least likely to perform (Motowidlo, Dunnette, & Carter, 1990). SJTs are widely used in employee selection contexts for three main reasons: they have demonstrated substantial criterion-related validity; they exhibit smaller subgroup differences than other common selection procedures; and applicants tend to react favorably to the measures and to perceive them as face valid.

In 2001, McDaniel, Morgeson, Finnegan, Campion, and Braverman conducted a meta-analytic review of the criterion-related validity of SJTs and estimated the population validity of such measures to be .34. This indicates that approximately 12 percent of the variance in subsequent job performance can be explained by a test-taker's score on the SJT. They also suggested that this value may be downwardly biased because the estimates of SJT validity were not corrected for range restriction. They estimated that the criterion-related validity of most SJTs should fall within the range of .21 to .41 and that the specific validity coefficient for an individual test may be moderated by a number of factors, such as the content included in the test or whether the test was developed based on a job analysis. Another moderator of SJT validity, which will be explored further in this paper, may be the suitability of the rubric developed for scoring the test.

SJTs have typically been shown to have lower levels of adverse impact than cognitive ability or verbal ability measures. Pulakos and Schmitt (1996) compared the adverse impact levels of a verbal ability measure with those of an SJT. Whereas the verbal ability test led to a standardized mean difference (d) between Whites and African Americans of 1.03, the SJT had a d of .41, a reduction of .62. The findings of Motowidlo and Tippins (1993) of a White-African American d of .32, and of Motowidlo et al. (1990) of White-African American d's of .14 and .29 in two different samples, support the findings of Pulakos and Schmitt and indicate that the adverse impact levels of SJTs are typically lower than those of traditional cognitive ability tests (White-African American d = 1.00; Sackett, Schmitt, Ellingson, & Kabin, 2001).
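To make the effect-size arithmetic in the preceding paragraphs concrete, the following minimal Python sketch (mine, not drawn from any of the cited studies) shows the two computations involved: squaring a validity coefficient to obtain variance explained, and computing the standardized mean difference d from subgroup means and standard deviations.

```python
# A minimal sketch of the two effect-size computations discussed above.

def variance_explained(r: float) -> float:
    """Proportion of criterion variance explained by a predictor
    whose criterion-related validity is r."""
    return r ** 2

def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Standardized mean difference between two subgroups,
    using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / pooled_var ** 0.5

print(round(variance_explained(0.34), 3))  # 0.116, i.e., about 12 percent
```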
Whetzel, McDaniel, and Nguyen (2008) conducted a meta-analysis of race differences in SJT performance. They found a White-African American d of .38, a White-Hispanic d of .24, and a White-Asian d of .29. They also found evidence that mean race differences in SJT scores are largely due to differences in cognitive ability: the more an SJT overlapped with a measure of cognitive ability, the larger the racial differences found for that SJT. Although the reasons why SJTs demonstrate less adverse impact than more traditional tests are not yet established, Sackett et al. (2001) described evidence that supplementing cognitive predictors with noncognitive predictors relevant to job performance (yet not correlated with the cognitive predictors) can lead to reductions in adverse impact. It is probable that SJTs measure a variety of constructs related to performance in addition to cognitive ability.

Test takers and test developers tend to like SJTs because they at least appear to be more closely related to the jobs of interest than many more typical selection assessments (Bauer & Truxillo, 2006). Because SJTs include descriptions of situations that individuals might encounter on the job, they appear more face valid than more typical or abstract assessments, such as cognitive ability measures, which may seem less related to on-the-job experiences to a lay person.

Despite the validity and practicality of SJTs, researchers have been unable to determine exactly what SJTs measure. A number of researchers have attempted to understand the construct validity of SJTs by correlating SJT scores with other established measures. The results of such studies suggest that different SJTs are not consistently measuring the same constructs, but that individual SJTs can be designed to measure different constructs.

Perhaps the largest and most conclusive body of work in this area is the literature relating SJT scores to cognitive ability. McDaniel et al. (2001) conducted a meta-analysis and found a mean corrected correlation of .46 with general cognitive ability measures, although there was substantial variability around the mean (with a credibility interval of .17 to .75). Their results indicated that SJTs vary considerably in their relationship with cognitive ability.

Because SJTs tend to include job-specific situations, it might be expected that job knowledge would be highly correlated with SJT scores. As with cognitive ability, however, meta-analytic results do not support consistency across SJTs. McDaniel and Nguyen (2001) reported an average correlation of .07, excluding a large study in which the correlation was negative (Clevenger & Haaland, 2000). Once again, the credibility interval was large (-.14 to .29), indicating that SJTs vary considerably in the strength of their relationship with job knowledge.

Perhaps not surprisingly, SJTs have also been found to vary considerably in their relationship to personality measures. In their meta-analyses, McDaniel and Nguyen (2001) and McDaniel et al. (2007) examined the relationship between SJT scores and the Big Five personality traits. For almost all of the traits (excluding openness, for which there were fewer estimates of effect size), the credibility intervals were large, once again indicating that SJTs vary greatly in their relationships to various constructs (Chan & Schmitt, 2006).
The widely varying correlations with measures of relatively well-established constructs suggest that SJTs may be developed, either purposefully or inadvertently, to measure a variety of constructs. If SJTs can be developed to measure a variety of constructs, the question remains as to which aspects of SJT development affect construct measurement. In the following section, I explore the limited empirical evidence and speculation that exist about how SJT development may affect construct measurement.

Development of SJTs

There are four basic steps in the development of an SJT that may affect construct measurement. The first step is the generation of a set of item stems, or situations, that serve as the base of each question. Next, the response options for each item stem are developed. Test developers must then decide what type of instructions to provide to test takers, and, finally, a scoring key is developed. Although my focus in this paper is on how scoring key development affects construct measurement, I will first discuss how the three other parts of development may also do so.

The development of an SJT begins with the development of item stems, the descriptions of situations that are the most basic part of each question. The situations used in the final form of an SJT are typically derived from critical incidents collected from subject matter experts. The critical incidents are typically culled and refined by the test developer, who seeks to avoid too much redundancy and to cover as much of the desired topic area as possible. After choosing a final set of critical incidents, the test developer transforms the incidents into situation-based questions of similar length that use consistent language. McDaniel and Nguyen (2001) described four characteristics along which most item stems can be distinguished: fidelity, length, complexity, and comprehensibility. I will use these categorizations to examine how each characteristic may affect the constructs measured by the final SJT.

The fidelity of the item stem refers to how closely the presentation of the situation in the item stem matches how the situation would occur in real life. Chan and Schmitt (1997) compared performance on two versions of the same SJT with varying fidelities. The item stems of the first SJT were presented as videotaped vignettes, whereas the item stems of the second SJT were presented as written descriptions of the vignettes. The videotaped vignettes would be considered of higher fidelity than the written descriptions. Although Chan and Schmitt were able to demonstrate that the content (and thus the intended constructs measured) in the two versions of the SJT was identical, they found that performance on the written version was correlated with scores on a reading comprehension test (an additional and unintended construct) whereas performance on the videotaped version was not. Chan and Schmitt's findings suggest that the presentation style of item stems may affect which constructs an SJT measures.

McDaniel and Nguyen's (2001) second, third, and fourth categories (length, complexity, and comprehensibility) are probably strongly related, so I will address them together. Item stems can be written very simply or can involve complex situations (requiring the test taker, for example, to deal with multiple actors with conflicting interests). In many cases, more complex item stems will be longer because it takes more words to describe more intricate situations.
The complexity and length of item stems may also be related to their comprehensibility, the ease with which the meaning and implications of the situation are discerned. Sacco and colleagues found evidence in two studies (Sacco, Scheu, Ryan, & Schmitt, 2000; Sacco, Schmidt, & Rogg, 2000) that performance on item stems with greater detail and complexity was related to reading-level indices. In other words, the implication is that SJTs consisting of more detailed item stems may be measuring more cognitively loaded constructs, such as reading level, regardless of whether such constructs are relevant to performance in the situations described. McDaniel et al. (2001) reported seemingly contradictory findings in their meta-analytic study: they found that SJTs with less detailed questions were more highly related to general cognitive ability (r = .56) than those with more detailed questions (r = .47). Although the empirical findings are contradictory, all three studies indicate that the complexity and comprehensibility (and possibly the length) of item stems may affect which constructs are measured.

Finally, it seems necessary to point out that item stem content may affect the constructs measured by an SJT. Although there is little empirical evidence to suggest that intentionally writing item stems to measure specific constructs is the primary determinant of the constructs ultimately measured, the underlying theory or set of competencies used to develop a set of item stems is often presumed to affect construct validity as intended.

Once the item stems for an SJT have been developed, response options are developed for each stem. Response options are typically developed by a pool of subject matter experts who provide examples of typical or desired responses to the situation described in each item stem. Response options may also be written by the test developer. In either method of response generation, SMEs typically review the developed responses to weed out unrealistic and redundant responses.

McDaniel and Nguyen's (2001) four characteristics of item stems (fidelity, length, complexity, and comprehensibility) that may affect construct measurement are probably applicable to item responses as well. Most SJTs, even those that present item stems by video, provide written response options. It is plausible, however, to present videotaped item responses along with item stems, which suggests that the evidence on the effects of fidelity on construct measurement applies to item responses as well as item stems. It also seems plausible to extend the findings and predictions that item stem complexity, length, and comprehensibility may affect construct measurement (possibly by affecting the cognitive resources required to respond). Sacco et al. (2000), however, did not find evidence that the stem-level reading effects extend to the level of the response option.

In several studies, researchers have attempted to develop response options that reflect varying levels of a personality trait with the goal of developing an SJT that measures that trait. For example, Motowidlo, Hooper, and Jackson (2006) developed an SJT with five item stems designed to tap extraversion, five to tap agreeableness, and five to tap conscientiousness. For each item stem, they then developed response options designed along a continuum of the trait of interest.
For example, for an item designed to measure agreeableness, the item responses described behaviors ranging from disagreeable to agreeable. The responses were also coded by graduate research assistants, and the coding was used to determine whether a response indicated a high or low level of the trait. Test-takers' responses were correlated with their scores on personality measures of agreeableness, extraversion, and conscientiousness. For agreeableness and extraversion, the SJT was correlated with the corresponding personality scores, but correlations with conscientiousness were low and nonsignificant. Several other studies have found similarly mixed results (Beauregard, 2000; Motowidlo, Diesch, & Jackson, 2003; Ployhart & Ryan, 2000; Porr & Ployhart, 2004; Trippe & Foti, 2003), but the evidence suggests that it may be possible to intentionally design item response options to tap specific constructs.

A number of researchers have suggested that the transparency of response options is yet another aspect that may affect construct measurement, although this idea has yet to be tested empirically (Hooper, Cullen, & Sackett, 2006; Weekley, Ployhart, & Holtz, 2006). If certain response options are more socially desirable and test takers endorse those options, then the SJT has the potential to become a test of social desirability rather than a valid predictive measure. Potential solutions to this problem include designing response options to be of equal social desirability or matching each socially desirable response option with one that is equal in social desirability but not predictive of performance.

In addition to developing the item stems and response options, test developers have to decide what type of response instructions to provide to test takers. The two most common instruction types are behavioral tendency instructions and knowledge instructions (McDaniel, Hartman, Whetzel, & Grubb, 2007). Behavioral tendency instructions ask test takers to select the responses that best describe how they would behave in the given situation. In contrast, knowledge instructions ask test takers to respond with what they think the most (and least) effective response would be.

There is some evidence to suggest that the type of instructions selected by test developers may affect what constructs an SJT measures. In their meta-analysis, McDaniel et al. (2007) found evidence that scores under behavioral tendency instructions are more highly correlated with personality measures whereas scores under knowledge instructions are more highly correlated with measures of cognitive ability, regardless of SJT content. Ployhart and Ehrhart (2003) made a similar distinction between two types of instructions ("would do" versus "should do") and found that scores on SJTs identical in content, but with different response instructions, were more similar across conditions using the same instruction type. In other words, instructions asking what an individual "would do" measured something different than instructions asking what an individual "should do" in a given situation, despite the item stems and response options being identical. McDaniel and Nguyen (2001) suggested (although they did not test empirically) that SJTs with knowledge instructions may be more resistant to faking (i.e., less likely to measure social desirability) than SJTs with behavioral tendency instructions.
They suggested that SJTs with knowledge instructions measure the same type of knowledge from all participants, whereas SJTs with behavioral tendency instructions measure behavioral tendencies from non-fakers and knowledge about effectiveness from fakers. This research provides evidence and theory suggesting that even the seemingly inconsequential choice of how to instruct applicants to complete an SJT can have implications for what constructs the test measures.

Although there has been little empirical work investigating how the development of item stems, response options, and response instructions may affect construct measurement, the work that does exist suggests that even slight or unintentional alterations in any of these three parts of an SJT may affect which constructs are measured. For example, with regard to the fidelity of item stems, Chan and Schmitt (1997) showed that identical situational content presented in written rather than video format adds the measurement of an additional construct. For this reason, developers of SJTs must be cognizant of all of the choices they make in developing each aspect of an SJT and of how those choices may influence construct measurement.

The fourth part of SJT development that I believe may influence construct measurement is the development of a scoring rubric. Although I believe that this process, which is often heavily influenced by SMEs, may affect construct measurement, there is little empirical work on this topic. The development of a scoring key and how it may affect construct measurement is also probably the aspect of SJT development with the most relevance for the development of other measures. This aspect of SJT development will be my focus for the remainder of this paper.

Scoring Key Development

Empirical versus Rational Keying

The two most common methods for developing scoring keys for an SJT are empirical and rational keying. In the empirical method, response options are selected and weighted based on their ability to differentiate membership in higher and lower performing criterion groups. In the rational method, response options are weighted based on the judgment of SMEs that the options tell us something important about the constructs targeted by the instrument.

To create a scoring key using the empirical method, one must have a criterion measure of interest on which there is sufficient variability to create at least two groups (high and low performers), although empirical keys can be developed against continuous criterion measures as well. The main benefit of this method is, of course, that it maximizes the prediction of a specific, external criterion (England, 1971; Mitchell & Klimoski, 1982). In this case the SJT should measure the same construct(s) as the chosen criterion, although any ambiguity about the nature of the constructs underlying the criterion will translate to the SJT as well. There are three main methodological problems with empirical keying: reliance on an external criterion, a potential lack of generalizability, and the potential decay of validity over time. The effectiveness of the empirical key depends entirely on the adequacy of the criterion measure used to represent the construct(s) of interest (Thayer, 1977).
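To illustrate the mechanics of empirical keying just described, here is a hypothetical Python sketch in which each response option is weighted by the gap between its endorsement rates among high and low performers on the external criterion. The data layout and the proportion-difference weighting are my own illustrative choices, not a prescribed procedure.

```python
# Hypothetical sketch of empirical keying: options are weighted by how
# strongly endorsement separates high from low criterion groups.

def empirical_key(endorsements, is_high_performer):
    """endorsements: dict of option id -> list of 0/1 flags, one per
    respondent, indicating whether that respondent endorsed the option.
    is_high_performer: parallel list of booleans marking membership in
    the high-performing criterion group."""
    n_high = sum(is_high_performer)
    n_low = len(is_high_performer) - n_high
    key = {}
    for option, flags in endorsements.items():
        p_high = sum(f for f, h in zip(flags, is_high_performer) if h) / n_high
        p_low = sum(f for f, h in zip(flags, is_high_performer) if not h) / n_low
        key[option] = p_high - p_low  # positive: favored by high performers
    return key

# Example: option "1a" is endorsed only by high performers.
weights = empirical_key({"1a": [1, 1, 0, 0], "1b": [0, 0, 1, 1]},
                        [True, True, False, False])
print(weights)  # {'1a': 1.0, '1b': -1.0}
```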
Using an external criterion to create a scoring key maximizes the prediction of that criterion, but it creates uncertainty about why the measure is effective. By scoring an SJT against an external criterion measure, the SJT takes on any problems or inadequacies in that criterion measure. While the empirical keying method ensures that there is a relationship with some outcome variable of interest, it also has the potential to add another layer of error in the measurement of the underlying constructs. Not only are we left uncertain about why the SJT accurately predicts scores on the criterion measure, but we also have to assume that the criterion measure adequately tapped the underlying constructs of interest in the first place.

In addition, the generalizability of the scoring key depends on both the reference group and the sample that completes the SJT and the criterion measure. First, the sample that takes the measures must adequately represent the group of people to whom the scores are supposed to generalize (the reference group) in order for validity estimates to generalize. Second, scoring weights developed from a specific sample will, to a certain extent, capitalize on sample-specific factors, which will inflate validity estimates (Hogan, 1994). Thus, it is essential to conduct cross-validation studies on multiple samples representing the reference group in order to avoid the effects of idiosyncratic factors in the data.

A third methodological issue with empirical keying is the tendency for the validity of the measurement to diminish over time (Reilly & Chao, 1982; Thayer, 1977). Three possible reasons for this documented decay in the validity of empirically developed scoring keys are changes in the nature of the external criterion over time, shifts in the nature of the reference group, and lack of security of the scoring rubric (or an increased familiarity with desirable responses).

When developing a scoring key using the rational method, subject matter experts (SMEs) are typically asked to select the best and worst responses and/or to rate the effectiveness of each response option. The scoring key is then developed by weighting response options based on the SMEs' best and worst selections and on rated effectiveness ("best" options should also be rated as relatively effective and "worst" options as relatively ineffective; Motowidlo et al., 1990). The main benefit of this method is that it is theoretically based: it is assumed that subject matter experts use their expertise to critically analyze and judge how the response options should be weighted, drawing on their "theory" of what constitutes effective performance. Unfortunately, in practice we generally do not know the rationale that the "experts" use to determine their responses. In other words, their "theories" are implicit rather than explicit. Asking SMEs to respond as though they were test takers assumes that they are capable of accurately introspecting and reporting how they would behave or how they think others should behave. SMEs are generally selected because of their expertise in the field (or their past demonstration that they can behave desirably, which has led to their success in their field or organization), but they may or may not be aware of the reasons why they have been successful or which behaviors have best served them.
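For contrast with the empirical sketch above, the following equally hypothetical sketch shows one common rational-keying variant in the spirit of Motowidlo et al. (1990): the consensus "best" option is keyed +1 and the consensus "worst" option -1, subject to a consistency check against the SMEs' mean effectiveness ratings. The function names and the exact aggregation rule are illustrative, not the procedure used in this thesis.

```python
# Hypothetical sketch of rational keying for a single SJT item.

def rational_key(best_votes, worst_votes, mean_effectiveness):
    """All arguments are dicts over the options of a single item:
    counts of SME "best"/"worst" selections and mean effectiveness
    ratings. Returns option id -> key weight."""
    best = max(best_votes, key=best_votes.get)
    worst = max(worst_votes, key=worst_votes.get)
    key = {option: 0 for option in mean_effectiveness}
    # Consistency check: the keyed "best" option should also be rated
    # more effective than the keyed "worst" option.
    if mean_effectiveness[best] > mean_effectiveness[worst]:
        key[best], key[worst] = 1, -1
    return key

print(rational_key({"a": 9, "b": 3, "c": 1},
                   {"a": 0, "b": 2, "c": 11},
                   {"a": 4.5, "b": 3.2, "c": 1.8}))
# {'a': 1, 'b': 0, 'c': -1}
```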
When we use SMEs to rationally score SJTs, we assume they are using their underlying "theory" of successful job performance to develop a scoring key. These "theories" are never explicitly stated and may vary considerably across SMEs or groups of SMEs.

From Where Do the Criteria SMEs Use Come?

In this section, I argue that SMEs, because of their background and expertise, may differ in their theories of performance. Moreover, I argue that there are subgroups of SMEs whose implicit theories are similar to those of other members of the subgroup and dissimilar to the theories of members of other subgroups. First, however, I consider what processes might create the similar views of performance, or implicit performance constructs, that SMEs may use in making their judgments about scoring items.

Theories about the cognitive processes of test-takers (or SMEs) suggest that individuals responding to specific items reference criteria that come from their past experiences as well as their individual differences (Motowidlo, Hooper, & Jackson, 2006; Ployhart, 2006). It is probably not accurate to assume that all individuals whom we might consider experts in a field think about successful performance, or the successful resolution of the situation in a given SJT item, in the same way. If they are not viewing successful performance the way we think they are, then our test is not measuring the constructs we think (or go so far as to claim) it measures.

Ployhart (2006) described a model of the determinants of predictor response processes. The basic concept is that there are a variety of ways in which latent constructs affect test takers' responses to an individual test item. I believe that we can analyze an SME's judgment about the best and worst response to an SJT item in the same way. A test taker (or SME) engages in four related and sequential phases of cognitive processing in determining his or her response: 1) comprehension of the item (in this case, the situation), 2) retrieval of relevant information from long-term memory, 3) formation of a judgment using that information, and 4) choice of a response option based on that judgment. All four phases are informed by latent constructs or criteria engaged by the test-taker (or SME). When we ask SMEs to help us with scoring, we tend to assume that the constructs they reference at each step of making their responses will be fairly consistent across SMEs, but we have no evidence to support this. For example, in rating the same response options for an item stem, one subject matter expert may retrieve information and make a judgment based heavily on cognitive ability whereas another SME may do so using agreeableness as the main construct of interest. In reality, of course, each SME is probably referencing multiple constructs at each step of responding to a single item, only some of which may actually be relevant to the task. It is by averaging responses across multiple SMEs that we may be able to cancel out idiosyncrasies and find the responses that are selected based on the true constructs of interest. Of course, this assumes that groups of SMEs do not share idiosyncrasies, a point I will challenge later.

This brings me to the other theory I utilize in my explanation. Motowidlo, Hooper, and Jackson (2006) have suggested the relevance of implicit trait policy (ITP) to test-taker endorsement of SJT response options.
As with Ployhart's theory, I apply this theory to subject matter experts determining response effectiveness. ITP theory suggests that there are stable differences between individuals in their beliefs about the importance of various personality traits for determining behavioral effectiveness. In other words, to frame this in terms of Ployhart's predictor response processes model, individuals' beliefs that certain traits are important in certain situations will affect the criteria they reference after reading the item and how they apply those criteria in judging which response option to choose. For example, an individual whose ITPs weight agreeableness highly will judge the effectiveness of the response options based on how well they represent agreeable behavior, whereas an individual whose ITPs weight conscientiousness highly will use that as his or her criterion of interest in determining response option effectiveness. It is necessary to emphasize again that, when we ask SMEs to create a scoring rubric, we are assuming that the SMEs we choose have the most relevant ITPs for the situations in our measure. These ITPs come from pre-existing individual differences and are shaped by personal experiences.

Theoretically, then, it appears likely that different individuals will rate the effectiveness of response options differently based on their own criteria of interest. This leaves open the question of where these criteria come from and how individuals determine which criteria to reference. The theories about the cognitive processes test-takers use in deciding how to respond to test items suggest that the criteria SMEs use in their judgments come from individual differences and from past experiences. In the past, it has been assumed that all individuals designated as SMEs use the same criterion of interest when evaluating response effectiveness. I am not arguing that each SME will employ entirely different criteria in judging response options; rather, I am arguing that similarities and differences in background and experience will produce subgroups of SMEs who will provide different judgments about scoring keys. In the next section, I consider what it is about the individuals chosen to serve as SMEs that makes them similar or different in ways that might influence their judgments about scoring the response options.

Influences that Produce Judgment Similarities/Differences

There are a number of theories and hypotheses in a variety of disciplines within psychology and sociology suggesting that individuals within groups will have more similar traits and experiences than individuals between groups. These include attraction-selection-attrition, socialization or fit, organizational demography, evolutionary theories about tradeoffs between personality traits, and niche-picking. These theories suggest that individuals with similar individual differences and experiences tend to aggregate together or to be drawn to similar work groups, organizations, and occupations, and that membership in these groups tends to lead group members to become more similar over time in a variety of ways, including in their ideas and judgments. I will briefly describe each of these ideas and provide exemplary research studies that support these propositions.
Schneider's (1987) theory of attraction-selection-attrition (ASA) suggests that organizations are likely to attract and select individuals who have similar personality traits and work values and that, over time, individuals who are not a good fit with the organization will leave (Cable & Judge, 1996, 1997; Judge & Cable, 1997). In the recruitment process, individuals who are similar to existing employees are more likely both to be recruited by the organization and to apply for jobs. Individuals who observe a misalignment between their characteristics and those of organization members may avoid applying to the organization in the first place, may remove themselves from the application process, or may turn down a job offer. If an applicant whose characteristics are not aligned with those of existing employees is offered a job and accepts it, he or she is more likely to leave the organization. Over time, this process leads to homogenization of, or a restriction in range within, an organization in terms of personality traits and personal values. This theory has been extended through empirical work to apply at the work group level and to past experiences in addition to traits and values. If ASA theory is correct, we should see greater similarity in judgments about, or approaches to, problem situations among a set of experienced SMEs than among those who have only recently entered the organization.

To support the attraction portion of ASA theory, Judge and Cable (1997) found evidence that the types of organizations to which job seekers are attracted are related to their personality traits. Specifically, they found that job seekers who score high on neuroticism are less attracted to innovative and decisive organizational cultures; that job seekers who score high on extraversion are attracted to aggressive and team-oriented cultures and less attracted to supportive cultures; that job seekers who score high on openness to experience are more attracted to innovative cultures and less attracted to detail- and team-oriented cultures; that job seekers high on agreeableness are attracted to supportive and team-oriented cultures and less attracted to aggressive, outcome-oriented, and decisive cultures; and, finally, that job seekers high on conscientiousness are attracted to detail-oriented, outcome-oriented, and rewards-oriented cultures and less attracted to innovative cultures.

To support the selection portion of ASA theory, Cable and Judge (1997) examined the hiring decisions of interviewers. They found that interviewers who perceived an applicant's values to be highly congruent with those of the company predicted that the applicant would fit well with the organization, and they were also highly likely to recommend that the applicant be hired.

To support the attrition portion of ASA theory, Jackson et al. (1991) found that the heterogeneity of top-management teams predicted turnover within the groups. The attributes on which they measured heterogeneity included age and experience outside the industry (both of which were significant predictors of turnover), as well as tenure, education level, college alma mater, possession of a business management degree, and military experience (none of which were significant predictors).
According to this theory, there is reason to believe that SMEs drawn from the same groups will possess similar traits and experiences because, over time, new members who are attracted to the group tend to have much in common with current members, because the members who stay with the group are those who were most similar to begin with, and because members of groups become more similar to one another over time.

Socialization research leads to similar premises. There is evidence to suggest that, during the process in which new employees are socialized into an organization, their values and those of the organization become even more closely aligned. Cable and Parsons (2001) proposed that when newcomers to an organization learn during the socialization process that their values differ from those of the organization, they experience dissonance because their values do not match those normative for success. They suggested that two alternative ways of dealing with this dissonance are to alter one's values or to leave the organization. This process would ensure that, over time, the employees who stay with the organization have values aligned with those of the organization. Cable and Parsons found that, even when they controlled for initial (pre-employment) congruence of personal and organizational values, new employees' values tended to become aligned with their organization's when employers used specific socialization tactics that led to positive social interaction with, and support from, existing employees.

Cable and Judge (1996) conducted a study in which they examined job seekers' perceptions during a recruiting cycle at a university. At Time 1, in the spring, job seekers rated the attractiveness of a job's attributes, their perceived fit with the company and job, and their perceptions of the company's values. At Time 2, the job seekers provided measures of their individual differences, mainly their values, and of the importance of fit in their job search and choice. At Time 3, after working for approximately five months, the participants completed a survey about their perceived fit with the organization and job and their job attitudes. Cable and Judge found evidence that perceived value congruence between applicants and organizations predicted applicants' fit perceptions, that fit perceptions predicted job choice intentions, that perceived value congruence with the organization at which participants ultimately accepted a job positively affected their perceptions of fit as employees, and that fit perceptions predicted positive outcomes such as reduced turnover.

Another related theory, organizational demography, suggests that aggregate demographic information about organization members influences behavior independently of individual-level attributes. The idea behind this theory is that demographic variables can serve as proxies for measures that are less objective, such as attitudes, because demographic variables are directly measurable. There is thus an assumption that individuals from the same demographic groups have similar attitudes, behave similarly, and make similar judgments. This may be a big assumption, but there is some evidence to suggest that the methodology of organizational demography can be useful, which indicates that it may be accurate at least some of the time (Lawrence, 1997; Pfeffer, 1983). According to this theory, there is reason to expect that SMEs drawn from the same environments will, in general, possess similar demographic descriptors (for example, organizational tenure, marital status, or gender) and that those who do possess similar descriptors will produce similar judgments. This is because those demographic variables serve as indicators or predictors of similar attitudes, which influence how they
According to this theory, there is reason to expect that SMES drawn from the same environments will, in general, possess similar demographic descriptors (for example, organizational tenure, marital status, or gender) and that, those who do possess similar descriptors will produce similar judgments. This is because those demographic variables serve as indicators or predictors of similar attitudes, which influence how they 22 make similar decisions. This theory lacks a process explanation for how demographic variables influence attitudes which influence outcomes (in this case, ratings of response options for SJT item stems); it simply predicts (with some success) that they will. Wagner, Pfeffer and O’Reilly (1984) found empirical support for the theory of organizational demography. They examined the demography of top-management teams, predicting that members most similar in terms of date of entry into the organization would be less likely to leave the organization than those members who differed along that variable. Overall, they found support for this hypothesis; the larger the distribution of dates-of-entry within a group, the higher the proportion of the group that left. The theory of organizational demography predicts that groups consisting of members who are more demographically similar will be more successful in the long term. One mechanism that predicts this success is that homogeneous groups are less likely to have high rates of turnover. There is some theory in evolutionary psychology that suggests that the expression of certain personality traits has benefits for certain groups, so individuals with those traits tend to be attracted to those groups where they can be successful (Nettle, 2006). One example that has been discussed is that, among university students, academic success is strongly positively correlated with neuroticism among those students resilient enough to cope with its negative effects (McKenzie, 1989; McKenzie, Taghavi-Knosary, & Tindell, 2000). This theory developed from a theory of tradeoffs, which is an explanation of heritable variation between humans. In other words, it is an attempt to explain why humans have different traits if a specific profile of traits would be most adaptive for the human experience. The theory suggests that there is not a specific profile of traits that 23 would be most adaptive to the human experience, but rather that individuals who are successful in life (or are considered “fit” for survival) are attracted to environments where the traits they possess are adaptive. In most cases, SMES are chosen to provide expertise on devising a scoring key because they have been successful performers in their environment (i.e., they are high ranking employees within an organization or high performing college students). Thus, according to evolutionary theory, it is likely that SMES from the same environment will possess similar traits because they will have been attracted to their environment specifically because of the adaptive utility of their particular traits within that environment. In their study, McKenzie, Taghavi-Khonsary, and Tindell (2000) examined a group of university students who, over a period of three years, were enrolled in a higher education course of study in London. The program offered an opportunity to leave with a Certificate of Higher Education after one year of successful study or to leave with a Diploma of Higher Education after two years of successful study. 
Successful attainment of the Diploma allowed students to continue with one additional year of study to earn a final award (a degree). McKenzie et al. found that, for students with high scores on a measure of coping, neuroticism was highly correlated with academic achievement, and this correlation increased at each higher level of educational attainment in the program (i.e., from certificate to diploma to degree). Students who possessed high levels of neuroticism but also scored highly on a measure of coping ability were able to channel their capabilities toward higher educational attainment. This finding supports the idea that individuals who are successful in specific domains may possess similar profiles of personality traits. SMEs chosen for their expertise in a specific area (i.e., an area in which they have been successful) thus seem likely to have a similar profile of traits.

Caspi, Roberts, and Shiner (2005) described niche-building processes, whereby individuals create, seek out, and/or end up in environments that are highly correlated with their personality traits. Once individuals are in those environments, there may be processes within the environments that promote the persistence of trait-related behaviors and inhibit or discourage opportunities for changing those behaviors. These environments can range from occupations to workplaces to social situations and more. According to niche-building theory, SMEs with similar traits will create, seek out, and end up in environments that foster those traits. Thus, individual environments (for example, organizations) will tend, over time, to acquire and keep employees with similar personality traits. For example, an individual who is highly extraverted is likely to seek out a workplace where he or she feels comfortable expressing that trait and where that trait is encouraged. The individuals in that workplace, where extraversion is encouraged, are, in turn, more likely to recruit and hire an extraverted person for positions in their work group. Thus, certain workplaces will be more likely to contain extraverts than introverts, and vice versa.

Magnus, Diener, Fujita, and Pavot (1993) conducted a study in which they sought to determine whether personality traits have the capacity to influence objectively positive and negative life events (for example, getting married was coded as a positive event whereas the death of a close family member was coded as a negative event). In 1986, they collected measures of the Big Five personality traits from a group of undergraduates. Four years later, the former undergraduates responded to a mailed checklist about objective life events they had experienced. Those events were coded as positive or negative by an independent sample. Magnus et al. found, as they predicted, that individuals high in extraversion reported more positive events, whereas individuals high in neuroticism reported more negative events. Because the events checklist consisted of objective and fairly salient events, it is unlikely that the difference in reported events was due to a reporting bias (i.e., highly neurotic participants were probably not simply remembering more negative events than positive events). The findings of this study support the niche-picking process described by Caspi, Roberts, and Shiner (2005), whereby the personality traits individuals possess influence the situations they find themselves in.
Because people with similar personality traits are likely to find themselves in similar situations, it seems likely that SMEs from the same group, having been attracted to a single situation, possess similar personality traits.

These theories, drawn from disparate fields within psychology and sociology, all suggest that members of a group are likely to have more similar individual differences and past experiences than non-members. The more specific the group, the more likely an individual is to be similar to the other members. For example, an employee of an organization is likely to be more similar to the members of his or her work group than to the members of another work group within the organization, and is likely to be more similar to members of the organization than to members of a different organization. These similarities in traits and past experiences are likely to affect the judgments an SME makes when evaluating situations and response options to create a scoring rubric for an SJT. Based on these theoretical notions, I develop the hypotheses presented in the next section of the paper.

Hypotheses

To investigate whether group membership affects the judgments made by SMEs in creating scoring rubrics for an SJT, I plan to use an SJT designed to measure college student performance and three groups of SMEs: graduate students, undergraduate resident advisors (mentors), and undergraduate students drawn from the Psychology subject pool (largely first- and second-year students). I have three hypotheses about the nature of the differences between these three groups.

H1: There will be significant differences between the three groups of SMEs in the effectiveness ratings of the response options in the SJT, such that graduate students will rate options related to academics highest, mentors will rate options related to leadership and service highest, and undergraduates will rate options related to social life highest.

Several theories and bodies of research (e.g., ASA, socialization) suggest that individuals with similar backgrounds or experiences, traits, and values will process information and make decisions in similar ways. My premise is that graduate students highly value academic pursuits, partially because they have been successful in their own academic pursuits and have been rewarded for their efforts in this domain. I also posit that their family backgrounds were ones in which academic pursuits were valued and rewarded and that their parents may themselves have served as high-achieving academic role models. In addition, evidence (McKenzie, Taghavi-Khonsary, & Tindell, 2000) suggests that high-achieving graduate students may be highly neurotic in addition to being highly conscientious.

I posit that mentors will also value academic performance, but will value a broader array of aspects of student life in making their judgments. They have been drawn to serve the university, which indicates that they value leadership and service. Because of their training and socialization as resident advisors, they will value interpersonal competence. I also predict that they will possess personality attributes relevant to both academic success and sociability, including high levels of conscientiousness, extraversion, and agreeableness.

Finally, I suggest that undergraduate students in the Psychology subject pool tend to be students relatively new to the university.
As they adjust to life away from home, largely surrounded by peers for the first time, their judgments of how to succeed in college will be based mainly on interpersonal and social competence. I also suggest that, similarly to the mentors, the undergraduate students will, as a group, score highly on measures of extraversion and agreeableness. Please see Appendix A for a visual presentation of the predicted differences among the three SME groups.

H2: There will be significant differences in the correlates of scores based on scoring keys developed from different SME groups.

Given my arguments above about the nature of the three different SME groups, I predict that (H2a) scores based on the scoring key developed by graduate students will correlate more highly with a measure of students' academic success and with a measure of neuroticism than will scores based on the scoring keys developed by the other two SME groups; that (H2b) scores based on the scoring key developed by mentors will correlate more highly with a measure of leadership and service-related performance than will scores based on the scoring keys developed by the other two SME groups; and that (H2c) scores based on the scoring key developed by undergraduates will correlate more highly with measures of students' social competence and sociability than will scores based on the scoring keys developed by the other two SME groups.

H3: There will be significant differences between groups of SMEs in references made to specific dimensions based on a verbal protocol analysis of remarks made while they are making judgments about option favorability, such that graduate students will tend to refer to the importance of academic success, mentors will refer to the importance of leadership and service, and undergraduates will refer to the importance of social success.

I plan to ask a small number of SMEs in each of these three groups to talk through their thought processes out loud while making their scoring judgments. My specific predictions are that (H3a) graduate students will make the highest number of academically related spoken references while making judgments in the SJT task, (H3b) mentors will reference a broad array of criteria (academic, leadership- and service-oriented, and social) for making judgments during the verbal protocol task, and (H3c) undergraduates will make the highest number of socially related spoken references during the verbal protocol task.

Method

Participants and Procedures

Phase 1: Coding of SJT Response Options

In this phase, 13 graduate students in Industrial/Organizational Psychology read each of the SJT items and response options and rated whether they believed each response option was a measure of academically oriented behavior, interpersonally oriented behavior, or a behavior influenced by a broad array of both academic and nonacademic domains. The participants were also asked to provide an estimate of how confident they were in their rating of the category to which each response option belongs. The purpose of this phase was to demonstrate that the SJT instrument would allow for sufficient differentiation between the groups along the predicted dimensions. Please reference Appendix B for the instructions for this task. The ratings of the response options, along with ratings of confidence in the choice, can be found in Table 1.
Table 1
Coding of Response Options in Phase 1

Item  Option  Dimension           % Endorsing  Confidence (M and SD)
1     a       Leadership/Service  76.9         M=4.54, SD=.66
1     c       Leadership/Service  92.3         M=4.46, SD=.52
1     d       Leadership/Service  76.9         M=3.85, SD=1.07
2     c       Academic            76.9         M=3.08, SD=.86
2     d       Social              84.6         M=4.31, SD=.75
3     b       Leadership/Service  76.9         M=4.15, SD=.55
3     d       Academic            76.9         M=3.92, SD=.86
3     f       Social              100.0        M=3.85, SD=.80
4     c       Academic            84.6         M=4.38, SD=.51
4     d       Academic            76.9         M=3.85, SD=.80
7     b       Academic            92.3         M=4.23, SD=.93
7     c       Academic            84.6         M=4.46, SD=.88
8     a       Leadership/Service  92.3         M=4.00, SD=.91
9     a       Social              92.3         M=4.23, SD=.83
10    c       Academic            84.6         M=4.23, SD=.73
10    f       Social              84.6         M=4.31, SD=.75
11    c       Academic            92.3         M=4.62, SD=.51
11    e       Academic            76.9         M=3.62, SD=.96
12    b       Academic            92.3         M=3.92, SD=.76
13    b       Leadership/Service  92.3         M=4.00, SD=1.00
13    c       Leadership/Service  92.3         M=3.69, SD=1.18
13    d       Social              84.6         M=4.46, SD=.52
13    e       Leadership/Service  92.3         M=3.77, SD=.93
14    b       Academic            92.3         M=4.00, SD=1.00
14    f       Leadership/Service  76.9         M=4.08, SD=.49
15    c       Leadership/Service  76.9         M=4.38, SD=.65
15    d       Social              92.3         M=4.31, SD=.63
15    e       Academic            100.0        M=4.46, SD=.52
16    a       Social              92.3         M=3.92, SD=1.19
16    b       Social              84.6         M=4.38, SD=.65
16    d       Social              84.6         M=3.15, SD=.99
17    d       Social              100.0        M=4.85, SD=.38
18    e       Academic            75.0         M=3.25, SD=1.14
19    b       Social              100.0        M=4.38, SD=.77
20    a       Academic            84.6         M=1.31, SD=.85
21    a       Academic            84.6         M=4.15, SD=.90
21    b       Academic            76.9         M=3.54, SD=1.05
22    a       Academic            84.6         M=4.62, SD=.65
22    b       Academic            76.9         M=3.69, SD=.85
22    d       Social              92.3         M=4.54, SD=.78
23    a       Academic            76.9         M=3.77, SD=1.17
23    b       Academic            92.3         M=4.54, SD=.66
23    c       Academic            76.9         M=3.77, SD=.93
25    a       Social              76.9         M=4.31, SD=.75
25    b       Leadership/Service  84.6         M=3.62, SD=.96
26    a       Academic            76.9         M=3.54, SD=.78
26    c       Academic            100.0        M=4.92, SD=.28
27    b       Social              100.0        M=4.38, SD=.65
27    c       Social              84.6         M=3.92, SD=1.04
27    d       Social              76.9         M=4.00, SD=1.00
29    a       Leadership/Service  92.3         M=4.38, SD=.51
29    d       Social              92.3         M=4.23, SD=.83
30    a       Academic            76.9         M=4.00, SD=1.22
30    b       Social              76.9         M=4.15, SD=1.07
30    c       Academic            92.3         M=4.23, SD=1.17
31    b       Academic            92.3         M=4.08, SD=1.19
31    e       Academic            92.3         M=4.28, SD=1.12
31    g       Social              84.6         M=4.31, SD=.63
33    a       Academic            84.6         M=4.23, SD=.93
33    b       Social              92.3         M=4.00, SD=.91
33    c       Academic            100.0        M=3.92, SD=1.19
33    d       Academic            84.6         M=3.54, SD=.97
34    a       Academic            84.6         M=3.92, SD=1.04
34    b       Academic            76.9         M=4.31, SD=.63
34    c       Academic            84.6         M=3.54, SD=.97
34    d       Academic            76.9         M=4.00, SD=1.08
34    e       Social              84.6         M=4.08, SD=.76
35    c       Leadership/Service  76.9         M=4.31, SD=.75
35    d       Leadership/Service  76.9         M=4.00, SD=.82

Out of 189 response options in the SJT instrument (see Appendix C), at least 70% of the Phase 1 participants agreed on a single dimension for 69 (37%) of the response options. Thirty-four (49.3%) of the 69 response options were classified as related to the academic dimension, 14 (20.3%) were classified as related to the leadership/service dimension, and 21 (30.4%) were classified as related to the social dimension. Phase 1 was considered a successful demonstration that the SJT contained sufficient content to discriminate between the three SME groups along the predicted dimensions.
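To make the Phase 1 decision rule concrete, here is a minimal sketch of how the 70% agreement cutoff could be applied to each option; the data layout, function name, and example values are my own illustration rather than anything taken from the study materials.

```python
# A minimal sketch of the Phase 1 classification rule, assuming each option's
# ratings are a list of the dimension labels chosen by the 13 raters.
from collections import Counter

def classify_option(rater_choices, threshold=0.70):
    """Assign an option to a dimension only if at least `threshold` of raters agree."""
    dimension, n_agree = Counter(rater_choices).most_common(1)[0]
    if n_agree / len(rater_choices) >= threshold:
        return dimension
    return None  # no consensus; the option is left out of the 69 classifiable options

# Hypothetical example: 10 of 13 raters call an option "Leadership/Service".
choices = ["Leadership/Service"] * 10 + ["Academic"] * 2 + ["Social"]
print(classify_option(choices))  # -> "Leadership/Service" (10/13 = 76.9%)
```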
Phase 2: Scoring Key Development

The participants in Phase 2 were 28 graduate students (different from those recruited for Phase 1), 28 mentors (resident advisors), and 33 undergraduate students. For each item, participants were asked to select the response options that they would be most and least likely to select if faced with that situation. They were also asked to rate the effectiveness of each response option for that item. The participants were also asked to provide information about themselves to be used as "manipulation checks" to ensure that the groups of SMEs really did differ along the predicted variables (measures of academic performance, leadership and service-related behavior, interpersonal competence, personality, values, family support of education, and parental education). Please reference Appendices D, E, F, and G for the instructions and content of these tasks.

The responses of 2,753 applicants to the university (collected in 2004 for another, ongoing research effort; Schmitt et al., 2007) were used to assess the correlates of the SJT scored based on the judgments of the different SME groups. Of the 2,696 students who reported their sex, 35.6% were male and 62.3% were female. Of the 2,524 students who reported their ethnicity, 55.2% were White, 23.2% were African American, 7.5% were Asian, 5.7% were Hispanic, and 8.3% identified as multi-racial or other.

Phase 3: Verbal Protocol Analysis

In this phase, five graduate students, five resident advisors, and five undergraduate students were trained in the verbal protocol technique. They were given the following instructions: "As you answer the following six questions, please try to say all of your thoughts out loud as you go. Please describe your thoughts, feelings, and choices about what you are doing and reading. It might be your reactions, reasoning, or even something you are reminded of. Please don't censor your thoughts. Even if a thought does not seem relevant to the task, it is of interest to me. Remember, everything you say to me today will be kept confidential." After the training, the participants were given two practice SJT items (see Appendix H). They completed those two items using the think-aloud protocol, and the researcher answered any questions they had and encouraged them to feel comfortable. After each participant felt comfortable with the verbal protocol technique, he or she was digitally recorded while completing six of the SJT items using this technique.

The six SJT items were selected for the verbal protocol based on the coding by the Phase 1 participants. The items (2, 3, 14, 15, 22, and 29) were chosen because the Phase 1 participants had categorized the response options for each of the items as belonging to multiple dimensions. For example, Phase 1 participants identified 15c as leadership/service, 15d as social, and 15e as academic. Therefore, the item was expected to elicit different patterns of response from each of the three SME groups. After completing those six items, the participants completed the remaining 30 items using the same procedure as the other participants in the scoring key development. The participants were also asked to provide information about themselves to be used as "manipulation checks" to ensure that the groups of SMEs really did differ along the predicted variables (measures of academic performance, leadership and service-related behavior, interpersonal competence, and personality). Please reference Appendices D, E, F, and G for the instructions and content of these tasks. The researcher created a typed transcript of each recording.

Measures

The SJT used in this study was developed to reflect 12 dimensions of college performance. The development process is described fully by Oswald et al. (2004). To summarize, the item stems were taken from existing measures and adapted by the researchers to reflect the 12 dimensions of college student performance. Additional item stems and response options were developed by undergraduates. The final measure consists of 36 items (see Appendix C). The items reflect a range of academic, interpersonal, and intrapersonal situations.

Indicators to be Linked with Scores Based on SMEs' Scoring Keys

GPA. Yearly and cumulative four-year grade point average information was collected from the registrar's office at each of the universities for the years 2004-2008. In addition, the universities provided the four-year cumulative GPA for each of the students. Because the different schools from which data were collected varied in selectivity, college GPA was corrected. First, GPA for each university was standardized. The standardized grades were then regressed on scores for college admissions tests along with a set of dummy variables representing each college and university. The coefficients for the dummy variables indicated the differences in grades that would be expected for students with comparable admissions scores at the different universities. Finally, each participant's GPA was adjusted by his or her school's regression coefficient, so that students at universities with higher average admissions scores received a relatively higher adjusted college GPA, and students at universities with lower average scores received a relatively lower adjusted college GPA. This correction was made for all of the GPA variables.
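The GPA correction lends itself to a short sketch. The version below assumes a pandas DataFrame with hypothetical columns gpa, admissions_score, and school; the thesis does not report the software used, so treat this as one plausible implementation of the steps described above rather than the study's actual code.

```python
import pandas as pd
import statsmodels.formula.api as smf

def adjust_gpa(df):
    """Selectivity-correct GPA following the steps described in the text."""
    df = df.copy()
    # Step 1: standardize GPA within each university.
    df["z_gpa"] = df.groupby("school")["gpa"].transform(
        lambda g: (g - g.mean()) / g.std())
    # Step 2: regress standardized GPA on admissions scores plus school dummies.
    fit = smf.ols("z_gpa ~ admissions_score + C(school)", data=df).fit()
    # Step 3: the school dummy coefficients estimate the grade difference expected
    # for students with comparable admissions scores at each school. Removing each
    # student's school coefficient raises adjusted GPA at more selective schools
    # (whose dummies come out negative) and lowers it at less selective ones.
    effects = {name[name.find("[T.") + 3:-1]: coef
               for name, coef in fit.params.items() if name.startswith("C(school)")}
    df["adjusted_gpa"] = df["z_gpa"] - df["school"].map(effects).fillna(0.0)
    return df
```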
BARS and BOS. Seven items from the behaviorally anchored rating scales (BARS) and seven subscales from the behavioral observation scales (BOS) were used as indicators of the dimensions that the SME groups were predicted to value differently. Those items and subscales were designed to measure knowledge and mastery of general principles, continuous learning, leadership, interpersonal skills, social responsibility, adaptability, and ethics and integrity (see Appendices D and E). The BARS measures were collected from students who participated in follow-up data collections at the end of their first, second, third, and eighth semesters. The BOS measures were collected from students who participated in the follow-up data collection at the end of their third semester.

Personality measures. The subscales of the 50-item International Personality Item Pool (Goldberg, 1999) designed to measure conscientiousness, neuroticism, extraversion, and agreeableness were administered to the applicant pool (see Appendix F). The personality measures were collected at the first data collection in early fall of 2004, when participants had just started at their universities.

Measures of Differentiation between SME Groups ("Manipulation Check")

In order to measure whether the SME groups really differed from one another along the predicted dimensions (see Appendix A), the SMEs were asked to provide self-ratings on several measures. First, they were asked to fill out the BARS and BOS for the seven dimensions described above (see Appendix D) in reference to their own behavior. Next, they completed a rating scale of the importance of the 12 dimensions of college student performance (see Appendix G). This scale was to be used in conjunction with the self-ratings to determine whether the personal characteristics and values of members of each SME group differed in the predicted ways. Next, they were asked to complete the Conscientiousness, Neuroticism, Extraversion, and Agreeableness scales of the IPIP (Goldberg, 1999). Finally, they were asked to provide information about their parents' education and an estimate of their college GPA.
The predicted differences between SME groups can be seen in Appendix A.

Results

Manipulation Check

In order to examine whether the members of the SME groups differed along the predicted dimensions, I conducted mean difference tests on each of the indicators to determine whether the members of the SME groups differed on each measure as predicted (see Appendix A for predictions and Table 2 for results of analyses). ANOVA tests that indicated significant differences between groups were followed by post hoc Tukey tests to determine the nature of the group differences.

Table 2
Manipulation Checks

College GPA
Predicted: Graduates would be highest.
Found: Graduates (M = 3.73, SD = .23) and mentors (M = 3.54, SD = .23) had significantly higher GPAs than undergrads (M = 3.12, SD = .54). Graduates and mentors did not differ significantly (F(2,88) = 19.91, p < .001).
Support: Yes

Self-rated Behavior (BOS and BARS)

Knowledge
Predicted: Graduates would be higher than undergrads.
Found: Graduate students (M = 2.34, SD = .66) and mentors (M = 2.20, SD = .49) both had significantly higher scores on the BOS than did undergraduates (M = 1.55, SD = .34), but were not significantly different from one another (F(2,88) = 21.35, p < .001). The groups did not differ on BARS ratings (F(2,84) = 2.24, ns).
Support: Partial

Continuous Learning
Predicted: Graduates would be higher than undergrads.
Found: Graduate students (M = 3.69, SD = .76) had significantly higher scores on the BOS than did both mentors (M = 3.23, SD = .72) and undergraduates (M = 3.04, SD = .65), who did not differ from one another (F(2,88) = 6.75, p < .05). On the BARS, graduates (M = 4.11, SD = .74) were higher than undergraduates (M = 3.40, SD = .84); mentors (M = 3.61, SD = .79) did not differ from either group (F(2,87) = 6.12, p < .01).
Support: Yes

Leadership Skills
Predicted: Mentors would be highest.
Found: Graduate students (M = 2.33, SD = .82) and mentors (M = 2.76, SD = .77) both had significantly higher BOS scores than did undergraduates (M = 1.88, SD = .58), although they did not differ from each other (F(2,88) = 11.26, p < .001). Mentors (M = 4.39, SD = .63) were higher on the BARS than both grads (M = 3.39, SD = .74) and undergrads (M = 3.63, SD = 1.10), who did not differ from one another (F(2,87) = 10.45, p < .001).
Support: Partial

Interpersonal Skills
Predicted: Mentors and undergraduates would be higher than graduates.
Found: No differences (BOS: F(2,88) = 1.32, ns; BARS: F(2,87) = 1.50, ns).
Support: No

Social Responsibility
Predicted: Mentors would be highest.
Found: Mentors (M = 2.42, SD = .52) were higher than undergraduates (M = 1.95, SD = .58) on the BOS, but graduates (M = 2.28, SD = .82) did not differ from either group (F(2,88) = 4.30, p < .05). The groups did not differ on the BARS (F(2,87) = 1.27, ns).
Support: Partial

Adaptability
Predicted: Mentors would be highest.
Found: No differences (BOS: F(2,88) = 1.03, ns; BARS: F(2,87) = .18, ns).
Support: No

Ethics
Predicted: Mentors would be highest.
Found: No differences (BOS: F(2,88) = .16, ns; BARS: F(2,87) = .81, ns).
Support: No

Importance Ratings

Knowledge
Predicted: Graduates would be highest.
Found: No differences (F(2,88) = 1.53, ns).
Support: No

Continuous Learning
Predicted: Graduates would be highest.
Found: No differences (F(2,88) = .64, ns).
Support: No

Artistic
Predicted: No prediction (exploratory).
Found: No differences (F(2,88) = 1.94, ns).
Support: n/a
Multicultural
Predicted: No prediction (exploratory).
Found: No differences (F(2,88) = 1.17, ns).
Support: n/a

Leadership
Predicted: Mentors would be highest.
Found: Mentors (M = 6.04, SD = .84) and undergrads (M = 5.94, SD = .83) thought it more important than graduates did (M = 5.32, SD = .86), but did not differ from each other (F(2,88) = 6.04, p < .05).
Support: Partial

Interpersonal
Predicted: Undergraduates would be highest.
Found: No differences (F(2,88) = .64, ns).
Support: No

Social Responsibility
Predicted: Mentors would be highest.
Found: Mentors (M = 6.00, SD = 1.02) and undergrads (M = 5.88, SD = .93) thought it more important than graduates did (M = 4.82, SD = 1.66), but did not differ from each other (F(2,88) = 7.96, p = .001).
Support: Partial

Health
Predicted: No prediction (exploratory).
Found: No differences (F(2,88) = .35, ns).
Support: n/a

Career
Predicted: No prediction (exploratory).
Found: No differences (F(2,88) = .18, ns).
Support: n/a

Adaptability
Predicted: Mentors would be highest.
Found: No differences (F(2,88) = .58, ns).
Support: No

Perseverance
Predicted: No prediction (exploratory).
Found: No differences (F(2,88) = .11, ns).
Support: n/a

Ethics
Predicted: Mentors would be highest.
Found: Mentors (M = 6.68, SD = .82) and undergrads (M = 6.55, SD = .67) thought it more important than grads did (M = 5.71, SD = 1.08), but did not differ from each other (F(2,88) = 10.45, p < .001).
Support: Partial

Personality

Agreeableness
Predicted: Mentors and undergrads would be higher than graduates.
Found: No differences (F(2,88) = .57, ns).
Support: No

Conscientiousness
Predicted: Graduates and mentors would be higher than undergrads.
Found: No differences (F(2,88) = .35, ns).
Support: No

Emotional Stability
Predicted: Mentors and undergrads would be higher than graduates.
Found: No differences (F(2,88) = .01, ns).
Support: No

Extraversion
Predicted: Undergrads would be highest.
Found: No differences (F(2,88) = .55, ns).
Support: No

Experience

Family Support of Education
Predicted: Graduates would be highest.
Found: No differences (F(2,88) = 1.72, ns).
Support: No

Mother's Education
Predicted: Graduates would be highest.
Found: No differences (F(2,88) = .01, ns).
Support: No

Father's Education
Predicted: Graduates would be highest.
Found: No differences (F(2,88) = 1.02, ns).
Support: No

GPA

As predicted, graduate students reported the highest college GPA (M = 3.73, SD = .23). The GPA reported by graduate students was significantly greater than that of undergraduates (M = 3.12, SD = .54), but the GPA of mentors (M = 3.54, SD = .23) did not differ from either group (F(2,88) = 19.91, p < .001).

Knowledge and Continuous Learning Behaviors

In support of predictions, graduate students (M = 3.69, SD = .76) rated themselves significantly higher than mentors (M = 3.23, SD = .72) and undergraduates (M = 3.04, SD = .65) on continuous learning on the BOS (F(2,88) = 6.75, p < .05). On the BARS, graduate students (M = 4.11, SD = .74) rated themselves significantly higher than did undergraduates (M = 3.40, SD = .84), but mentors (M = 3.61, SD = .79) did not differ from either group (F(2,87) = 6.12, p < .01). In partial support of predictions, graduate students (M = 2.34, SD = .66) and mentors (M = 2.20, SD = .49) rated themselves significantly higher than undergraduates (M = 1.55, SD = .34) on the knowledge scale of the BOS (F(2,88) = 21.35, p < .001), but not the BARS (on which the groups did not differ; F(2,84) = 2.24, ns).
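As a sketch of the analytic steps just described (an omnibus ANOVA followed by Tukey's HSD when the omnibus test is significant), the following fragment reproduces the logic with standard Python libraries; the variable names and the choice of scipy/statsmodels are my assumptions, since the thesis does not name its analysis software.

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

def compare_groups(grads, mentors, undergrads, alpha=.05):
    # Omnibus one-way ANOVA across the three SME groups.
    f_stat, p_value = stats.f_oneway(grads, mentors, undergrads)
    n_total = len(grads) + len(mentors) + len(undergrads)
    print(f"F(2,{n_total - 3}) = {f_stat:.2f}, p = {p_value:.3f}")
    # Follow a significant omnibus test with post hoc Tukey HSD comparisons.
    if p_value < alpha:
        scores = np.concatenate([grads, mentors, undergrads])
        groups = (["grad"] * len(grads) + ["mentor"] * len(mentors)
                  + ["undergrad"] * len(undergrads))
        print(pairwise_tukeyhsd(scores, groups, alpha=alpha))
```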
Leadership, Social Responsibility, Adaptability, and Ethics Behaviors

In partial support of predictions, mentors (M = 4.39, SD = .63) rated themselves higher than both graduate students (M = 3.39, SD = .74) and undergraduates (M = 3.63, SD = 1.10) on the leadership scale of the BARS (F(2,87) = 10.45, p < .001). On the BOS, however, both mentors (M = 2.76, SD = .77) and graduate students (M = 2.33, SD = .82) rated themselves higher than undergraduates (M = 1.88, SD = .58) on leadership (F(2,88) = 11.26, p < .001). Again in partial support of predictions, mentors (M = 2.42, SD = .52) rated themselves higher than undergraduates (M = 1.95, SD = .58) on social responsibility on the BOS (F(2,88) = 4.30, p < .05), but neither group differed from graduate students (M = 2.28, SD = .82). The groups did not differ on social responsibility measured by the BARS (F(2,87) = 1.27, ns). Contrary to predictions, the three groups did not differ on ratings of adaptability (BOS: F(2,88) = 1.03, ns; BARS: F(2,87) = .18, ns) or ethics (BOS: F(2,88) = .16, ns; BARS: F(2,87) = .81, ns).

Interpersonal Behaviors

Contrary to the prediction that undergraduates would rate themselves the strongest on interpersonal behaviors, the three groups did not differ on ratings of interpersonal behaviors (BOS: F(2,88) = 1.32, ns; BARS: F(2,87) = 1.50, ns).

Ratings of Importance for College Students

In partial support of predictions, mentors (M = 6.04, SD = .84) and undergrads (M = 5.94, SD = .83) thought leadership more important than graduates did (M = 5.32, SD = .86), but did not differ from each other (F(2,88) = 6.04, p < .05). Mentors (M = 6.00, SD = 1.02) and undergrads (M = 5.88, SD = .93) thought social responsibility more important than graduates did (M = 4.82, SD = 1.66), but did not differ from each other (F(2,88) = 7.96, p = .001). Mentors (M = 6.68, SD = .82) and undergrads (M = 6.55, SD = .67) thought ethics more important than grads did (M = 5.71, SD = 1.08), but did not differ from each other (F(2,88) = 10.45, p < .001). The groups did not differ on the other dimensions (Knowledge: F(2,88) = 1.53, ns; Continuous Learning: F(2,88) = .64, ns; Artistic: F(2,88) = 1.94, ns; Multicultural: F(2,88) = 1.17, ns; Interpersonal: F(2,88) = .64, ns; Health: F(2,88) = .35, ns; Career: F(2,88) = .18, ns; Adaptability: F(2,88) = .58, ns; Perseverance: F(2,88) = .11, ns).

Personality Traits

Contrary to predictions, the groups did not differ on agreeableness (F(2,88) = .57, ns), conscientiousness (F(2,88) = .35, ns), emotional stability (F(2,88) = .01, ns), or extraversion (F(2,88) = .55, ns).

Demographics

Contrary to expectations, the groups did not differ on family support of education (F(2,88) = 1.72, ns), or on mother's (F(2,88) = .01, ns) or father's level of education (F(2,88) = 1.02, ns).

Summary of Findings for Manipulation Check

Overall, there was very little evidence that the three SME groups differed on the dimensions along which they were predicted to differ. There was some evidence to suggest that graduate student SMEs were more academically inclined than the other two groups (they had higher grades in college and reported more continuous learning behaviors). Mentors did not demonstrate more leadership- and service-related behaviors, and undergraduates did not report being skilled in interpersonal interactions.
The groups had similar values about the importance of the various dimensions for college student performance, had similar personality traits, experienced similar family support for their education, and reported that their parents had similar levels of education. Despite the finding that the groups did not differ in the expected ways, I continued with the planned analyses for two reasons: I wanted to explore whether my hypothesis that different groups would provide different judgments would still be supported, even if the differences in judgments were not of the expected type, and I also viewed this research as a developmental experience.

Hypothesis 1

Hypothesis 1 was that there would be significant differences in the effectiveness judgments of the different groups of SMEs (graduate students, mentors, and undergraduates) for response options classified into each of the three hypothesized dimensions (academic, leadership/service, and social). In order to examine this hypothesis, I compared across groups the average ratings of effectiveness for each of the 69 response options that were identified by the Phase 1 participants as classifiable into the three dimensions. I expected the options for which I found significant differences in ratings of effectiveness to reflect the different hypothesized lay theories of each SME group. Specifically, I expected that, of the three SME groups, graduate SMEs would rate response options classified as academic as most effective, mentor SMEs would rate leadership/service options as most effective, and undergraduate SMEs would rate social response options as most effective.

In order to make comparisons across groups, I used two analytic techniques: analysis of variance and standardized mean difference comparisons. First, I conducted analysis of variance comparisons across the ratings of effectiveness for each of the 69 response options. The results of this analysis can be found in Table 3. As can be seen in the table, the ANOVAs indicated that the three groups of SMEs differed on the effectiveness ratings for only seven of the 69 response options. Of those seven response options, post hoc Tukey tests indicated that Hypothesis 1 was partially supported for only three. For option 16a, which Phase 1 participants categorized into the social dimension, post hoc analyses indicated that undergraduates (M = 4.39, SD = .72) rated the option as significantly more effective than mentors (M = 3.69, SD = .88), but that neither group differed in their ratings of effectiveness from the graduate students (M = 4.14, SD = .71). For option 16b, which Phase 1 participants also categorized into the social dimension, post hoc analyses indicated that, once again, undergraduates (M = 3.55, SD = 1.18) rated the option as significantly more effective than mentors (M = 2.77, SD = 1.14), but that neither group differed in their ratings of effectiveness from the graduate students (M = 3.23, SD = 1.05). For options 16a and 16b, Hypothesis 1 was partially supported because undergraduate SMEs rated those response options, classified as social, as more effective than did mentors, although graduate students did not differ from either group.
For option 23a, which Phase 1 participants classified into the academic dimension, post hoc analyses indicated that graduate students (M = 4.21, SD = .74) rated the option as significantly more effective than did mentors (M = 3.58, SD = .81), but that the ratings of undergraduates (M = 3.97, SD = .87) did not differ from either group. For option 23a, Hypothesis 1 was partially supported because graduate SMEs rated the option, which was classified as academic, as more effective than did mentors, although undergraduates did not differ in their ratings from either group.

The results from the ANOVA indicated very little support for Hypothesis 1. Very little evidence was found to suggest that the SME groups differed in their ratings of effectiveness for the response options. Of the 69 response options that were expected to elicit differences in ratings, only seven elicited significantly different ratings across groups, and only three of those were in the expected direction. In addition to the analysis of variance, I also compared the standardized mean differences (d) across each pair of SME groups.

Table 3
Comparison of Effectiveness Ratings of the SJT Response Options across SME Groups

Item  Option  Classified Dimension  F-test            Tukey  d (G-M)  d (G-U)  d (M-U)
1     a       Leadership/Service    F(2,84) = 1.31           -0.29    -0.40    -0.12
1     c       Leadership/Service    F(2,84) = 6.97*   U>M     0.38    -0.57    -1.09
1     d       Leadership/Service    F(2,84) = .82             0.07    -0.24    -0.32
2     c       Academic              F(2,84) = .52             0.17     0.27     0.10
2     d       Social                F(2,84) = .45             0.20    -0.06    -0.24
3     b       Leadership/Service    F(2,84) = 3.00           -0.66    -0.43     0.20
3     d       Academic              F(2,84) = .14            -0.08    -0.13    -0.06
3     f       Social                F(2,83) = .83             0.28     0.31     0.01
4     c       Academic              F(2,84) = .02             0.04     0.05     0.02
4     d       Academic              F(2,84) = 1.09           -0.34     0.00     0.35
7     b       Academic              F(2,83) = .28             0.04    -0.14    -0.21
7     c       Academic              F(2,83) = 1.06            0.11     0.37     0.25
8     a       Leadership/Service    F(2,84) = .13             0.04     0.12     0.10
9     a       Social                F(2,84) = .60            -0.29    -0.12     0.19
10    c       Academic              F(2,84) = 1.15           -0.12    -0.39    -0.26
10    f       Social                F(2,83) = .50             0.27     0.06    -0.20
11    c       Academic              F(2,82) = .22             0.14    -0.05    -0.17
11    e       Academic              F(2,84) = .08            -0.06    -0.09    -0.05
12    b       Academic              F(2,84) = 1.16           -0.35    -0.34    -0.04
13    b       Leadership/Service    F(2,84) = .18             0.16     0.09    -0.07
13    c       Leadership/Service    F(2,84) = .16             0.12     0.14     0.04
13    d       Social                F(2,83) = .35             0.22     0.00    -0.20
13    e       Leadership/Service    F(2,83) = .14             0.13     0.01    -0.13
14    b       Academic              F(2,82) = 1.48            0.20    -0.25    -0.47
14    f       Leadership/Service    F(2,84) = 3.47*   G>U     0.55     0.64     0.16
15    c       Leadership/Service    F(2,84) = 1.53           -0.09    -0.44    -0.34
15    d       Social                F(2,84) = 1.57            0.28     0.47     0.17
15    e       Academic              F(2,84) = 1.80            0.13    -0.33    -0.51
16    a       Social                F(2,84) = 5.88*   U>M     0.57    -0.34    -0.88
16    b       Social                F(2,84) = 3.45*   U>M     0.47    -0.24    -0.67
16    d       Social                F(2,84) = .49            -0.19    -0.25    -0.05
17    d       Social                F(2,83) = 1.36           -0.47    -0.07     0.35
18    e       Academic              F(2,84) = 1.69            0.47     0.12    -0.37
19    b       Social                F(2,84) = 2.80            0.55     0.51    -0.10
20    a       Academic              F(2,84) = .56             0.01    -0.23    -0.25
21    a       Academic              F(2,84) = .65             0.04    -0.22    -0.32
21    b       Academic              F(2,84) = .29            -0.07    -0.19    -0.13
22    a       Academic              F(2,83) = .55            -0.01    -0.23    -0.29
22    b       Academic              F(2,83) = 1.58            0.45     0.16    -0.34
22    d       Social                F(2,84) = .24            -0.02     0.15     0.18
23    a       Academic              F(2,84) = 4.21*   G>M     0.83     0.30    -0.46
23    b       Academic              F(2,84) = 1.83           -0.03    -0.44    -0.47
23    c       Academic              F(2,84) = .38            -0.21    -0.20     0.00
25    a       Social                F(2,84) = .47             0.20    -0.04    -0.25
25    b       Leadership/Service    F(2,84) = 3.17*          -0.04    -0.58    -0.51
26    a       Academic              F(2,84) = 1.09           -0.12    -0.36    -0.28
26    c       Academic              F(2,84) = 2.49           -0.52    -0.48    -0.02
27    b       Social                F(2,83) = .14             0.00    -0.11    -0.12
27    c       Social                F(2,83) = .11             0.11    -0.01    -0.13
27    d       Social                F(2,83) = .94            -0.37    -0.27     0.07
29    a       Leadership/Service    F(2,84) = .33            -0.20    -0.18     0.02
29    d       Social                F(2,83) = 1.33            0.13    -0.29    -0.43
30    a       Academic              F(2,84) = 5.10*   U>M     0.22    -0.58    -0.94
30    b       Social                F(2,84) = .09            -0.03    -0.09    -0.08
30    c       Academic              F(2,84) = 2.67            0.36    -0.25    -0.63
31    b       Academic              F(2,83) = .96             0.39     0.10    -0.27
31    e       Academic              F(2,83) = 1.31            0.24    -0.21    -0.42
31    g       Social                F(2,84) = 1.37            0.40     0.38     0.07
33    a       Academic              F(2,84) = 1.03            0.16    -0.21    -0.41
33    b       Social                F(2,84) = .65             0.29     0.03    -0.26
33    c       Academic              F(2,84) = .20             0.18     0.04    -0.13
33    d       Academic              F(2,84) = .88             0.21    -0.16    -0.33
34    a       Academic              F(2,84) = .23            -0.07    -0.16    -0.11
34    b       Academic              F(2,84) = .07            -0.08    -0.09    -0.02
34    c       Academic              F(2,84) = .09             0.09     0.00    -0.10
34    d       Academic              F(2,84) = .80             0.20    -0.15    -0.32
34    e       Social                F(2,84) = .96             0.36     0.28    -0.07
35    c       Leadership/Service    F(2,84) = 1.47           -0.05    -0.39    -0.40
35    d       Leadership/Service    F(2,84) = 2.95           -0.58    -0.51     0.01

Note. The p-value for F-tests marked with an asterisk was less than .05. U = undergraduate group, M = mentor group, G = graduate student group.
In the last three columns of Table 3, the d-values are reported for each pair (Graduate-Mentor, Graduate-Undergraduate, Mentor-Undergraduate). As may be observed in the table, a fair number of the differences would be considered meaningful according to Cohen's standards (1977; d-values of 0.2-0.5 are considered a small difference, 0.5-0.8 a moderate difference, and 0.8 and above a large difference). In order to evaluate whether the three SME groups differed meaningfully in their ratings of effectiveness using d-values, I examined all of the meaningful (according to Cohen's criterion, 0.2 and above) d-values for each dimension relative to the total number of response options classified as belonging to that dimension. I created profiles to indicate which types of findings would strongly and weakly support or refute Hypothesis 1. For example, 34 of the response options were classified as belonging to the academic dimension. Of those 34 response options, 3 indicated that graduate students rated the options as more effective than undergraduates and 11 indicated that graduate students rated the options as more effective than mentors, both of which would be considered support for the hypothesis (see Table 4). In contrast, for 15 options, undergraduates rated them as more effective than graduate students and, for 3 options, mentors rated them as more effective than graduate students. Overall, the pattern of responses indicated that although graduate students rated academic options as more effective than did mentors, undergraduates rated academic options as even more effective than did graduate students. This pattern is contradictory to Hypothesis 1, which predicted that graduate students would rate academic options as more effective than either of the other SME groups.

Table 4
Comparison of Differences across Groups for Ratings of Effectiveness of Options in the Academic Dimension (k = 34)

      Support          Against
      G>U     G>M      U>G     M>G
k     3       11       15      3
%     8.82    32.35    44.12   8.82
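For clarity, the standardized mean differences in Table 3 can be computed as pooled-standard-deviation Cohen's d values, as in the short sketch below; the sign convention (positive when the first group rates an option higher) follows the table, while the function name is mine.

```python
import numpy as np

def cohens_d(ratings_1, ratings_2):
    """Pooled-SD Cohen's d; positive when ratings_1 has the higher mean."""
    n1, n2 = len(ratings_1), len(ratings_2)
    pooled_var = ((n1 - 1) * np.var(ratings_1, ddof=1)
                  + (n2 - 1) * np.var(ratings_2, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(ratings_1) - np.mean(ratings_2)) / np.sqrt(pooled_var)

# Tallies like those in Tables 4-6 then simply count, per dimension, how many
# options have d >= 0.2 (Cohen's small-effect threshold) in each direction.
```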
Overall, the pattern of response for options classified into the leadership/service dimension (k = 14; see Table 5) indicated that although mentors thought leadership/service options were more effective than graduates did, undergraduates rated leadership/service options as more effective than did mentors.

Table 5
Comparison of Differences across Groups for Ratings of Effectiveness of Options in the Leadership/Service Dimension (k = 14)

      Support          Against
      M>G     M>U      G>M     U>M
k     4       1        2       5
%     28.57   7.14     14.29   35.71

Once again, this pattern is counter to the prediction of Hypothesis 1 that mentors would rate the leadership/service options as most effective relative to the other two SME groups. The pattern of response for options classified into the social dimension (k = 21; see Table 6) indicated that undergraduates rated social options as more effective than mentors did and that undergraduates and graduates rated social options as approximately equally effective. This pattern partially supports the prediction of Hypothesis 1 that undergraduates would report the highest effectiveness ratings for the social options.

Table 6
Comparison of Differences across Groups for Ratings of Effectiveness of Options in the Social Dimension (k = 21)

      Support          Against
      U>G     U>M      G>U     M>U
k     5       8        5       1
%     23.81   38.10    23.81   4.76

Taken together, the ANOVA analyses and standardized mean difference comparisons for the 69 options for which the three SME groups were hypothesized to differ in ratings of effectiveness across the predicted dimensions indicate very little support for Hypothesis 1.

Hypothesis 2

Hypothesis 2 was that there would be significant differences in the correlates of scores based on the scoring keys developed from different SME groups with external criterion measures. In order to examine this hypothesis, correlations between scores based on each of the three scoring keys (graduate, mentor, and undergraduate) and the external variables were computed. Tests of the significance of the differences among the correlations in the predicted directions (see Appendix A) would be considered support (or lack of support) for Hypothesis 2. In order to test for differences among the correlations of the scores developed from the keys for each group with each criterion, the procedure for testing the heterogeneity of a set of correlated correlations described by Meng, Rosenthal, and Rubin (1992) was used. The results for each of the analyses can be found in Table 7. Although scores based on the scoring keys for each of the three groups were correlated with most of the criterion measures, there were no differences in the relationships between the three scoring keys and any of the criteria. Thus, there was no support for Hypothesis 2.
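A sketch of the Meng, Rosenthal, and Rubin (1992) heterogeneity test, as I read their formulas, is given below; r_with_criterion would hold the three keys' correlations with one criterion, r_x the median intercorrelation among the three sets of SJT scores (a quantity not reported in Table 7), and n the sample size. Because the thesis does not show its computations, treat this as illustrative only.

```python
import numpy as np
from scipy import stats

def mrr_heterogeneity(r_with_criterion, r_x, n):
    """Chi-square heterogeneity test for correlated correlations (Meng et al., 1992)."""
    r = np.asarray(r_with_criterion, dtype=float)
    z = np.arctanh(r)                                 # Fisher's z transformation
    r_sq_bar = np.mean(r ** 2)                        # mean squared correlation
    f = min((1 - r_x) / (2 * (1 - r_sq_bar)), 1.0)    # f is capped at 1
    h = (1 - f * r_sq_bar) / (1 - r_sq_bar)
    chi_sq = (n - 3) * np.sum((z - z.mean()) ** 2) / ((1 - r_x) * h)
    df = len(r) - 1
    return chi_sq, stats.chi2.sf(chi_sq, df)

# Illustration with the first-year GPA row of Table 7 (r = .32, .31, .31; N = 1519)
# and an assumed key intercorrelation of .90:
print(mrr_heterogeneity([.32, .31, .31], r_x=.90, n=1519))
```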
Z "1 Z"Z"Z .07 1139 .00 1139 .21 1141 .22 980 .03 982 .12 982 .13 977 .14 978 0.06 983 .20 984 .26 872 .05 873 .14 871 .11 870 .13 870 51 .10 1139 .01 1139 .24 l 141 .20 980 .07 982 .17 982 .17 977 .19 978 .09 983 .21 984 .25 872 .07 873 .19 871 .15 870 .18 870 .08 1139 .01 1139 .23 1141 .23 980 .06 982 .16 982 .16 977 .17 978 .09 983 .23 984 .27 872 0.06 873 .20 871 .15 870 .17 870 .68 .03 .43 .35 .82 2.38 1.41 1.77 .82 .49 .26 .25 2.28 .97 1.16 Table 7 (cont’d) 3rd semester BARS Adaptability 3rd semester BARS Ethics 4th year BARS Knowledge 4th year BARS Continuous Learning 4th year BARS Leadership 4th year Interpersonal 4th year BARS Responsibility 4th year BARS Adaptability 4th year BARS Ethics BOS Knowledge BOS Continuous Learning BOS Leadership BOS Interpersonal BOS Social Responsibility BOS Adaptability BOS Ethics 2 ”'3 Z"Z”‘Z"Z"Z"’Z"Z"Z"Z"Z"Z”Z .03 868 .24 873 .20 594 .05 592 .12 592 .15 590 .09 594 .0] 593 .20 591 .06 872 .07 865 0.02 872 .15 865 .09 872 .11 848 .17 52 0.06 868 .25 873 .19 594 .05 592 .16 592 .18 590 .13 594 .02 593 .21 591 .08 872 .09 865 .07 872 .15 865 .14 872 .10 848 .22 0.06 868 .25 873 .20 594 .06 592 .14 592 .17 590 .11 594 .01 593 .21 591 .07 872 .09 865 .08 872 .16 865 .13 872 .10 848 .20 .52 .07 .01 .02 .59 .31 .63 .03 .05 .32 .21 2.11 .04 1.25 .15 1.38 Table 7 (cont’d) N 871 871 871 Conscientiousness r 21 .23 .23 1.19 2 N 2686 2686 2686 Emotional Stability r .03 .06 .05 1.24 2 N 2685 2685 2685 Extraversion r .02 .04 .03 .62 2 N 2685 2685 2685 , Agreeableness r .27 .31 .29 4.39 2 N 2686 2686 2686 Note. Correlations above .05 are significant at p < .05. The significant Chi-square value with 2 degrees of freedom is 5.99. Hypothesis 3 Hypothesis 3 was that there would be significant differences between groups of SMES in references made to criteria during a verbal protocol analysis. Because the results from the manipulation check and Hypothesis 1 indicated very little support for the prediction that the groups differed along the predicted dimensions, 1 developed 26 new coding categories from the content of the verbal protocols (see Appendix I). The new coding categories referred to reasons for or against choosing a response option. Three graduate students in Industrial/Organizational Psychology who were not part of the Phase 1 data collection and who were unaware of the hypotheses and theoretical background of the study coded the verbal protocol analyses based on the new coding scheme. To systematize the coding task, I first extracted all phrases from the transcriptions of the verbal protocols that referred to reasons for selecting or not selecting a specific response option. There were 360 distinct phrases or sentences extracted from the verbal 53 protocols. The coders then coded the short phrases or sentences rather than content analyzing entire verbal protocol transcriptions. The coders were blind to group membership of the “speaker” and were also unaware that there were different groups of speakers. They were presented with the item stem and response option corresponding to each of the phrases extracted from the transcripts. After the coding was complete, 1 determined that all three coders had agreed on codes for 149 (41%) of the 360 phrases. Two out of the three coders agreed on codes for 294 (82%) of the 360 phrases. For the following analyses, I assigned codes to the 294 phrases based on the agreement of two out of three of the coders. 
In order to compare whether the three SME groups differed in the reasons they gave for selecting or not selecting a response option, I created a frequency count of how many SMEs from each group responded according to each code for each item. Because each group of SMEs provided different numbers of codeable phrases, I then divided the frequency for each code by the total number of phrases elicited from each group to create a proportion. All of the proportions are presented in Table 8. The SJT items, responses, and most common codes are presented in Figure 1.

Table 8
Results from the Verbal Protocol Analysis

      Item 1                      Item 2                      Item 3
Code  Grads  Mentors  Undergrads  Grads  Mentors  Undergrads  Grads  Mentors  Undergrads
1     0.04   0.06     0.00        0.08   0.00     0.00        0.00   0.00     0.00
2     0.04   0.06     0.00        0.04   0.00     0.00        0.04   0.00     0.00
3     0.14   0.12     0.11        0.08   0.05     0.13        0.04   0.07     0.09
4     0.11   0.18     0.33        0.25   0.38     0.25        0.17   0.30     0.36
5     0.00   0.12     0.11        0.04   0.05     0.00        0.00   0.00     0.00
6     0.04   0.00     0.11        0.00   0.00     0.25        0.00   0.04     0.00
7     0.11   0.06     0.11        0.00   0.00     0.00        0.08   0.04     0.00
8     0.00   0.00     0.00        0.00   0.00     0.00        0.00   0.04     0.09
9     0.00   0.00     0.00        0.00   0.00     0.00        0.04   0.00     0.00
10    0.00   0.00     0.00        0.08   0.10     0.00        0.08   0.11     0.09
11    0.00   0.00     0.00        0.08   0.00     0.00        0.17   0.11     0.00
12    0.00   0.00     0.00        0.00   0.00     0.00        0.08   0.00     0.00
13    0.07   0.00     0.00        0.00   0.00     0.00        0.00   0.00     0.00
14    0.00   0.00     0.00        0.04   0.05     0.13        0.00   0.00     0.00
15    0.00   0.00     0.11        0.00   0.00     0.00        0.00   0.00     0.09
16    0.04   0.12     0.00        0.08   0.00     0.00        0.00   0.00     0.00
17    0.00   0.00     0.00        0.00   0.00     0.00        0.00   0.04     0.00
18    0.11   0.18     0.11        0.04   0.10     0.13        0.08   0.15     0.00
19    0.00   0.06     0.00        0.00   0.05     0.00        0.00   0.00     0.00
20    0.00   0.00     0.00        0.00   0.05     0.00        0.04   0.00     0.09
21    0.00   0.00     0.00        0.00   0.00     0.13        0.00   0.00     0.00
22    0.00   0.06     0.00        0.00   0.00     0.00        0.00   0.00     0.00
23    0.00   0.00     0.00        0.08   0.05     0.00        0.08   0.07     0.09
24    0.00   0.00     0.00        0.04   0.10     0.00        0.00   0.04     0.00
25    0.04   0.00     0.00        0.00   0.05     0.00        0.04   0.00     0.09
26    0.00   0.00     0.00        0.04   0.00     0.00        0.04   0.00     0.00

      Item 4                      Item 5                      Item 6
Code  Grads  Mentors  Undergrads  Grads  Mentors  Undergrads  Grads  Mentors  Undergrads
1     0.08   0.00     0.00        0.00   0.00     0.00        0.00   0.00     0.00
2     0.00   0.00     0.00        0.00   0.00     0.00        0.00   0.00     0.00
3     0.15   0.00     0.00        0.11   0.07     0.00        0.07   0.07     0.00
4     0.08   0.17     0.40        0.47   0.67     0.20        0.50   0.47     0.42
5     0.00   0.00     0.00        0.00   0.00     0.00        0.00   0.00     0.00
6     0.00   0.00     0.00        0.16   0.07     0.10        0.00   0.00     0.00
7     0.23   0.11     0.20        0.05   0.00     0.20        0.07   0.00     0.08
8     0.08   0.00     0.00        0.00   0.00     0.00        0.00   0.00     0.00
9     0.00   0.00     0.00        0.00   0.00     0.00        0.14   0.27     0.17
10    0.00   0.06     0.00        0.00   0.00     0.00        0.00   0.00     0.00
11    0.00   0.00     0.00        0.00   0.00     0.00        0.00   0.00     0.17
12    0.08   0.11     0.00        0.00   0.00     0.00        0.00   0.00     0.00
13    0.08   0.06     0.07        0.00   0.00     0.00        0.00   0.00     0.00
14    0.00   0.00     0.07        0.00   0.00     0.00        0.00   0.00     0.00
15    0.00   0.00     0.00        0.16   0.20     0.30        0.00   0.00     0.00
16    0.15   0.17     0.07        0.05   0.00     0.10        0.00   0.00     0.00
17    0.00   0.00     0.00        0.00   0.00     0.00        0.00   0.00     0.17
18    0.00   0.06     0.00        0.00   0.00     0.10        0.00   0.13     0.00
19    0.00   0.06     0.13        0.00   0.00     0.00        0.00   0.07     0.00
20    0.00   0.00     0.00        0.00   0.00     0.00        0.00   0.00     0.00
21    0.00   0.00     0.00        0.00   0.00     0.00        0.00   0.00     0.00
22    0.00   0.00     0.00        0.00   0.00     0.00        0.00   0.00     0.00
23    0.08   0.11     0.00        0.00   0.00     0.00        0.21   0.00     0.00
24    0.00   0.00     0.00        0.00   0.00     0.00        0.00   0.00     0.00
25    0.00   0.11     0.07        0.00   0.00     0.00        0.00   0.00     0.00
26    0.00   0.00     0.00        0.00   0.00     0.00        0.00   0.00     0.00

      All Items
Code  Grads  Mentors  Undergrads
1     0.04   0.01     0.00
2     0.03   0.01     0.00
3     0.11   0.06     0.05
4     0.26   0.34     0.34
5     0.01   0.03     0.02
6     0.04   0.02     0.06
7     0.09   0.04     0.11
8     0.01   0.01     0.02
9     0.03   0.04     0.03
10    0.04   0.05     0.02
11    0.05   0.03     0.03
12    0.03   0.02     0.00
13    0.03   0.01     0.02
14    0.01   0.01     0.03
15    0.03   0.03     0.08
16    0.05   0.04     0.03
17    0.00   0.01     0.03
18    0.05   0.11     0.05
19    0.00   0.04     0.03
20    0.01   0.01     0.02
21    0.00   0.00     0.02
22    0.00   0.01     0.00
23    0.06   0.04     0.02
24    0.01   0.03     0.00
25    0.03   0.03     0.03
26    0.02   0.00     0.00
Figure 1
SJT Items and Response Options from the Verbal Protocol and the Most Common Codes per Item

Item 1
You find that you are eating more fattening and greasy food than normal, and that you have not been getting sufficient exercise. You have gained 15 pounds but find it difficult to change your eating and exercising habits. How would you deal with this situation?
a. Start slowly by cutting out two snacks a day.
b. Don't worry about it. You only live once, so eat what you want.
c. Get help from someone with experience in this area, such as a health professional or nutritionist.
d. Get some friends together and exercise together. There is power in numbers.
e. Try to establish a regular exercise routine, and focus on eating healthy foods.

Most Common Codes for Responses to Item 1 (Across Options)
(4) It would not solve the problem.
    33% of undergraduate responses
    18% of mentor responses
    11% of graduate student responses
(18) I think this would work well because of what I know about human nature (for example, setting clear goals helps motivate people).
    18% of mentor responses
    11% of graduate student responses
(3) Something else would need to be added for this option to work.
    14% of graduate student responses
(7) I just don't like to do this or I am not willing to do this.
    11% of graduate student responses

Item 2
You are part of a three-person group working on a class project with a quickly approaching deadline. One member of the team is not pulling his/her weight and avoids assignments, complains about the amount of work that has to be done, and says the project doesn't really matter anyway. While you are all classmates, you seem to be the group leader. What would you do?
a. Divide the workload evenly among members of the group, making sure everyone knows they are responsible for their share. If the group member still does not pull his/her own weight, bring it up with the instructor.
b. Speak with the group member and offer him/her encouragement to complete their portion of the project. If the group member still does not contribute, bring it up with the instructor.
c. Try to get the team member motivated to do a share of the work. If that doesn't help the situation, just put more effort into the project yourself in order to complete it.
d. Just do the group member's portion of the assignment in addition to your own, and tell the instructor about the situation.
e. See if the person could be removed from your group.
f. Consult with the group member who is not a problem about the most appropriate course of action, and then act on whatever you both decide.
Most Common Codes for Responses to Item 2 (Across Options)
(4) It would not solve the problem.
    38% of mentor responses
    25% of graduate student responses
    25% of undergraduate responses
(6) It would take too much time or require too much effort.
    25% of undergraduate responses

Item 3
You and five other students must have a report ready within 48 hours. The last time the six of you worked together, you became the leader. You know that one of the group members did no work whatsoever on the last occasion, yet she is in your group again. This time it is necessary that all members pull their own weight. What would you do?
a. Let her know that you are aware that she did not do any work last time, and that this time it is necessary that she fully contribute.
b. Do your entire end of the work and ensure that the instructor is aware that you did your share, regardless of what the other members do.
c. Explain to the group that the professor will be made aware of who contributed what to the project, and ensure that this happens.
d. Stress the importance that everyone fully contributes his or her share to the project.
e. Work as closely with her as possible (e.g., assign both of you a related task) so as to offer encouragement and to ensure that her work gets done.
f. Assign her a specific task with a specific timeframe. If she does not do the work, ask to have her re-assigned, and have the group pick up her work.

Most Common Codes for Responses to Item 3 (Across Options)
(4) It would not solve the problem.
    36% of undergraduate responses
    30% of mentor responses
    17% of graduate student responses
(11) It is too confrontational or would offend other people.
    17% of graduate student responses

Item 4
You are working together with other classmates on a project. Your group keeps running into a variety of problems that threaten to cause the project to be late. The other group members want to just plan to submit it late. Another option would be to devote much more time than planned to the project and possibly get it in on time. What would you do?
a. Try to get it done, but plan to submit it late.
b. Ask the instructor for help or for an extension. If that doesn't work, just try your best and do what you can or turn it in late.
c. Motivate the group to devote more time and work together to get it done.
d. Have the group decide what to do.
e. Work hard to finish it because there are consequences for being late, and meeting deadlines is important to you.
f. Tell the instructor your situation, and ask for advice.

Most Common Codes for Responses to Item 4 (Across Options)
(4) It would not solve the problem.
    40% of undergraduate responses
    17% of mentor responses
(16) This is something I've done in the past and it worked for me.
    17% of mentor responses
(7) I just don't like to do this or I am not willing to do this.
    23% of graduate student responses

Item 5
An event in the news makes you wonder about the history behind it. What would you do?
a. Do some research, looking up all the facts for yourself.
b. Do a quick Internet search to see if you could find any information.
c. Think about it briefly, then move on.
d. Ask others what they know about the topic.
e. Resolve to read the newspaper more often.

Most Common Codes for Responses to Item 5 (Across Options)
(4) It would not solve the problem.
    67% of mentor responses
    47% of graduate student responses
(15) It would work quickly or with little effort.
    30% of undergraduate responses

Item 6
One of your friends' roommates frequently parties until late at night, often returning to the room after drinking and engaging in loud and obnoxious behavior. Your friend finds that he/she cannot study or sleep well because of this, but also feels reluctant or afraid to talk with the resident assistant about it. What action would you take?
a. Approach the resident assistant on behalf of your friend.
b. Talk to the roommate yourself, and explain that his/her behavior bothers your friend.
c. Tell your friend to talk with the roommate and let him/her know that the behavior is not acceptable.
d. Offer to let your friend stay with you when necessary.
e. Suggest to your friend that he/she talk it out with the roommate, and offer to be available as a neutral third party when the two have the conversation.

Most Common Codes for Responses to Item 6 (Across Options)
(4) It would not solve the problem.
    50% of graduate student responses
    47% of mentor responses
    42% of undergraduate responses

In the last three columns of Table 8, the proportions are compared across all of the items. Across all of the items, the most common reasoning SMEs provided was code 4, "This response would not be effective because it would not solve the problem."

An analysis of the most common reasons members of each group provided by item is somewhat more informative. For item 1, which described an unwanted weight gain and asked what the participant would do in that situation, 33% of the undergraduate responses were coded 4. Responses by mentors were most often coded 4 (18%) and 18 (18%), "I think this would work well because of what I know about human nature (for example, setting clear goals helps motivate people)." Responses by graduate students were most often coded 3 (14%), "Something else would need to be added for this option to work," closely followed by 11% of the responses coded as 4, 7, and 18 each. Code 7 was "I just do not like this or I am not willing to do this." Although there was significant overlap in the reasons that each group gave for their responses (most commonly 4 or 18), differences emerged between groups not only in reasoning but in diversity of reasoning. Whereas responses by graduate students were diverse, one third of the responses by undergraduates were based on the same reasoning.

For item 2, the largest proportions of responses for graduate students (25%) and mentors (38%) were coded 4. The responses by undergraduates were clustered at 4 (25%) and 6 (25%), "It would take too much time or require too much effort." For item 3, the largest proportions of responses for mentors (30%) and undergraduates (36%) were coded 4. The responses by graduate students were clustered at 4 (17%) and 11 (17%), "It is too confrontational or would offend other people." For item 4, the responses for undergraduates were most frequently coded 4 (40%). The responses for mentors were most frequently coded 4 (17%) and 16 (17%), "This is something I've done in the past and it worked for me." The responses for graduate students were most frequently coded 7 (23%). For item 5, the responses for graduate students (47%) and mentors (67%) were
Overall, there was a lot of diversity in responses from each group. The most common reasoning across all groups for all items was code 4, but analyses of patterns at the item level revealed more information. Undergraduates tended to prefer response options that would work quickly or without a lot of effort (together the codes accounted for 22% of their responses to item 1, 25% of their responses to item 2, and 40% of their responses to item 5) and to avoid options that would take too much time or effort. Mentors referred to what they knew about human nature or what had worked for them in the past (together, the codes accounted for 30% of their responses to item 1 and 23% of responses to item 4). There was the least amount of agreement among the reasoning by graduate students between and within items, suggesting that there was a lot of diversity of reasoning within that group. 62 Discussion The purpose of this study was to explore whether the selection of the SMES who provide scoring judgments during the development of an SJ T has the potential to affect which constructs the SJT ultimately measures. In order to examine this question, three groups of SMES (graduate students, resident advisors or mentors, and undergraduate students drawn from the Psychology subject pool) provided scoring judgments for an SJ T intended to measure college student performance. The first hypothesis was that members of the three groups of SMES would rate the effectiveness of the response options according to different implicit theories of desirable performance. Specifically, it was predicted that graduate student SMES would view options related to academic performance as most effective, that mentors would view options related to leadership and service as most effective, and that undergraduates drawn from the subject pool would rate options related to social performance as most effective. Data regarding differences on dimensions relevant to these hypothetical differences between groups were collected from each of the three groups. The theories presented in the introduction that support the idea that distinct groups of individuals develop similar implicit theories through shared experiences and traits all assume that groups differ from one another. In order to ensure that the groups of SMES in this study did differ from one another and in expected ways, data were collected from the SMES as a manipulation check. The manipulation check indicated that, although the groups differed in some of the expected ways, they were actually quite similar. For GPA, knowledge, and continuous learning self-rated behaviors, for which it was predicted that graduate students would be highest, there was evidence that undergraduates were indeed 63 lower than graduate students, but mentors were similar to both groups. On self-rated leadership behaviors and social responsibility, for which mentors were expected to be the highest, there was evidence that undergraduates were lower than mentors, but graduate students were similar to both groups. On self-rated adaptability and ethics behaviors, for which mentors were expected to be the highest, the groups did not differ at all. On self- rated interpersonal behaviors, for which undergraduates were expected to be the highest, the groups did not differ. The three groups had very similar ideas about what behaviors are important in college. They also scored very similarly on the personality traits of agreeableness, conscientiousness, emotional stability, and extraversion. 
In general, the three groups of SMEs were not very different. This finding can be seen either as a contradiction of the theories indicating that groups become similar over time through a wide variety of processes or (more likely) as evidence that the three groups chosen for this study did not meet the true definition of groups who share experiences and distinct traits. Despite the finding that the three SME groups did not differ on all variables as expected, I proceeded with the planned analyses. The data did not support the first hypothesis. Although the three groups of SMEs did not agree about the effectiveness of a few of the response options, they did agree about the effectiveness of most of the response options. In addition, even for those response options about which they did not agree, the differences were not always in the expected direction.

The second hypothesis was that scores on the SJT based on the scoring keys developed by the three different groups of SMEs would differentially predict criterion measures of college student performance. The second hypothesis was also not supported by the data. Although the scoring keys developed from the responses of the three different groups of SMEs did predict a broad array of performance criteria, there were no differences in predictive capability across groups. This finding indicates that the three scoring keys were measuring the same set of constructs.

In order to further explore the reasoning behind the judgments of the SMEs in the scoring task, a sub-group of each SME group participated in a verbal protocol analysis task in which they reasoned aloud as they made their judgments for six of the SJT items. The third hypothesis was that members of the three SME groups would articulate different reasons for making their scoring judgments. Because of the lack of support for the original hypotheses, the original predictions (that graduate students would refer to academic reasons, mentors would refer to leadership and service reasons, and undergraduates would refer to social reasons for making their judgments) were set aside, and a new set of codes was developed based on the content of the transcripts from the verbal protocols. Based on this new set of codes, there is some evidence that the groups did differ in the reasoning that they used for making their scoring judgments. The most common reason that each group gave for its decisions was that an option was disregarded because it simply would not work. There was relatively little agreement among graduate students about why they chose and rejected response options, suggesting that graduate students did not share an implicit theory of performance, or at least that they did not articulate a uniform theory. In contrast, members of both the mentor and undergraduate groups responded in a relatively consistent manner using similar reasoning. Mentors tended to refer to what they knew about human nature and to what had worked for them in the past. Undergraduates tended to prefer options that they believed would take little time or effort and to reject options that they perceived would take a lot of time or effort.

The theoretical argument in the introduction was that different groups have different implicit theories of performance that come from their shared experiences and traits. Despite some evidence that the members of the three groups used different reasoning to make scoring judgments, the differences in reasoning had no practical effects in this study.
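Because all three keys were applied to the same respondents, "predicted equally well" is a statement about dependent correlations: two key-based scores correlated with the same criterion on the same sample. The thesis does not say which test was used for these comparisons; Williams's t for comparing two dependent correlations that share a variable (Steiger, 1980) is one standard choice, sketched below with invented values.

```python
import math
from scipy import stats

def williams_t(r_ac, r_bc, r_ab, n):
    """Williams's t: does corr(A, C) differ from corr(B, C) when A, B, and C
    are measured on the same n cases?  Here A and B would be SJT scores under
    two different scoring keys and C a criterion such as GPA."""
    det = 1 - r_ac**2 - r_bc**2 - r_ab**2 + 2 * r_ac * r_bc * r_ab
    r_bar = (r_ac + r_bc) / 2
    t = (r_ac - r_bc) * math.sqrt(
        (n - 1) * (1 + r_ab)
        / (2 * ((n - 1) / (n - 3)) * det + r_bar**2 * (1 - r_ab) ** 3)
    )
    p = 2 * stats.t.sf(abs(t), df=n - 3)  # two-tailed p-value
    return t, p

# Invented values: two keys correlating .28 and .25 with GPA, with the two
# key-based scores correlating .90 with each other, across 200 respondents.
t, p = williams_t(r_ac=0.28, r_bc=0.25, r_ab=0.90, n=200)
print(f"t({200 - 3}) = {t:.2f}, p = {p:.3f}")  # small, nonsignificant difference
```

One useful property of this test is that when the two keys are highly intercorrelated, the standard error of the difference shrinks, so even moderate samples have reasonable power to detect a real difference in validities; a null result is therefore informative.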
Scores based on scoring keys developed by each group of SMEs were equivalent in terms of predicting a wide array of criteria, including GPA, a variety of behaviors, and personality traits. As will be discussed below, there are a number of limitations to the methodology of this study that may have impacted the results, but it must be emphasized that the main finding of this study is that the choice of SMEs for scoring key development did not impact construct measurement. In fact, although both graduate students and mentors could be viewed as “experts” in college student performance, both groups having demonstrated high performance as undergraduates, the undergraduates who contributed scoring judgments were not selected based on their success as students and, thus, would not typically be considered “experts” on performance. Their judgments, however, were as effective in predicting performance criteria as were the judgments of the graduate students and mentors.

Limitations and Future Research

There were a number of limitations to this study that may have affected the findings. First, the expert groups may not have been distinct enough for effects to emerge. Second, the dimensions along which the groups were expected to differ may not have been representative of true differences between the groups. Third, college student performance may be a construct about which many people have good judgment. Fourth, the verbal protocol procedure may not have been conducted equally effectively across all three groups. Fifth, experts may not be very good at explaining how they make their decisions.

First, the choice of groups for SMEs may have limited the findings. The findings of the manipulation check indicated that the groups were fairly similar. The prediction assumed that groups have members with similar experiences and traits, so it may be that the groups were too similar to one another in experiences and traits to have had different implicit theories of college student performance. It is also possible that members within each group were not similar enough to one another to produce distinct differences between groups. The graduate student SMEs were drawn from several departments and, thus, may not have had a shared identity or shared experiences as graduate students. It may be that specific departments within the graduate school attract certain types of students, but that graduate students in general do not share a profile of traits and characteristics. Mentors would seem likely to share a profile of experiences and traits because they are all attracted to and belong to a single program within the university, but the program is very large, and there may be a large amount of diversity among mentors. The undergraduates shared only the experience of being students at the same large university and taking at least one class in the Psychology Department.

Future research should examine more distinct SME groups. In this study, it may have been possible to examine a different set of groups that would still have been experts in college student performance. For example, the groups could have come from different universities; individuals at different universities may have different norms or lay theories of performance. I also could have included a faculty expert group. It may be that faculty members, who have experience teaching, advising, and mentoring undergraduates, have different theories about effective student performance than do students and former students.
It should be emphasized that the scoring keys developed by all three expert groups effectively predicted performance along a variety of dimensions. The question here is whether experts from another university or setting might develop a scoring key that also effectively predicted student performance, but with differences in which constructs were best predicted.

Second, the dimensions along which the groups were expected to differ may not have been representative of real differences between the groups. In order to find meaningful differences between the groups on ratings of effectiveness, if such differences truly existed, it would have been necessary to develop dimensions that represented real differences between the groups. It is important to note that the finding that the three scoring keys did not predict the performance measures differently would not have changed even if the differences between the groups had been better determined. If a similar study is conducted to examine differences between groups of experts in the future, the choice of dimensions along which the experts are expected to differ should be based on pilot testing, focus groups, or data of some kind. This would allow differences to become more apparent, if differences in the judgments of different groups really can have meaningful effects. However, use of different dimensions would make direct comparisons of the groups impossible.

Third, the measure chosen for the study was an SJT intended to measure college student performance. It may be that the choice of the measure limited the possibility of finding differences between groups because there is a high level of knowledge about appropriate behavior for college students among adults in general. The target of the measure is high school students who have no experience at the college level. The participants who took the SJT in 2004 were new college students in their first few days or weeks at school. All of the subject matter experts had been attending college for at least one year by the time they provided scoring judgments. It may be that the necessary level of knowledge for an expert is developed during the first year of college, so that students at the end of their first year are capable experts, whereas students who have not yet entered college differ in their levels of practical judgment. If there is a common theory of college student performance among most people who have completed at least their first year of college, then there is a very large pool of experts who could legitimately provide judgments for scoring this measure. However, there may be many types of constructs for which there is not a high level of agreement among the implicit theories of different groups. Future research might also consider selecting a measure of a construct about which there are fewer “experts.” If it is true that college student performance is a topic about which many people have a similar theory of performance, there may be other types of measures for which differing lay theories of performance among expert groups would have a more profound effect on construct measurement. If this is the case, the findings of this study could be misleading.

Fourth, although the verbal protocol procedure did yield some differences between groups, there were some flaws in the procedure. First, the verbal protocol procedure was completed with the undergraduate student participants from the subject pool before the mentors and graduate students.
The undergraduate students also provided the fewest usable phrases of the three groups. It is possible that I was less proficient at conducting the protocol procedure when working with the first five participants. I did follow the same protocol and provide the same instructions, but I may have unintentionally created a more comfortable atmosphere later on that led to more (or more useful) elicitation from mentors and graduate students. It is also possible that mentors and graduate students felt more comfortable with a graduate student researcher, or that undergraduates tend to be less verbally fluent than mentors and graduate students. Mentors are selected for their positions in part for their interpersonal verbal skills, and graduate students are selected for admission to graduate school in part for their high verbal scores on the Graduate Record Exam (GRE). Also, undergraduates earned class credit, whereas members of the other groups earned money for their participation. It is possible that money was a stronger motivator than class credit. In addition, I had no training before leading participants through the verbal protocol procedure, and the exercise might have been more effective at eliciting participants’ real thoughts had I been more skilled.

The other limitation of the verbal protocol is the technique itself. There is some evidence to suggest that experts may not be capable of explaining the real reasons why they make certain judgments. Glaser, Lesgold, and Gott (1991) suggested that when an individual reaches the level of automaticity (or the level of expertise) in a subject, he or she may no longer be able to verbalize intended behaviors or processes that are necessary for successful task performance. Thus, in addition to the problems with the procedures for the verbal protocol, subject matter experts may have trouble articulating the true reasons for their decisions verbally. They may be unaware of the true rationale for their judgments and, thus, unable to explain them to others.

It is possible that, at least for some SJTs, the constructs measured are so well defined by the item stems and response options that the scoring system has little effect. In this case, the choice of SMEs for the development of the critical incidents that become items, and for the development of response options, would have a much greater effect on construct measurement than the choice of SMEs for scoring. Although there has been research about the impact of the fidelity, length, complexity, and comprehensibility of item stems on construct measurement (Chan & Schmitt, 1997; McDaniel et al., 2001; Sacco et al., 2000), there has been little research about how the content of item stems may impact construct measurement. There is also some evidence that the content of response options can be developed with the purpose of measuring specific constructs (for example, Motowidlo, Hooper, & Jackson, 2006). Future research further exploring how the development of item stems and response options impacts the constructs an SJT measures may be especially useful. However, we should not completely disregard the hypothesis that aspects of SJT development other than item development may impact construct measurement. As discussed earlier, there is evidence that other aspects of SJT development, such as instruction sets, may impact construct measurement, even with identical stems and responses (McDaniel et al., 2007; McDaniel & Nguyen, 2001; Ployhart & Ehrhart, 2003).
Practical Implications

Because the results were contrary to the predictions of this study, the implications of the findings are somewhat difficult to decipher. On the one hand, the implications seem to be that the choice of experts for SJT scoring judgments may not be as important as one might think. It may be possible for a variety of individuals to serve as subject matter experts and provide scoring judgments for an SJT without impacting construct measurement. On the other hand, it is hard to imagine that careful expert choice is not important. It is possible that all of the SMEs in this study were appropriate experts for the measurement of college student performance, or at least that they were “equally good” experts on the measurement of college student performance. It may be that there are topics about which individuals and groups tend to share implicit theories of performance and other types of topics about which groups differ in their theories of performance. The methodology of this study may have limited the findings because college student performance may fall in the former category. Practically, any incumbents with a certain level of experience (in this case, one year in college) may be able to serve as good experts for scoring judgments. It is still important, however, to determine what that necessary and sufficient level of experience is prior to selecting SMEs.

Appendix A

Expected Relationships between Scores Based on the Scoring Keys Developed by each SME Group and External Measures and Manipulation Check

(Columns: Graduate Students, Resident Advisors, Undergraduates; each “+” indicates an expected positive relationship.)

External Measures
Neuroticism: +
Conscientiousness: + +
GPA (Academic Performance): +
Family Support of Education (manipulation check only): + +
Parental Education (manipulation check only): + +
Leadership and Service-Related Behaviors (Leadership, Social Responsibility, Adaptability, Ethics and Integrity): +
Interpersonal Behaviors: + +
Extraversion: + +
Agreeableness: + +

Appendix B

Coding Exercise for I/O Graduate Students

Please read each of the following situations and the responses to each situation carefully. After you read each response option, please indicate whether you think it would be selected by someone who strongly values academics, someone who strongly values leadership and service, someone who strongly values their social life, or whether you cannot determine who would select that item based on the above categories. Please try to select one of the categories for each response option, and only rate it as undetermined if you absolutely cannot decide. If you think that a response option would be selected by people from more than one of the above categories, you may rate it as such. However, as much as possible, please select only one category for each response option. After you have selected a category for each response option, please rate your confidence in your rating of that response option on the following scale: 5 = fully confident, 4 = somewhat confident, 3 = neither confident nor lacking confidence, 2 = somewhat lacking in confidence, 1 = completely lacking confidence. Thank you!

Example item: You only have so much time in a day. What do you spend the most time on?
(Category columns: Academics, Leadership and Service, Social, Undetermined; followed by a Confidence rating.)
a. Doing your homework. (Academics: X; Confidence: 5)
b. Serving food at the local soup kitchen. (Leadership and Service: X; Confidence: 4)
c. Hanging out with your friends. (Social: X; Confidence: 5)
d. Playing Nintendo by yourself.
(Undetermined: X; Confidence: 3)

Appendix C

Situational Judgment Test

Instructions: We asked a large group of college students to describe situations that they have faced in college; they then explained how they dealt with those situations. Each question that follows reflects one of those situations, along with a list of alternative ways they said they would respond to the situation. Please read each situation and then read all of the alternatives presented. Then, indicate which way you think you would MOST LIKELY respond. It might not be exactly what you would do in the situation, but it should be the alternative that comes closest to what you think you would actually do. Next, decide which alternative you would be LEAST LIKELY to take in the situation, and record your answer. After you have selected the responses you would be most and least likely to make, please rate the effectiveness of each of the response choices for the situation on a scale from 1 (very ineffective) to 5 (very effective).

1. After a local disaster, the Red Cross asked for volunteer blood donors. Because of a medical condition, you cannot donate blood. How would you react in this situation?
a. Encourage others to donate blood.
b. Donate money to the Red Cross instead.
c. Volunteer your time to generate money for the Red Cross.
d. Volunteer to give out cookies and help at the blood drives.
e. Ask the Red Cross if you could help them in any other way.

2. You find that you are eating more fattening and greasy food than normal, and that you have not been getting sufficient exercise. You have gained 15 pounds but find it difficult to change your eating and exercising habits. How would you deal with this situation?
a. Start slowly by cutting out two snacks a day.
b. Don’t worry about it. You only live once, so eat what you want.
c. Get help from someone with experience in this area, such as a health professional or nutritionist.
d. Get some friends together and exercise together. There is power in numbers.
e. Try to establish a regular exercise routine, and focus on eating healthy foods.

3. You are part of a three-person group working on a class project with a quickly approaching deadline. One member of the team is not pulling his/her weight and avoids assignments, complains about the amount of work that has to be done, and says the project doesn’t really matter anyway. While you are all classmates, you seem to be the group leader. What would you do?
a. Divide the workload evenly among members of the group, making sure everyone knows they are responsible for their share. If the group member still does not pull his/her own weight, bring it up with the instructor.
b. Speak with the group member and offer him/her encouragement to complete his/her portion of the project. If the group member still does not contribute, bring it up with the instructor.
c. Try to get the team member motivated to do a share of the work. If that doesn’t help the situation, just put more effort into the project yourself in order to complete it.
d. Just do the group member’s portion of the assignment in addition to your own, and tell the instructor about the situation.
e. See if the person could be removed from your group.
f. Consult with the group member who is not a problem about the most appropriate course of action, and then act on whatever you both decide.

4. You have very much wanted to be a teacher, but you failed the entrance exam into the College of Education. This exam is not given again for a year. What would you do?
a. Change majors to something similar that does not require an entrance exam.
b. Take a year off to earn some money, and then retake the exam.
c. Take additional relevant classes, and seek advice on how to best prepare for the examination the next year.
d. Take other requirements or courses of interest to you for a year, and then retake the examination next year.

5. A fellow student allows you to listen to threatening phone calls that have been placed on his/her answering machine by another student. The student does not want you to tell anyone but thinks the caller may be capable of causing physical harm. What would you do?
a. Try to talk the friend into calling the police, and warn him/her not to walk around alone.
b. Talk to the resident assistant about it.
c. Contact the police yourself if you think there is any real threat of physical harm.
d. Find out who is making the calls; if it is another student, confront him/her, singly or jointly.
e. Unless the friend knows something that he/she is not saying, there is no reason not to call the police, so call them if your friend won’t.
f. Have the friend change his/her phone number, and have it unlisted.

6. You have been standing in line for the restroom for some time after a campus event, and someone cuts into the line ahead of you. What would you do?
a. Politely inform the person that there is a line, and hopefully he/she will move to the back.
b. Say aloud to someone near you how rude it is that people cut in line.
c. Give the person dirty looks, and try to squeeze him/her out of line.
d. Scold the person for not respecting others.
e. Be annoyed but not do anything. It’s just one more person.
f. Calmly cut back in line in front of the person.

7. You are interested in finance, but do not have further finance courses for at least another semester. What would you do?
a. Wait until the next semester, and take another class then.
b. Try to register for an alternative finance course as an elective.
c. Use the semester to do some independent study so that you are well prepared for the next course.
d. Get involved in on-campus finance clubs or investment games.
e. See if you could be a teacher’s assistant for a finance class.

8. As a leader of a student organization, you asked a committee member to track the use of important and costly supplies. In response, he/she developed forms requiring the organization’s committee members to indicate when and how they used various supplies. This committee member then complains that no committee members are completing these forms. How would you handle this situation?
a. Explain the importance of tracking to the committee, and request that everyone comply with the request.
b. Ask everyone to respect the committee member’s hard work and effort by cooperating.
c. Limit access to the supplies until people start filling out the forms, or have penalties for not complying.
d. Designate another committee member to be in charge of tracking and enforcing the information requests.
e. Ask the committee if there is a misunderstanding about the forms, and ask for suggestions on improving them.

9. Your roommate, usually a tidy person, has recently experienced some personal difficulties, thus becoming quite distracted and leaving much of the household responsibilities to you. You have discussed your concerns and empathetically requested that he/she resume sharing in the responsibilities as soon as possible. A month passes, and you are still doing too much of the housework. What would you do?
a. Find out more about his/her problem, and try to deal with that first.
b. Stop doing all of the household responsibilities to show him/her what it’s like.
c. Talk with him/her again, and explain that you are suffering as a result of his/her behavior.
d. Tell him/her that if he/she doesn’t help, you will move out.
e. Do your share of the work, and put anything of your roommate’s that affects you in his/her area of the room.

10. After you arrive on campus, you begin to socialize with a group of students who drink regularly, even though all are underage. By the end of the term, you realize that you are drinking several drinks at least three nights a week, but you don’t know how to withdraw from the group in which this is normal routine behavior. What action would you take?
a. Ask a close friend to help watch out for your best interests, and pursue other activities with other people.
b. As long as you keep your grades up, it is not a problem.
c. Explain to the group that you are concerned about falling behind if you continue the behavior, and concentrate more on your studies instead.
d. Join alternative groups such as campus clubs and sports, or maybe even take an evening or early morning job.
e. Just socialize with the group less frequently.
f. Continue socializing with the group, but don’t always drink when they do.

11. You have been having trouble with a class in which everyone else seems to be doing well. Your homework comes back with unsatisfactory grades week after week, and your test scores have been marginally passing. How would you proceed?
a. Find a study group to work with you.
b. Talk to the professor and to friends in the class, and read more.
c. Get tutoring, and study more frequently for this class.
d. Seek help from someone in the class who is doing well.
e. Talk to the professor or TA to find out what you are doing wrong, compare notes with others, and seek out tutoring.
f. Stay calm and continue to do the best you can.

12. There is a seminar being held on campus that would expand your understanding of a class topic, but the seminar time conflicts with the class schedule. What would you do?
a. Skip the class, and go to the seminar because it is related to the class.
b. Go to class because it might cover what the seminar would cover.
c. Go to class, and talk to someone who went to the seminar.
d. Get advice from the professor, and then decide what to do.

13. You are the student coordinator for the gym, and it’s 4:30 P.M. You have just been informed that there is no heat in the gym. As it is the middle of winter and very cold, you know this will be a problem. There is a student dance being held in the gym that night at 7:00 P.M., and there are no alternative facilities in which to hold the number of people expected at this event. What would you do?
a. Let everyone know that the dance is postponed or called off.
b. Call maintenance, and see if they can fix it.
c. Look for small heaters to fill the room.
d. Call people, and check the consensus opinion about what to do.
e. Find a group of rooms as an alternative location.
f. Inform the students to dress warmly for the dance.

14. You and five other students must have a report ready within 48 hours. The last time the six of you worked together, you became the leader. You know that one of the group members did no work whatsoever on the last occasion, yet she is in your group again. This time it is necessary that all members pull their own weight. What would you do?
a. Let her know that you are aware that she did not do any work last time, and that this time it is necessary that she fully contribute.
b. Do your entire end of the work, and ensure that the instructor is aware that you did your share, regardless of what the other members do.
c. Explain to the group that the professor will be made aware of who contributed what to the project, and ensure that this happens.
d. Stress the importance of everyone fully contributing his or her share to the project.
e. Work as closely with her as possible (e.g., assign both of you a related task) so as to offer encouragement and to ensure that her work gets done.
f. Assign her a specific task with a specific timeframe. If she does not do the work, ask to have her re-assigned, and have the group pick up her work.

15. You are working together with other classmates on a project. Your group keeps running into a variety of problems that threaten to cause the project to be late. The other group members want to just plan to submit it late. Another option would be to devote much more time than planned to the project and possibly get it in on time. What would you do?
a. Try to get it done, but plan to submit it late.
b. Ask the instructor for help or for an extension. If that doesn’t work, just try your best and do what you can, or turn it in late.
c. Motivate the group to devote more time and work together to get it done.
d. Have the group decide what to do.
e. Work hard to finish it because there are consequences for being late, and meeting deadlines is important to you.
f. Tell the instructor your situation, and ask for advice.

16. You grew up in a small farming community and moved into a dorm area in which all students were from an urban background. They seem to have different concerns and interests, and they often just stare blankly when you talk about your background and experiences. How would you react?
a. Ask them questions about their experiences in the hope that they will develop some interest in your background.
b. Find other places to make friends with people who also come from farming communities.
c. Try to talk to just one person, on his/her own, about what life was like for you growing up.
d. Ask others about their experiences, and ask if they have any questions of you.
e. Voice your feelings about the staring, and limit the talking about your background.

17. You have set ideas about what music is pleasing to the ear, and a friend is pushing you to join him/her at a concert that he/she thinks you would enjoy. The band will be playing a type of music that you prefer to avoid. What would you do?
a. You would not go, but you would decline as politely as possible.
b. If the ticket is free, you would go; otherwise you would not attend.
c. You would go to the concert with an open mind, hoping that you might appreciate it.
d. You would go because of your friend.

18. You know that a group of students in your class cheats on exams by putting formulas into scientific calculators or into cell phones. The professor has clearly warned against such activity, but you are not sure what she would do if she knew what these students were doing. What action would you take?
a. Try doing the same thing until people start getting caught.
b. Study the way you know best, don’t cheat, but don’t turn in the other students either.
c. You would do nothing; it’s none of your business.
d. You would mention it to the professor so she can deal with the problems in the class.
e. Don’t tell the professor, but make sure it is clear you are not involved in case they get caught.
f. Send the professor an anonymous message about what is going on.

19. You and your friends know that an attractive mutual friend has been dating another person for nearly a year. However, one of your friends tries his/her best to get a date with this individual. How would you react?
a. If the acquaintance is in a happy relationship, tell your friend to wait and rethink it. If not, it is OK to get a date.
b. Support your friend.
c. Tell your friend to wait until the person is single or to just forget about the person.
d. Tell your friend that it is inappropriate to interfere in the relationship.
e. Just be annoyed at your friend. Do not get involved.

20. When you first started school, you planned to major in an area in which you are no longer interested, and now your grades are not as good as you would like. You know that you do not want to major in this subject. What would you do?
a. Explore other options, and try to change your major to something you like.
b. Take classes or ask friends about other majors.
c. Change majors if it isn’t a huge setback. Otherwise, make the best of it.
d. Ask your advisor if the major has more interesting classes that you haven’t taken yet.

21. Because of family problems, you realize that your parents can no longer support you financially at the same level as they have, and you do not have enough money to continue in school. What plans would you make?
a. Apply for student financial aid, or get a part-time job.
b. Ask other family members for money to finish school.
c. Drop out of school, and save money for going back.
d. Take fewer classes because of the lower level of finances.

22. An event in the news makes you wonder about the history behind it. What would you do?
a. Do some research, looking up all the facts for yourself.
b. Do a quick Internet search to see if you could find any information.
c. Think about it briefly, then move on.
d. Ask others what they know about the topic.
e. Resolve to read the newspaper more often.

23. You are finding a particular class dull and boring, and you are having difficulty staying awake. What would you do?
a. Do what you can to stay awake, such as drinking caffeine or sitting toward the front of the class.
b. Read the class material beforehand to make the lecture more interesting.
c. During the lecture, do some studying that is required for the course.
d. Make sure you are getting enough sleep every school night.
e. Skip the class if it is that dull and boring to you.

24. In the summer and fall, you walked to class and participated in various outdoor sports. When cold weather came, you took the bus and no longer participated in sports. You find that you are gaining weight. What action would you take?
a. Participate in indoor sports, and start working out indoors.
b. Try not to eat as much, or eat different kinds of food.
c. Walk to classes more, go to the gym, and watch what you eat.
d. Work out in your room.
e. Talk to an expert in diets, and see if you can find someone who will encourage you to start working out again.
f. Not relevant due to physical disability.

25. One of your friends tells a joke that makes fun of people of a particular ethnic background. What would you do?
a. Laugh if it is funny and no one from that group is present.
b. Leave the room.
c. Nothing. Probably laugh if it is funny; it is just a joke.
d. Point out the offensiveness of the remark to the friend, and indicate your lack of tolerance for similar remarks.
e. Laugh if it is funny, but warn him/her to be careful in the future about where the joke is told.
f. Do not laugh; show displeasure by ignoring the joke.

26. Your grade for a particular class is based on three exams, with no class attendance requirement. All of the homework requirements for the class are posted on the professor’s web site. What would you do?
a. Attend class for as long as you feel that it is helping your grades.
b. Do all the homework, but only go to some of the lectures. It’s the exams that count.
c. Go to all the classes anyway. The professor may say something important.
d. Skip classes, but if you did poorly on the first exam, start going to classes.
e. There is no need to go to classes. Just get the homework done, and pass the exams.

27. There is a concert coming up that you think will be fantastic, but no one you know is interested in going with you. What would you do?
a. Go by yourself, and find someone else at the concert who went alone.
b. Try to find someone else to go with you, but if you cannot, then you would not go.
c. Ask your best friend to go, even if you knew that he/she wasn’t as excited as you were.
d. Get two tickets, and offer a free ticket to anyone you know who might want to go.

28. You share a dorm room with three other students. One half-hour before you are expecting a guest, you get home to find the place completely trashed. There is no sign of any of your roommates. What would you do?
a. Clean up the mess as much as possible before the guest arrives. Then speak with your roommates immediately upon their return, so your guest knows how concerned you were about the mess.
b. Leave the mess, and explain the situation to your guest.
c. Leave the mess, and take the guest somewhere else.
d. Clean up the mess as much as possible before the guest arrives. Then, without the guest around, ask the roommates why the place was trashed so badly and what can be done in the future to avoid this situation.

29. One of your friends’ roommates frequently parties until late at night, often returning to the room after drinking, engaging in loud and obnoxious behavior. Your friend finds that he/she cannot study or sleep well because of this, but also feels reluctant or afraid to talk with the resident assistant about this. What action would you take?
a. Approach the resident assistant on behalf of your friend.
b. Talk to the roommate yourself, and explain that his/her behavior bothers your friend.
c. Tell your friend to talk with the roommate and let him/her know that the behavior is not acceptable.
d. Offer to let your friend stay with you when necessary.
e. Suggest to your friend that he/she talk it out with the roommate, and offer to be available as a neutral third party when the two have the conversation.

30. You are searching for a major that interests you and think you might be interested in psychology. You do not know much about preparation to be a psychologist or what kinds of opportunities exist for careers in this area. What action would you take?
a. Talk to an advisor in psychology to see what career options are available.
b. Talk with a friend who is a psychology major to see what it is about.
c. Take an introductory psychology course to see what areas in psychology there are.
d. Look up job listings for psychologists on the Internet.
31. You are interested in several different classes/disciplines, but don’t know anything about future educational or career opportunities in these areas. What steps would you take to get informed?
a. Go to an advisor or knowledgeable professional who might tell you more and be able to answer your questions.
b. Research topics using available resources like relevant books and Internet web sites.
c. Attempt to obtain some hands-on experience, like internships.
d. Use the school’s resources, such as career services and career counselors.
e. Take some introductory classes in the area of interest to see if you want to pursue that area further.
f. Think about your interests, and try to figure out which of them fit with the different disciplines.
g. Ask friends and family for advice and information. If possible, ask a friend who is familiar with the area.

32. In a class of 50 students, you discover that a group of your friends has worked out a scheme to share answers on an exam. The professor has vision problems and will likely never notice. You are not doing very well in the course. What would you do in these circumstances?
a. Avoid being around these friends.
b. It is not exactly honest, but under the circumstances, the scheme is OK. You would join them.
c. Do your own work, and do not tell the professor about the scheme because it is not your problem.
d. Cheat and get a good grade.
e. Tell the professor about the scheme.
f. Study for the exam, but join the scheme as a backup strategy for the test.

33. You see a painting that intrigues you. You know nothing about it other than the artist’s name. What would you do?
a. Look up the artist on the Internet to see if you can find some of his/her other work.
b. Ask others if they know anything about the artist.
c. Do some research to find out what you want to know.
d. Look for help at the library, asking for books about this artist.
e. Enjoy the painting, but leave it at that.

34. Your professor has just given you a project that will obviously require the whole semester to complete. She gave you all the details you need to get started, but you are not sure how the project should proceed from there. She does not appear to intend to give you any more information in class. What would you do?
a. Work out the project to the best of your ability, and approach the professor if you get stuck.
b. Generate some ideas, and then go to office hours to see how the professor responds to them.
c. Ask the professor about the project after class.
d. Visit the professor or a teaching assistant during office hours to discuss the project.
e. Talk to other students to get an idea of what they are doing.
f. Try to get an idea of whether or not other students seem confused. If so, bring the issue up with the professor during class.

35. You are part of a committee to reduce cross-cultural tension in your dorm. A group of students in your dorm complain to you that people always convey holiday greetings to them that are not associated with their religion or culture. They request that their differences be respected. How would you address this problem?
a. Ask the group politely to ignore the greetings, realizing that the people had good intentions.
b. Tell the well-wishers respectfully to please refrain from making specific holiday greetings.
c. Have a meeting at which people can discuss their differences and hopefully work out an understanding.
d. As part of the committee, make all cultural holidays visible so that people can be aware of diversity.
e. Tell them to respond with a meaningful greeting of their own.

36. A friend on your floor is always organizing “social” activities, including trips to local bars. Aside from the fact that this person is underage and failing some classes, you realize that the individual is drinking half a dozen or more drinks at least three or four times a week. No one else seems to know or to be concerned about the person. What would you do?
a. Talk to him/her about easing up on the alcohol, explaining that it will not help with classes, which should be the main reason for being in college.
b. Use humor to broach the topic, and offer alternatives to this usual “social” activity.
c. Bring up the situation with the floor’s resident assistant.
d. Try to get him/her involved in other activities.
e. Talk to the person to determine subtly if there are other issues that need to be addressed, and refer him/her to help if appropriate.
f. Talk to other people on the floor, and discuss ways to address the situation.
g. Ask once about this behavior and see where the discussion leads, then leave him/her to pursue his/her own course of action.

Appendix D

Behaviorally Anchored Rating Scales

The following questions ask you to rate your skills in 12 different areas during the PAST SIX MONTHS. Read the definition of each area, then use the behavioral examples provided to help you rate yourself most accurately.

Knowledge and mastery of general principles is defined as: Gaining knowledge and mastering facts, ideas and theories and how they interrelate, and the relevant contexts in which knowledge is developed and applied.
1. Very low (for example: rarely studying for tests, slacking off on assignments)
2.
3. Average (for example: sometimes studying for tests, putting some effort into assignments)
4.
5. Very high (for example: studying hard for tests, putting a great deal of effort into assignments)

Continuous learning is defined as: Being intellectually curious and interested in continuous learning. Actively seeking new ideas and new skills, both in core areas of study as well as in peripheral or novel areas.
1. Very low (for example: only learning the minimum amount required for class, rarely searching out information on topics that interest you on the Internet or at the library)
2.
3. Average (for example: sometimes learning a little more than what is required for courses on class topics that interest you, occasionally searching out interesting topics on the Internet or at the library)
4.
5. Very high (for example: frequently learning extra information beyond what is covered in classes on topics that interest you, often searching out interesting topics on the Internet or at the library)

Leadership is defined as: Demonstrating skills in a group, such as motivating others, coordinating groups and tasks, serving as a representative for the group, or otherwise performing a managing role in a group.
1. Very low (for example: avoiding being in charge of group projects, always waiting for others to assign work to you on group tasks)
2.
3. Average (for example: sometimes coordinating group tasks or activities when asked, speaking up in groups when you have an idea about the direction the group should go)
4.
5. Very high (for example: often taking charge in group activities, motivating others in groups, representing the groups that you’re involved in to others)

Interpersonal skills is defined as: Communicating and dealing well with others, whether in informal social situations or more formal school-related situations.
Being aware of the social dynamics of a situation and responding appropriately.
1. Very low (for example: picking fights with other people, keeping thoughts or feelings bottled up, letting emotions explode, saying inappropriate things)
2.
3. Average (for example: usually expressing thoughts and feelings effectively, thinking about what situation you’re in and what type of behavior is appropriate, usually remaining calm when interacting with others)
4.
5. Very high (for example: almost always clearly and calmly expressing thoughts and feelings, listening carefully to others and responding appropriately)

Social responsibility is defined as: Being responsible to society and the community, and demonstrating good citizenship. Being actively involved in the events in one's surrounding community, which can be at the neighborhood, town/city, state, national, or college/university level. Activities may include volunteer work for the community, attending city council meetings, and voting.
1. Very low (for example: very rarely voting, rarely participating in community activities or volunteer work, littering)
2.
3. Average (for example: voting in major elections, occasionally participating in community activities, signing petitions)
4.
5. Very high (for example: voting in all major and local elections, actively participating in community activities, helping out neighbors and other community members)

Adaptability and life skills is defined as: Adapting to a changing environment (at school or home), dealing well with gradual or sudden and expected or unexpected changes. Being effective in planning one’s everyday activities and dealing with novel problems and challenges in life.
1. Very low (for example: frequently getting upset when unexpected events force you to change your plans, rarely leaving extra time in your schedule in case things don’t go according to plan)
2.
3. Average (for example: sometimes getting upset when unexpected events force you to change your plans, sometimes leaving a little bit of extra time in your schedule in case things don’t go according to plan)
4.
5. Very high (for example: rarely getting upset when unexpected events force you to change your plans, almost always leaving enough time to get everything done even if things don’t go according to plan)

Ethics and integrity is defined as: Having a well-developed set of values, and behaving in ways consistent with those values. In everyday life, this probably means being honest, not cheating (on exams or in committed relationships), and having respect for others.
1. Very low (for example: cheating on exams, frequently telling lies, worrying very little about being an ethical person)
2.
3. Average (for example: usually acting honestly, but sometimes telling lies, might consider cheating on an exam under certain circumstances)
4.
5. Very high (for example: almost always behaving honestly, never cheating on an exam, never being unfaithful to a significant other)

Appendix E

Behavioral Observation Scales

The following items ask about various experiences you may have had during college. You will be asked to estimate the number of times you had each experience. For some questions, it might be difficult to remember exactly how many times, so take your best guess. It might help to think of examples of times when you had each experience. (Please note that the items were not collected in the order shown. The items were collected across scales based on the response scale appropriate for each item.)
Please rate how many times during college you...

(Knowledge and Mastery of General Principles)
1. Were on the dean’s list
A. 0 times  B. 1-2 times  C. 3-4 times  D. 5-6 times  E. 7-8 times  F. 9-10 times  G. more than 10 times
2. Were invited to be part of a research group
A. 0 times  B. 1-2 times  C. 3-4 times  D. 5-6 times  E. 7-8 times  F. 9-10 times  G. more than 10 times
3. Were invited to join an honor society
A. 0 times  B. 1-2 times  C. 3-4 times  D. 5-6 times  E. 7-8 times  F. 9-10 times  G. more than 10 times
4. Were recognized publicly by a professor for your class work
A. 0 times  B. 1-4 times  C. 5-9 times  D. 10-14 times  E. 15-19 times  F. 20-24 times  G. more than 24 times
5. Won or maintained a competitive academic scholarship
A. 0 times  B. 1-4 times  C. 5-9 times  D. 10-14 times  E. 15-19 times  F. 20-24 times  G. more than 24 times
6. Were paid to tutor a classmate in a course
A. 0 times  B. 1-4 times  C. 5-9 times  D. 10-14 times  E. 15-19 times  F. 20-24 times  G. more than 24 times
7. Won an award for an academic project
A. 0 times  B. 1-4 times  C. 5-9 times  D. 10-14 times  E. 15-19 times  F. 20-24 times  G. more than 24 times
8. Participated in an academic competition
A. 0 times  B. 1-10 times  C. 11-20 times  D. 21-30 times  E. 31-40 times  F. 41-50 times  G. more than 50 times

(Continuous Learning)
1. Enrolled in a class outside of school to learn more about a subject you were interested in
A. 0 times  B. 1-2 times  C. 3-4 times  D. 5-6 times  E. 7-8 times  F. 9-10 times  G. more than 10 times
2. Conducted an experiment not required by class
A. 0 times  B. 1-4 times  C. 5-9 times  D. 10-14 times  E. 15-19 times  F. 20-24 times  G. more than 24 times
3. Attended a lecture or talk not required or rewarded by classes
A. 0 times  B. 1-4 times  C. 5-9 times  D. 10-14 times  E. 15-19 times  F. 20-24 times  G. more than 24 times
4. Offered information to a teacher that went beyond the information in the course textbook
A. 0 times  B. 1-10 times  C. 11-20 times  D. 21-30 times  E. 31-40 times  F. 41-50 times  G. more than 50 times
5. Read ahead in a class textbook because you were interested in the subject
A. Never  B. Less than once per year  C. At least once per year  D. At least once per semester  E. At least once per month  F. At least once per week  G. Almost every day
6. Read an educational or scientific magazine
A. Never  B. Less than once per year  C. At least once per year  D. At least once per semester  E. At least once per month  F. At least once per week  G. Almost every day
7. Devoted a regular practice time to develop a skill or a better understanding of something that interests you
A. Never  B. Less than once per year  C. At least once per year  D. At least once per semester  E. At least once per month  F. At least once per week  G. Almost every day
8. Researched (e.g., getting a book or looking on the Internet) and learned more information about a topic or question that you found interesting
A. Never  B. Less than once per year  C. At least once per year  D. At least once per semester  E. At least once per month  F. At least once per week  G. Almost every day
9. Read a book or article related to something you found interesting
A. Never  B. Less than once per year  C. At least once per year  D. At least once per semester  E. At least once per month  F. At least once per week  G. Almost every day

(Leadership)
1. Worked as a formal representative for your college or university
A. 0 times  B. 1-2 times  C. 3-4 times  D. 5-6 times  E. 7-8 times  F. 9-10 times  G. more than 10 times
2. Were the team captain or leader for an official school or club sports team
A. 0 times  B. 1-2 times  C. 3-4 times  D. 5-6 times  E. 7-8 times  F. 9-10 times  G. more than 10 times
3. Started a new club, organization, or other official group
A. 0 times  B. 1-2 times  C. 3-4 times  D. 5-6 times  E. 7-8 times  F. 9-10 times  G. more than 10 times
4. Were appointed or elected officer in a club, professional society, or other organized interest group
A. 0 times  B. 1-4 times  C. 5-9 times  D. 10-14 times  E. 15-19 times  F. 20-24 times  G. more than 24 times
5. Organized a community event (e.g., a walkathon, a neighborhood picnic, a voter registration drive)
A. 0 times  B. 1-4 times  C. 5-9 times  D. 10-14 times  E. 15-19 times  F. 20-24 times  G. more than 24 times
6. Encouraged non-participating members of a group to be more active
A. 0 times  B. 1-4 times  C. 5-9 times  D. 10-14 times  E. 15-19 times  F. 20-24 times  G. more than 24 times
7. Acted as the leader of a team for a class project
A. Never  B. Less than once per year  C. At least once per year  D. At least once per semester  E. At least once per month  F. At least once per week  G. Almost every day
8. Delegated tasks to a group of people
A. Never  B. Less than once per year  C. At least once per year  D. At least once per semester  E. At least once per month  F. At least once per week  G. Almost every day

(Interpersonal skills)
1. Were told by a friend to stop saying something offensive or embarrassing
A. 0 times  B. 1-2 times  C. 3-4 times  D. 5-6 times  E. 7-8 times  F. 9-10 times  G. more than 10 times
2. Helped other people resolve a dispute
A. 0 times  B. 1-10 times  C. 11-20 times  D. 21-30 times  E. 31-40 times  F. 41-50 times  G. more than 50 times
3. Did or said something that seriously offended someone
A. 0 times  B. 1-10 times  C. 11-20 times  D. 21-30 times  E. 31-40 times  F. 41-50 times  G. more than 50 times
4. Hosted a party or large social gathering
A. Never  B. Less than once per year  C. At least once per year  D. At least once per semester  E. At least once per month  F. At least once per week  G. Almost every day
5. Comforted a friend who was upset
A. Never  B. Less than once per year  C. At least once per year  D. At least once per semester  E. At least once per month  F. At least once per week  G. Almost every day
6. Made “small talk” with someone you didn’t know very well
A. Never  B. Less than once per year  C. At least once per year  D. At least once per semester  E. At least once per month  F. At least once per week  G. Almost every day
7. Introduced yourself to others at a party or social gathering
A. Never  B. Less than once per year  C. At least once per year  D. At least once per semester  E. At least once per month  F. At least once per week  G. Almost every day

(Social Responsibility)
1. Voted in a school election
A. 0 times  B. 1-2 times  C. 3-4 times  D. 5-6 times  E. 7-8 times  F. 9-10 times  G. more than 10 times
2. Voted in a local or national election
Participated as a member of an official political organization A. 0 times B. 1-4 times C. 5-9 times D. 10-14 times E. 15-19 times F. 20-24 times G. more than 24 times 7. Donated money or items to a charity organization 96 A. 0 times B. 1-10 times C. 11-20 times D. 21-30 times B. 31-40 times F. 4l-50 times G. more than 50 times 8. Organized or participated in a community event A. Never B. Less than once per year C. At least once per year D. At least once per semester B. At least once per month F. At least once per week G. Almost every day 9. Were involved in volunteer work A. Never B. Less than once per year C. At least once per year D. At least once per semester E. At least once per month F. At least once per week G. Almost every day (Adaptability) I. Missed deadlines for class projects, work, or other important obligations A. 0 times B. 1-4 times C. 5-9 times D. 10-14 times E. 15-19 times F. 20-24 times G. more than 24 times 2. Were late paying a bill A. 0 times B. 1-4 times C. 5-9 times D. 10-14 times E. 15-19 times F. 20-24 times G. more than 24 times 3. Were late to a class, meeting, or other appointment 97 A. 0 times B. 1-4 times C. 5-9 times D. 10-14 times B. 15-19 times F. 20-24 times G. more than 24 times 4. Asked for an extension on an assignment because you didn’t leave enough time to finish A. 0 times B. 1-4 times C. 5-9 times D. 10-14 times E. 15-19 times F. 20-24 times G. more than 24 times 5. Had to back out of prior meetings or responsibilities because you had trouble managing all of your responsibilities A. 0 times B. 1-4 times C. 5-9 times D. 10-14 times E. 15-19 times F. 20-24 times G. more than 24 times 6. Went to class unprepared when you could have completed work or readings A. 0 times B. l-10 times C. 11-20 times D. 21-30 times B. 31-40 times F. 41-50 times G. more than 50 times 7. Produced a poor product for class (e. g. paper, presentation) because you did not start working on it early enough A. 0 times B. 1-10 times C. 11-20 times D. 21-30 times E. 31-40 times F. 41-50 times G. more than 50 times 98 (Ethics and Integrity) I. Lied on a formal document (e.g., school form, work application) A. 0 times B. 1-2 times C. 3-4 times D. 5-6 times E. 7-8 times F. 9-10 times G. more than 10 times 2. Received a warning from a landlord or were evicted from an apartment A. 0 times B. 1—2 times C. 3-4 times D. 5-6 times E. 7-8 times F. 9-10 times G. more than 10 times 3. Were arrested for a misdemeanor or received a citation A. 0 times B. 1-4 times C. 5-9 times D. 10-14 times E. 15-19 times F. 20-24 times G. more than 24 times 4. Were investigated by the campus judicial advisory board A. 0 times B. 1-4 times C. 5-9 times D. 10-14 times B. 15-19 times F. 20-24 times G. more than 24 times 5. Were issued a parking or speeding ticket A. 0 times B. 1-10 times C. 11-20 times D. 21-30 times E. 31-40 times F. 41-50 times 99 G. more than 50 times 6. Used a fake ID A. Never B. Less than once per year C. At least once per year D. At least once per semester E. At least once per month F. At least once per week G. Almost every day 7. Stole something or borrowed something without permission A. Never B. Less than once per year C. At least once per year D. At least once per semester E. At least once per month F. At least once per week G. Almost every day 8. Cheated on an exam, test, or classwork A. Never B. Less than once per year C. At least once per year D. At least once per semester E. At least once per month F. At least once per week G. Almost every day 9. 
Appendix F

IPIP Sub-scales

On the following pages, there are phrases describing people's behaviors. Please use the rating scale below to describe how accurately each statement describes you. Describe yourself as you generally are now, not as you wish to be in the future. Describe yourself as you honestly see yourself, in relation to other people you know of the same sex as you are, and roughly your same age. So that you can describe yourself in an honest manner, your responses will be kept in absolute confidence. Please read each statement carefully, and then fill in the bubble that corresponds to the number on the scale.

Response Options
1: Very Inaccurate
2: Moderately Inaccurate
3: Neither Inaccurate nor Accurate
4: Moderately Accurate
5: Very Accurate

(Conscientiousness)
Am always prepared.
Pay attention to details.
Get chores done right away.
Like order.
Follow a schedule.
Am exacting in my work.
Leave my belongings around.
Make a mess of things.
Often forget to put things back in their proper place.
Shirk my duties.

(Neuroticism)
Am relaxed most of the time.
Seldom feel blue.
Get stressed out easily.
Worry about things.
Am easily disturbed.
Get upset easily.
Change my mood a lot.
Have frequent mood swings.
Get irritated easily.
Often feel blue.

(Extraversion)
Am the life of the party.
Feel comfortable around people.
Start conversations.
Talk to a lot of different people at parties.
Don't mind being the center of attention.
Don't talk a lot.
Keep in the background.
Have little to say.
Don't like to draw attention to myself.
Am quiet around strangers.

(Agreeableness)
Am interested in people.
Sympathize with others' feelings.
Have a soft heart.
Take time out for others.
Feel others' emotions.
Make people feel at ease.
Am not really interested in others.
Insult people.
Am not interested in other people's problems.
Feel little concern for others.
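Note. Each 10-item IPIP scale above mixes positively and negatively keyed statements (e.g., "Am always prepared" versus "Shirk my duties" for Conscientiousness), and scale scores are conventionally computed by reversing the negatively keyed items before averaging. The sketch below illustrates that convention; the keying sets are inferred from the wording of the statements in this appendix and are illustrative assumptions, not keys taken from the study's materials.

# Conventional IPIP-style scoring: reverse negatively keyed items (6 - x on a
# 1-5 scale), then average. The keying sets below are inferred from item
# wording in this appendix and are illustrative assumptions.

REVERSE_KEYED = {
    "conscientiousness": {
        "Leave my belongings around.",
        "Make a mess of things.",
        "Often forget to put things back in their proper place.",
        "Shirk my duties.",
    },
    "neuroticism": {"Am relaxed most of the time.", "Seldom feel blue."},
}

def score_scale(scale, item_responses):
    """Average 1-5 responses after flipping reverse-keyed items."""
    reversed_items = REVERSE_KEYED.get(scale, set())
    scores = [(6 - r) if item in reversed_items else r
              for item, r in item_responses.items()]
    return sum(scores) / len(scores)

# "Seldom feel blue." is reverse keyed, so a response of 2 contributes 6 - 2 = 4.
print(score_scale("neuroticism", {"Worry about things.": 4, "Seldom feel blue.": 2}))  # 4.0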
Appendix G

Rating Task and Demographics for SMEs

Please rate how important you believe the following domains are to success as a college student:

Very unimportant
Unimportant
Somewhat unimportant
Neither unimportant nor important
Somewhat important
Important
Very important

2. Knowledge and mastery of general principles is defined as: Gaining knowledge and mastering facts, ideas, and theories and how they interrelate, and the relevant contexts in which knowledge is developed and applied. For example, studying for tests and putting effort into assignments.

3. Continuous learning is defined as: Being intellectually curious and interested in continuous learning. Actively seeking new ideas and new skills, both in core areas of study as well as in peripheral or novel areas. For example, learning more than what is required for courses, searching out interesting topics on the internet or at the library.

4. Artistic and cultural appreciation is defined as: Appreciating art and culture, either at an expert level or simply at the level of one who is interested. For example, attending plays, musical performances, art galleries, or other artistic events, and trying to learn about art and culture.

5. Appreciation for diversity is defined as: Showing openness, tolerance, and interest in a diversity of individuals and groups (e.g., by culture, ethnicity, religion, or gender). Actively participating in, contributing to, and influencing a heterogeneous environment. For example, speaking in a "politically correct" way, actively trying to learn about people from other cultures or groups, going to events sponsored by different cultural groups.

6. Leadership is defined as: Demonstrating skills in a group, such as motivating others, coordinating groups and tasks, serving as a representative for the group, or otherwise performing a managing role in a group.

7. Interpersonal skills is defined as: Communicating and dealing well with others, whether in informal social situations or more formal school-related situations. Being aware of the social dynamics of a situation and responding appropriately.

8. Social responsibility is defined as: Being responsible to society and the community, and demonstrating good citizenship. Being actively involved in the events in one's surrounding community, which can be at the neighborhood, town/city, state, national, or college/university level. Activities may include volunteer work for the community, attending city council meetings, and voting.

9. Physical and psychological health is defined as: Possessing the physical and psychological health required to engage actively in a scholastic environment. This would include participating in healthy behaviors, such as eating properly, exercising regularly, and maintaining healthy personal and academic relations with others, as well as avoiding unhealthy behaviors, such as alcohol/drug abuse, unprotected sex, and ineffective or counterproductive coping behaviors.

10. Career orientation is defined as: Having a clear sense of the career one aspires to enter, which may happen before entry into college or at any time while in college. Establishing, prioritizing, and following a set of general and specific career-related goals.

11. Adaptability and life skills is defined as: Adapting to a changing environment (at school or home), dealing well with gradual or sudden and expected or unexpected changes. Being effective in planning one's everyday activities and dealing with novel problems and challenges in life.

12. Perseverance is defined as: Committing oneself to goals and priorities set, regardless of the difficulties that stand in the way. Goals range from long-term goals (e.g., graduating from college) to short-term goals (e.g., showing up for class every day even when the class isn't interesting).

13. Ethics and integrity is defined as: Having a well-developed set of values, and behaving in ways consistent with those values. In everyday life, this probably means being honest, not cheating (on exams or in committed relationships), and having respect for others.

How supportive is your family of your pursuits in higher education?

very unsupportive
unsupportive
somewhat unsupportive
neither unsupportive nor supportive
somewhat supportive
supportive
very supportive

What is the highest level of education attained by your mother?

below high school
high school diploma
two-year Associate's degree
four-year Bachelor's degree
Graduate degree

What is the highest level of education attained by your father?

below high school
high school diploma
two-year Associate's degree
four-year Bachelor's degree
Graduate degree
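Note. Ratings from the task above can be aggregated to check whether the SME groups weight the performance dimensions differently. The sketch below shows one simple way to compute group means per dimension; the group labels, dimension names, and rating values are hypothetical.

# Group-mean comparison of the importance ratings collected above.
# All data shown (groups, dimensions, ratings) are hypothetical.

from statistics import mean

# ratings[group][dimension] -> one 1-7 importance rating per SME in the group.
ratings = {
    "graduate_students": {"leadership": [6, 5, 6], "ethics_integrity": [7, 7, 6]},
    "resident_advisors": {"leadership": [7, 6, 7], "ethics_integrity": [6, 7, 7]},
    "undergraduates": {"leadership": [5, 5, 6], "ethics_integrity": [6, 6, 5]},
}

for group, dims in ratings.items():
    group_means = {dim: round(mean(values), 2) for dim, values in dims.items()}
    print(group, group_means)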
Appendix H

Instructions for Verbal Protocol Task

Practice Items for Verbal Protocol Participants

As you answer the following questions, please try to say all of your thoughts out loud as you go. Please describe your thoughts, feelings, and choices about what you are doing and reading. It might be your reactions, reasoning, or even something you are reminded of. Please don't censor your thoughts. Even if a thought does not seem relevant to the task, it is of interest to me. Remember, everything you say to me today will be kept confidential.

To help you get used to saying all of your thoughts out loud, we're going to go through two practice items. Once you feel comfortable with the think-aloud technique, I will ask you to complete six additional items.

1. You are shopping when you notice a man robbing the store. What would you do?
a) Leave the store as quickly as possible and call the police.
b) Try to apprehend the robber yourself.
c) Follow the man and call the police as soon as he appears settled somewhere.
d) Nothing, as you do not wish to get involved in the matter.

2. Your professor recently passed out exams that your class took last week. Everyone except you was given an extra 5 points to make up for some errors the professor made in writing the test. What would you do?
a) Assume it was a mistake and speak to your professor.
b) Confront your professor about why you are being treated unfairly.
c) Assume that the professor added 5 points to your score but forgot to indicate the score change on the test you received back.
d) Complain to the head of the department.
e) Drop the class.

Appendix I

Coding Scheme: Reasons Why Someone Would or Would Not Choose a Response Option

The response would not be effective because...
1. It is too difficult or I don't have the qualities necessary to pull it off.
2. I've done this in the past and it did not work for me.
3. Something else would need to be added for this option to work.
4. It would not solve the problem.
5. It is too extreme.

The response might be effective, but it would be undesirable because...
6. It would take too much time or require too much effort.
7. I just don't like to do this or I am not willing to do this.
8. It is immature.
9. I don't have the right to do this.
10. It would be unfair to me or to others.
11. It is too confrontational or would offend other people.
12. It could be harmful to me (for example, it could backfire or overburden me).
13. I prefer to take responsibility myself rather than relying on others.
14. I prefer to seek others' opinions rather than make decisions unilaterally.

The response would be effective because...
15. It would work quickly or with little effort.
16. This is something I've done in the past and it worked for me.
17. This would be the kind or thoughtful option.
18. I think this would work well because of what I know about human nature (for example, setting clear goals helps motivate people).
19. Seeking help from an authority can be helpful.
20. I like this because it is fair.
21. It is what I am "supposed" to do (for example, it follows the rules).

Other reasons:
22. I wouldn't think of this on my own.
23. This option does not provide enough information for me to know if it would work.
24. Group problems and issues should be dealt with inside a group.
25. This option sounds like something I would do because of my personality.
26. I don't think this would work, but I know I would do it anyway.
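Note. When think-aloud transcripts are coded with this scheme, each code number falls into one of the four reason categories above, and category frequencies can then be compared across SME groups. The sketch below shows a minimal tally of codes by category; the transcript codes in the example are hypothetical.

# Tally verbal-protocol codes by reason category. The range-to-category
# mapping mirrors the coding scheme above; the sample codes are hypothetical.

CATEGORIES = [
    (range(1, 6), "not effective"),            # codes 1-5
    (range(6, 15), "effective but undesirable"),  # codes 6-14
    (range(15, 22), "effective"),              # codes 15-21
    (range(22, 27), "other"),                  # codes 22-26
]

def category_for(code):
    for code_range, label in CATEGORIES:
        if code in code_range:
            return label
    raise ValueError(f"unknown code: {code}")

# Codes assigned to one participant's transcript (hypothetical).
transcript_codes = [4, 6, 15, 18, 18, 25]

tally = {}
for code in transcript_codes:
    label = category_for(code)
    tally[label] = tally.get(label, 0) + 1
print(tally)  # {'not effective': 1, 'effective but undesirable': 1, 'effective': 3, 'other': 1}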
References

Bauer, T. N., & Truxillo, D. M. (2006). Applicant reactions to situational judgment tests: Research and related practical issues. In J. A. Weekley & R. E. Ployhart (Eds.), Situational judgment tests: Theory, measurement, and application (pp. 233-249). Mahwah, NJ: Lawrence Erlbaum Associates.

Beauregard, R. S. (2000). Construct explication of a situational judgment test: Addressing multidimensionality through item development, content analysis, and scoring procedures. Unpublished doctoral dissertation, Wright State University, Dayton, OH.

Cable, D. M., & Judge, T. A. (1996). Person-organization fit, job choice decisions, and organizational entry. Organizational Behavior and Human Decision Processes, 67, 294-311.

Cable, D. M., & Judge, T. A. (1997). Interviewers' perceptions of person-organization fit and organizational selection decisions. Journal of Applied Psychology, 82, 546-561.

Cable, D. M., & Parsons, C. K. (2001). Socialization tactics and person-organization fit. Personnel Psychology, 54, 1-23.

Caspi, A., Roberts, B. W., & Shiner, R. L. (2005). Personality development: Stability and change. Annual Review of Psychology, 56, 453-484.

Chan, D., & Schmitt, N. (1997). Video-based versus paper-and-pencil method of assessment in situational judgment tests: Subgroup differences in test performance and face validity perceptions. Journal of Applied Psychology, 82, 143-159.

Chan, D., & Schmitt, N. (2006). Situational judgment tests: Method or construct? In J. A. Weekley & R. E. Ployhart (Eds.), Situational judgment tests: Theory, measurement, and application (pp. 135-155). Mahwah, NJ: Lawrence Erlbaum Associates.

Clevenger, J. P., & Haaland, D. E. (2000, April). The relationship between job knowledge and situational judgment test performance. Paper presented at the 15th annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.

England, G. W. (1971). Development and use of weighted application blanks (Bulletin No. 55). Minneapolis: University of Minnesota, Industrial Relations Center.

George, J. M. (1990). Personality, affect, and behavior in groups. Journal of Applied Psychology, 75, 107-116.

Glaser, R., Lesgold, A., & Gott, S. (1991). Implications of cognitive psychology for measuring job performance. In A. K. Wigdor & B. F. Green, Jr. (Eds.), Performance assessment for the workplace (Vol. 2, pp. 1-26). Washington, DC: National Academy Press.

Goldberg, L. R. (1999). A broad-bandwidth, public-domain personality inventory measuring the lower-level facets of several five-factor models. In I. Mervielde, I. Deary, F. De Fruyt, & F. Ostendorf (Eds.), Personality psychology in Europe (Vol. 7, pp. 7-28). Tilburg, The Netherlands: Tilburg University Press.

Hogan, J. B. (1994). Empirical keying of background data measures. In G. S. Stokes, M. D. Mumford, & W. A. Owens (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 69-107). Palo Alto, CA: Consulting Psychologists Press.

Hooper, A. C., Cullen, M. J., & Sackett, P. R. (2006). Operational threats to the use of SJTs: Faking, coaching, and retesting issues. In J. A. Weekley & R. E. Ployhart (Eds.), Situational judgment tests: Theory, measurement, and application (pp. 205-232). Mahwah, NJ: Lawrence Erlbaum Associates.

Hough, L., & Paullin, C. (1994). Construct-oriented scale construction. In G. S. Stokes, M. D. Mumford, & W. A. Owens (Eds.), Biodata handbook: Theory, research, and use of biographical information in selection and performance prediction (pp. 109-145). Palo Alto, CA: Consulting Psychologists Press.

Jackson, S. E., Brett, J. F., Sessa, V. I., Cooper, D. M., Julin, J. A., & Peyronnin, K. (1991). Some differences make a difference: Individual dissimilarity and group heterogeneity as correlates of recruitment, promotions, and turnover. Journal of Applied Psychology, 76, 675-689.
Judge, T. A., & Cable, D. M. (1997). Applicant personality, organizational culture, and organization attraction. Personnel Psychology, 50, 359-394.

Lawrence, B. S. (1997). The black box of organizational demography. Organization Science, 8, 1-22.

Magnus, K., Diener, E., Fujita, F., & Pavot, W. (1993). Extraversion and neuroticism as predictors of objective life events: A longitudinal analysis. Journal of Personality and Social Psychology, 65, 1046-1053.

McDaniel, M. A., Hartman, N. S., Whetzel, D. L., & Grubb, W. L. (2007). Situational judgment tests, response instructions, and validity: A meta-analysis. Personnel Psychology, 60, 63-91.

McDaniel, M. A., Morgeson, F. P., Finnegan, E. B., Campion, M. A., & Braverman, E. (2001). Use of situational judgment tests to predict job performance: A clarification of the literature. Journal of Applied Psychology, 86, 730-740.

McDaniel, M. A., & Nguyen, N. T. (2001). Situational judgment tests: A review of practice and constructs assessed. International Journal of Selection and Assessment, 9, 103-113.

McKenzie, J. (1989). Neuroticism and academic achievement: The Furneaux factor. Personality and Individual Differences, 10, 509-515.

McKenzie, J., Taghavi-Knosary, M., & Tindell, G. (2000). Neuroticism and academic achievement: The Furneaux factor as a measure of academic rigour. Personality and Individual Differences, 29, 3-11.

Meng, X., Rosenthal, R., & Rubin, D. B. (1992). Comparing correlated correlation coefficients. Psychological Bulletin, 111, 172-175.

Mitchell, T. W., & Klimoski, R. J. (1982). Is it rational to be empirical? A test of methods for scoring biographical data. Journal of Applied Psychology, 67, 411-418.

Motowidlo, S. J., Diesch, A. C., & Jackson, H. L. (2003, April). Using the situational judgment format to measure personality characteristics. Paper presented at the 18th annual conference of the Society for Industrial and Organizational Psychology, Orlando, FL.

Motowidlo, S. J., Dunnette, M. D., & Carter, G. W. (1990). An alternative selection procedure: The low-fidelity simulation. Journal of Applied Psychology, 75, 640-647.

Motowidlo, S. J., Hooper, A. C., & Jackson, H. L. (2006). A theoretical basis for situational judgment tests. In J. A. Weekley & R. E. Ployhart (Eds.), Situational judgment tests: Theory, measurement, and application (pp. 57-81). Mahwah, NJ: Lawrence Erlbaum Associates.

Motowidlo, S. J., & Tippins, N. (1993). Further studies of the low-fidelity simulation in the form of a situational inventory. Journal of Occupational and Organizational Psychology, 66, 337-344.

Nettle, D. (2006). The evolution of personality variation in humans and other animals. American Psychologist, 61, 622-631.

Oswald, F. L., Schmitt, N., Kim, B. H., Ramsay, L. J., & Gillespie, M. A. (2004). Developing a biodata measure and situational judgment inventory as predictors of college student performance. Journal of Applied Psychology, 89, 187-207.

Pfeffer, J. (1983). Organizational demography. In L. L. Cummings & B. M. Staw (Eds.), Research in organizational behavior (pp. 299-357). Greenwich, CT: JAI Press.

Ployhart, R. E. (2006). The predictor response process model. In J. A. Weekley & R. E. Ployhart (Eds.), Situational judgment tests: Theory, measurement, and application (pp. 83-105). Mahwah, NJ: Lawrence Erlbaum Associates.
Ployhart, R. E., & Ehrhart, M. G. (2003). Be careful what you ask for: Effects of response instructions on the construct validity and reliability of situational judgment tests. International Journal of Selection and Assessment, 11, 1-16.

Ployhart, R. E., & Ryan, A. M. (2000, April). Integrating personality tests with situational judgment tests for the prediction of customer service performance. Paper presented at the 15th annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.

Porr, W. B., & Ployhart, R. E. (2004, April). The validity of empirically and construct-oriented situational judgment tests. Paper presented at the 19th annual conference of the Society for Industrial and Organizational Psychology, Chicago, IL.

Pulakos, E. D., & Schmitt, N. (1996). An evaluation of two strategies for reducing adverse impact and their effects on criterion-related validity. Human Performance, 9, 241-258.

Reilly, R. R., & Chao, G. T. (1982). Validity and fairness of some alternative employee selection procedures. Personnel Psychology, 35, 1-62.

Sacco, J. M., Scheu, C., Ryan, A. M., & Schmitt, N. W. (2000, April). Understanding race differences on situational judgment tests using readability statistics. Paper presented at the 15th annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.

Sacco, J. M., Schmidt, D. B., & Rogg, K. L. (2000, April). Using readability statistics and reading comprehension scores to predict situational judgment test performance, black-white differences, and validity. Paper presented at the 15th annual conference of the Society for Industrial and Organizational Psychology, New Orleans, LA.

Sackett, P. R., Schmitt, N., Ellingson, J. E., & Kabin, M. B. (2001). High-stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative-action world. American Psychologist, 56, 302-318.

Schneider, B. (1987). The people make the place. Personnel Psychology, 40, 437-454.

Thayer, P. W. (1977). Somethings old, somethings new. Personnel Psychology, 30, 513-524.

Trippe, M. D., & Foti, R. J. (2003, April). An evaluation of the construct validity of situational judgment tests. Paper presented at the 18th annual conference of the Society for Industrial and Organizational Psychology, Orlando, FL.

Weekley, J. A., Ployhart, R. E., & Holtz, B. C. (2006). On the development of situational judgment tests: Issues in item development, scaling, and scoring. In J. A. Weekley & R. E. Ployhart (Eds.), Situational judgment tests: Theory, measurement, and application (pp. 157-182). Mahwah, NJ: Lawrence Erlbaum Associates.

Whetzel, D. L., McDaniel, M. A., & Nguyen, N. T. (2008). Subgroup differences in situational judgment test performance: A meta-analysis. Human Performance, 21, 291-309.