AN EXAMINATION OF STEREOTYPE THREAT EFFECTS ON KNOWLEDGE ACQUISITION IN AN EXPLORATORY LEARNING PARADIGM
By
James Grand
A DISSERTATION
Submitted to Michigan State University in partial fulfillment of the requirements for the degree of
DOCTOR OF PHILOSOPHY
Psychology
2012
ABSTRACT
AN EXAMINATION OF STEREOTYPE THREAT EFFECTS ON KNOWLEDGE ACQUISITION IN AN EXPLORATORY LEARNING PARADIGM
By
James Grand
Stereotype threat describes a situation in which an individual risks confirming, through his/her actions, a negative stereotype about a subgroup to which that person belongs (Steele & Aronson, 1995). Empirical investigations of stereotype threat effects across a variety of individuals, subgroups, and contexts have identified a number of undesirable consequences related to performance on domain-relevant tasks (e.g., Nguyen & Ryan, 2008; Steele, 1997; Steele & Aronson, 1998; Steele, Spencer & Aronson, 2002). Efforts to identify the psychological mechanisms and processes most directly affected by stereotype threat have indicated that one of its most detrimental influences is exerted on individuals’ working memory capacity. More specifically, the added cognitive and emotional-regulatory strain introduced by the presence of stereotype threat uses up a portion of one’s limited working memory capacity, thus “hijacking” cognitive resources that could otherwise have been put towards completing task-relevant activities/performance (Schmader, Johns, & Forbes, 2008). Given the importance of working memory to the development of new knowledge, skills, and abilities (cf. Feldman Barrett, Tugade, & Engle, 2004), the primary goal of the present investigation was to extend and build upon recent research examining the acquisition of task-relevant knowledge by individuals facing conditions of stereotype threat during their learning activities (Rydell, Rydell, & Boucher, 2010; Rydell, Shiffrin, Boucher, Van Loo, & Rydell, 2010).
Guided by an empirically grounded taxonomy of critical learning outcomes (Kraiger, Ford, & Salas, 1993), the knowledge organization and development of task strategies were examined for 145 female learners assigned to either stereotype threat or control conditions. Individuals were tasked with learning to operate a low-fidelity computer-based radar tracking simulation over the course of three experimental sessions held on consecutive days. Based on principles of active learning (Bell & Kozlowski, 2008), the presentation of content/materials followed an exploratory learning paradigm which facilitates task comprehension through the use/improvement of learners’ inferential reasoning capabilities (e.g., McDaniel & Schlager, 1990). Key findings of this study indicated that, unlike females who learned the task under control conditions, females facing stereotype threat experienced the greatest difficulty acquiring effective heuristics critical to improving task performance. Examination of participants’ knowledge structures revealed that although female learners under stereotype threat were capable of deducing advanced relations amongst relevant task concepts over time, they appeared to do so in a manner that was far less efficient and, consequently, less conducive to performance when required to apply their knowledge in more demanding task conditions. Further analyses indicated that females under conditions of stereotype threat were less accurate at applying their learned knowledge to task-critical decisions, and that the manner in which they had learned to interpret information presented to them in the task was also generally less optimal. Lastly, the observed pattern of results revealed that the above effects did not manifest immediately at the onset of learning activities and required time for meaningful differences to emerge, suggesting that longitudinal examinations of stereotype threat effects are an important direction for future research.
Copyright by James Grand 2012
ACKNOWLEDGEMENTS
I would like to thank my parents (Barry and Kathy), brother (William), and sister (Lacy) for the support and encouragement they have provided, and continue to provide, in everything I do; I could not have asked for a more loving or caring family. I would also like to thank Jennifer Wessel for her friendship throughout our graduate school career, as well as the personal and professional relationship that has emerged from that; I am forever grateful for the happiness and fun you bring to my life. Finally, I would like to thank the members of my dissertation committee (Ann Marie Ryan, Steve Kozlowski, Neal Schmitt, Tim Pleskac, and Georgia Chao), whose investment in my education and the challenges they have pushed me to take on have left a lasting impact which will not soon be forgotten.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
  An Overview of Stereotype Threat: Theory, Consequences, and Criticisms
    Nature of stereotype threat.
    Outcomes of stereotype threat.
    Criticisms of stereotype threat.
  Stereotype Threat at Learning: Rationale, Applications, and Implications
    Conceptual background for stereotype threat at learning.
    Examinations of stereotype threat at learning.
    Research/practical implications of stereotype threat at learning.
  Research Hypotheses
    Stereotype threat and knowledge organization.
    Stereotype threat and cognitive strategy.
    Stereotype threat and performance.
METHOD
  Participants
  Experimental Task
  Procedure
    Online signup.
    Experimental sessions.
      Task introduction and familiarization trial.
      Practice trials.
      Performance trial.
      Exploratory learning recommendations.
      Experimental manipulation
  Measures
    Demographics.
    Cognitive ability.
    Math domain identification.
    Working memory.
    Metacognitive activity.
    Manipulation checks.
    Declarative knowledge.
    Knowledge structure assessment.
    Strategic learning behaviors.
    Decision-making strategy.
    Task performance.
RESULTS
  Descriptive Statistics and Data Cleaning
  Manipulation Check
  Knowledge Structure Analyses
    Knowledge structure similarity
    Knowledge structure correlation
    Knowledge structure coherence
    Number of knowledge structure links
    Knowledge structure clustering
  Cognitive Strategy Analyses
    Strategic learning behaviors
      Knowledge acquisition behaviors
      Task practice behaviors
      Self-regulation
    Decision-making strategy
  Task Performance
DISCUSSION
  Summary of Key Findings
  Stereotype Threat Effects on Knowledge Organization
  Stereotype Threat Effects on Cognitive Strategy Acquisition
  Stereotype Threat Effects on Task Performance
  Implications and Directions for Future Research
  Study Limitations and Generalizability
  Conclusion
FOOTNOTES
APPENDICES
  APPENDIX A
  APPENDIX B
  APPENDIX C
  APPENDIX D
  APPENDIX E
  APPENDIX F
  APPENDIX G
  APPENDIX H
  APPENDIX I
  APPENDIX J
  APPENDIX K
REFERENCES
LIST OF TABLES
Table 1. Summary of Rydell and Colleagues’ Multi-study Experiments Examining Stereotype Threat Effects at Learning
Table 2. Total Sample Size and Attrition Rates across Days by Sex and Experimental Condition
Table 3. Subdecision Outcomes and Relevant Identifying Information Cues/Values in TANDEM
Table 4. Rules of Engagement for Determining Final Engagement Decisions
Table 5. Summary of Experimental Session Sequence and Timings
Table 6. Distribution of Target Characteristics Across all Scenarios for Practice Trial Targets (n = 63) and Performance Trial Targets (n = 126)
Table 7. TANDEM Knowledge Concepts with Descriptions
Table 8. Relative Probabilities Shared between Subdecision Outcomes and Final Engagement Decision Outcomes
Table 9. Means, Standard Deviations and Intercorrelations for Study Variables
Table 10. MRCM Parameter Estimates for Female’s Knowledge Structure Similarity with Males and the Top 15 Performers (Hypotheses 1 & 6)
Table 11. MRCM Parameter Estimates for Female’s Knowledge Structure Correlation with Males and the Top 15 Performers (Hypotheses 2 & 7)
Table 12. MRCM Parameter Estimates for Female’s Knowledge Structure Coherence (Hypotheses 3 & 8)
Table 13. MRCM Parameter Estimates for Number of Links in Female’s Knowledge Structures (Hypotheses 4 & 9)
Table 14. MRCM Parameter Estimates for Graph Theoretic Metrics (Exploratory Analyses)
Table 15. MRCM Parameter Estimates for Time Spent on Task Manual Sections (Hypothesis 11)
Table 16. MRCM Parameter Estimates for Task Practice Behaviors (Hypothesis 11)
Table 17. MRCM Parameter Estimates for Female’s Metacognitive Activity (Hypothesis 11)
Table 18. MRCM Parameter Estimates for Performance Outcomes Measured during Learning/Practice Trials (Hypothesis 13 & 14)
Table 19. MRCM Parameter Estimates for Performance Outcomes Measured during Performance Trials (Hypothesis 13 & 14)
Table 20. MRCM Parameter Estimates for Performance on the Declarative Knowledge Assessments
Table 21. Hypothesis Summary
LIST OF FIGURES
Figure 1. Conceptualization of stereotype threat as a cognitive imbalance triggered by person and/or situation factors (Schmader, Johns, & Forbes, 2008)
Figure 2. Integrated process model of stereotype threat effects on performance (Schmader, Johns, & Forbes, 2008)
Figure 3. Classification system for learning outcomes (adapted from Kraiger, Ford, & Salas, 1993)
Figure 4. TANDEM graphical user interface
Figure 5. Sequencing of exploratory learning recommendations and manipulation instructions during daily practice rounds
Figure 6. Cognitive strategy heuristic for TANDEM performance
Figure 7. Knowledge structures for female participants in the stereotype threat and control conditions averaged across days
Figure 8. Average knowledge structures for female participants in the stereotype threat and control conditions at end of Day 1
Figure 9. Average knowledge structures for female participants in the stereotype threat and control conditions at end of Day 2
Figure 10. Average knowledge structures for female participants in the stereotype threat and control conditions at end of Day 3
Figure 11. Cumulative average time spent viewing manual pages during learning trials
Figure 12. Average time spent viewing manual pages during learning trials
Figure 13. Female’s average task practice behaviors across learning trials
Figure 14. Observed and optimal decision weights for Type cue (Surface) on decision to Warn rather than Clear targets for stereotype threat females and control females at each day
Figure 15. Observed and optimal decision weights for Type cue (Surface) on decision to Mark rather than Clear targets for stereotype threat females and control females at each day
Figure 16. Observed and optimal decision weights for Type cue (Sub) on decision to Warn rather than Clear targets for stereotype threat females and control females at each day
Figure 17. Observed and optimal decision weights for Type cue (Sub) on decision to Mark rather than Clear targets for stereotype threat females and control females at each day
Figure 18. Observed and optimal decision weights for Class cue (Military) on decision to Mark rather than Clear targets for stereotype threat females and control females at each day
Figure 19. Observed and optimal decision weights for Class cue (Military) on decision to Mark rather than Clear targets for stereotype threat females and control females at each day
Figure 20. Observed and optimal decision weights for Intent cue (Hostile) on decision to Warn rather than Clear targets for stereotype threat females and control females at each day
Figure 21. Observed and optimal decision weights for Intent cue (Hostile) on decision to Mark rather than Clear targets for stereotype threat females and control females at each day
INTRODUCTION
White men can’t jump, women can’t drive, and three men can’t take care of a baby. Aside from their familiarity as popular entertainment punch lines, each of these statements also shares a subtler and potentially more surprising feature: under the right circumstances, empirical research suggests there is some truth to their claims (Bosson, Haymovitz, & Pinel, 2004; Stone, Lynch, Sjomeling, & Darley, 1999; Yeung & von Hippel, 2008).
The specific circumstances in question here refer to instances of stereotype threat, a predicament in which an individual is faced with “the risk of confirming, as self-characteristic, a negative stereotype about one’s group” based on his/her actions (Steele & Aronson, 1995, p. 797). More specifically, stereotype threat theory posits that the presence of a culturally shared stereotype implying that a subgroup is less capable at specific domain tasks, or possesses deficient knowledge, skills, or abilities in a domain, can lead to a variety of undesirable consequences on domain-relevant tasks for individuals identified with the disadvantaged subgroup (Steele, 1997; Steele & Aronson, 1998; Steele, Spencer & Aronson, 2002). Although the most widely documented of these undesirable consequences is reduced performance on intellective ability/knowledge tests (e.g., Cole, Matheson, & Anisman, 2007; Good, Aronson, & Harder, 2008; Keller, 2007; Spencer, Steele, & Quinn, 1999; Steele & Aronson, 1995; Walton & Cohen, 2003; see Nguyen & Ryan, 2008, for a meta-analysis), stereotype threat has been linked to a variety of other negative outcomes as well.
Poorer functioning at physical and social activities (Stone & McWhinnie, 2008; Kray, Galinsky, & Thompson, 2002, respectively), a higher prevalence of internal versus external attributions for failure (Koch, Müller, & Sieverding, 2008), greater engagement in self-handicapping behaviors (erecting barriers to performance that provide a “fallback excuse” for potential failures; Keller, 2002; Steele & Aronson, 1995; Stone, 2002), adoption of performance-avoidance goals (Brodish & Devine, 2009; Smith, 2004; 2006; Smith, Sansone, & White, 2007), discounting the validity, importance, or appropriateness of a task (Keller, 2002; Lesko & Corpus, 2006), and attempts to distance oneself from the stereotyped group (Pronin, Steele, & Ross, 2004; Steele & Aronson, 1995) or “disengage” from the task domain (Crocker, Major, & Steele, 1998; Major, Spencer, Schmader, Wolfe, & Crocker, 1998) have all been linked to stereotype threat. While stereotype threat is most commonly invoked in explanations for the underachievement of minorities (i.e., females and non-Whites) in specific domains, the effect has been shown to generalize to majority subgroups as well. For example, when faced with stereotypes about Asian students’ superiority at mathematics and intelligence testing, White males have been shown to perform significantly worse on mathematics tests and to disengage more strongly from the task domain by diminishing the importance/self-relevance of their intellect than in situations where no such stereotype is mentioned (Aronson, Lustina, Good, Keough, Steele, & Brown, 1999; von Hippel, von Hippel, Conway, Preacher, Schooler, & Radvansky, 2005). The demonstrable effects of stereotype threat thus span a variety of outcomes and are inclusive of virtually all subgroup categories. However, the examination and application of stereotype threat as a phenomenon of interest has primarily been restricted to instances in which achievement or evaluation are the central criteria.
Stated differently, the development of stereotype threat theory and investigations of its influence have primarily been of interest to researchers and practitioners at performance, defined here as any point in time where an individual is asked to demonstrate some domain knowledge, skill, or ability for the purposes of explicitly diagnosing or measuring that individual’s domain competence. Although a subtle distinction (and one that has been inherent in treatments of the concept since its inception; Steele & Aronson, 1995), this limited scope takes for granted a fundamental tenet and extrapolation of the theory: while the negative stereotypes which lend strength to a situational threat are domain/ability-specific, the influence of those stereotypes is not necessarily restricted to the manner by which the domain is encountered or the capability expressed. For example, regardless of whether women are asked to complete a difficult test of mathematical ability (an explicitly evaluative context; Spencer et al., 1999), teach young students whose mathematical ability is later assessed (a context in which performance of the female instructor is not directly evaluated; Beilock, Gunderson, Ramirez, & Levine, 2010), or learn about novel mathematical operations (a context in which performance is not the immediate focus of attention; Rydell, Rydell, & Boucher, 2010), the stereotype “women are less proficient at mathematics” is equally relevant to women participating in these activities. Of particular interest is the last of these examples, which implies that stereotype threat could potentially impair the knowledge acquisition process of affected individuals.
Though variability in the definition of “intelligence” and related theories about the extent to which individual differences contribute to intellective performance abound (e.g., Sternberg, Conway, Ketron, & Bernstein, 1981; Sternberg & Grigorenko, 2004; Wagner & Sternberg, 1984), there is virtually no disagreement that learning, training, and the accrual of performance-relevant knowledge and experience are critical to successful performance achievement (cf. Baldwin, Ford, & Blume, 2009; Goldstein & Ford, 2002; Kraiger, Ford, & Salas, 1993). To the extent that domain-relevant negative stereotypes impact the acquisition of knowledge, skills, or abilities needed by individuals to effectively perform in a given domain, the subsequent domain achievement of threatened individuals would also be expected to suffer. As will be elaborated further in the sections to follow, this possibility has significant practical and research implications, not least of which is that the current state of the literature is unable to adequately answer whether examinations of stereotype threat at performance are instances of an insidious situational pressure producing differences between individuals who possess equally sturdy intellective foundations or one that capitalizes on preexisting instabilities. This simple yet integral concept serves as the primary impetus of the present research effort. The goal of this study is to extend the conceptualization and associated consequences of stereotype threat theory by examining its effects at learning, defined here as any time during which individuals engage in non-evaluative experiences and activities designed to contribute to the development of one’s competencies/capabilities through the acquisition and retention of domain-/task-relevant knowledge. The conceptual and methodological rationale for this research begins with an overview of stereotype threat theory’s conceptualization, consequences (both proximal and distal), and criticisms.
Attention is next directed towards a discussion of learning in the context of stereotype threat theory. Using Kraiger et al.’s (1993) learning outcomes classification scheme as an organizing framework, research from the literature on learning/knowledge acquisition is summarized to characterize the manner by which stereotype threat effects are likely to impact these processes as well as delineate the specific conceptualization of learning pursued in the present study. This section also includes a detailed examination of the first published studies investigating the role of stereotype threat during learning (Rydell, Rydell, & Boucher, 2010; Rydell, Shiffrin, Boucher, Van Loo, & Rydell, 2010) in order to provide some context regarding the contributions which the present research stands to add. Lastly, the formal research hypotheses and their accompanying rationale, which lay out the intended direction of the present study, are advanced.
An Overview of Stereotype Threat: Theory, Consequences, and Criticisms
No doubt owing to its provocative conclusions and relatively intuitive logic, stereotype threat theory’s account for why some groups of individuals have tended to underperform in specific domains has stimulated interest in both media (e.g., Chandler, 1999; Cloud, 2009; Rivers, 2007) and scholarly outlets. In the 15+ years since Steele and Aronson (1995) published their seminal article on the topic, over 300 empirical studies have been published which attempt to examine the achievement deficiencies elicited by stereotype threat effects. As noted previously, the most common investigations of stereotype threat at performance have been directed towards explaining group performance discrepancies in cognitive ability testing.
Examples of stereotype threat’s effects have been documented across a variety of testing domains and subgroups, including females and mathematical ability testing (Good et al., 2008; Spencer et al., 1999; Walsh, Hickey, & Duffy, 1999), Black students on general cognitive ability exams (Brown & Day, 2006; McKay, Doverspike, Bowen-Hilton & Martin, 2002, 2003; Steele & Aronson, 1995), and low socioeconomic status individuals on verbal ability tests (Croizet & Claire, 1998; Harrison, Stevens, Monty, & Coakley, 2006), among many others. Nevertheless, the theoretical basis for the process by which stereotype threat is experienced by individuals and ultimately exerts its influence on their performance/achievement is believed to be the same regardless of its application. In their earliest formulation, Steele and Aronson (1995) proposed that stereotype threat operates by activating a number of proximal affective, behavioral, and cognitive mechanisms detrimental to performance, including “distraction, narrowed attention, anxiety, self-consciousness, withdrawal of effort, [or] overeffort” (p. 809). Although these intervening processes were believed to vary in importance and salience based on the conditions of the performance situation, the authors’ primary conclusions were that stereotype threat leads to both cognitive processing inefficiencies (i.e., threatened individuals spend more time doing fewer things less accurately) and lowered performance expectations/motivation on the part of threatened individuals. Since that time, extensive efforts have been invested in specifically isolating and examining these fundamental operations of stereotype threat and their impact on performance outcomes. Arguably the most complete theoretical treatment and review of the stereotype threat literature to date was presented by Schmader, Johns, and Forbes (2008).
Based on their review of the relevant literature, these authors proposed an integrated conceptual representation of the intrapersonal dynamics believed to underlie experiences of stereotype threat as well as a model describing the processes through which stereotype threat influences psychological functioning and performance. Given their richness and ambitious incorporation of the large majority of the stereotype threat literature, these models will be used as the primary conceptualization of stereotype threat in the present study and are described in greater detail below.

Nature of stereotype threat. Schmader et al. (2008) state that stereotype threat stems from the activation of three intrapersonal constructs: an individual’s concept of his/her group membership, concept of the ability domain in question, and his/her self-concept (Figure 1). However, it is not the mere engagement of these concepts that encapsulates stereotype threat, but rather the propositional relations that exist and are altered among them. Semantically, propositional relations describe the evaluations and beliefs that individuals explicitly form in their attempts to validate automatic and associative appraisals of a situation (e.g., a negative reaction to a score one receives on a math test translates into the propositional relation “I am bad at math”) (cf., Gawronski & Bodenhausen, 2006). For any given context, a positive propositional relation implies that two concepts coincide with one another (e.g., My group has this ability; I am like my group; I have this ability) whereas a negative relation implies that two concepts oppose one another (e.g., My group does not have this ability; I am not like my group; I do not have this ability).

Figure 1. Conceptualization of stereotype threat as a cognitive imbalance triggered by person and/or situation factors (Schmader, Johns, & Forbes, 2008)

On the basis of this relational framework and other similar research (Heider, 1958; Nosek, Banaji, & Greenwald, 2002), Schmader et al. (2008) posit that stereotype threat manifests from a situationally induced imbalance in the implied propositional relations among an individual’s concept of group, ability, and self that the individual is driven, yet struggles, to resolve. More specifically, stereotype threat is experienced when a negative propositional relation between one’s group membership and the ability domain is engendered which is seemingly irreconcilable with positive propositional relations between the self-and-group and the self-and-domain (e.g., My group does not have this ability, I am like my group, but I have this ability).

As exemplified in Figure 1, the cognitive imbalances that elicit stereotype threat arise from the simultaneous activation of situational primes across the three relational links between group, ability, and self, each of which may be further influenced by certain individual difference characteristics. In the first link between one’s group and ability, external environmental cues signal that one’s group is considered deficient in the ability domain and thus imply that a negative propositional relation exists between those concepts. In studies of stereotype threat, these cues are most commonly introduced through the manipulation of negative stereotypes relevant to the situation and actors.
These experimental manipulations, often the hallmark and defining criticism of stereotype threat research (e.g., Cullen, Hardison & Sackett, 2004; Cullen, Waters, & Sackett, 2006), have been presented in many shapes and forms, including altering individuals’ perceptions of the diagnosticity of a test (Kray et al., 2001; Steele & Aronson, 1995), the salience/explicitness of a negative performance stereotype (Grand, Ryan, Schmitt, & Hmurovic, 2011; Spencer et al., 1999), or the manner by which a domain task is described (Frantz, Cuddy, Burnett, Ray, & Hart, 2004; Stone et al., 1999). Additionally, individual differences such as stigma consciousness (Brown & Lee, 2005; Brown & Pinel, 2003), group-based rejection sensitivity (Mendoza-Denton, Purdie, Downey, & Davis, 2002), and stereotype knowledge/belief (Kiefer & Sekaquaptewa, 2007; Schmader, Johns, & Barquissau, 2004) can facilitate the ease with which negative group-ability domain stereotypes are adopted and, by extension, the corresponding negative propositional relation activated.

The second link contributing to the experience of stereotype threat is proposed to exist between a person’s self-concept and his/her membership in the stereotyped group. In this case, situational cues promote recognition of a “collective self” as a representative indicator of one’s self-concept, thereby encouraging a positive propositional relation between the self-and-group that deemphasizes an individual’s unique strengths, weaknesses, and characteristics (e.g., Marx, Stapel, & Muller, 2005; Shih, Pittinsky, & Ambady, 1999).
Experimental primes of the self-group link have ranged from pre-performance questionnaires soliciting group identity-relevant information (Ambady, Shih, Kim, & Pittinsky, 2001; McGlone & Aronson, 2006; Shih et al., 1999; Shih, Pittinsky, & Trahan, 2006; Yopyk & Prentice, 2005), to having individuals interact with out-group members prior to performance (Marx & Goff, 2005; Stone & McWhinnie, 2008), to simply asking individuals to provide their gender/race on pre-test demographics (Steele & Aronson, 1995). The results from such experimental studies have generally found performance decrements for the stereotyped group when self-group membership is made salient before rather than after performance (though this point is not without contention; see Stricker & Ward, 2004, Danaher & Crandall, 2008, and Stricker & Ward, 2008). Additionally, although certain minority groups (females, African Americans, etc.) are most often the focus of stereotype threat researchers, neither the ease of visibility nor the proportional demographic status of one’s group is a prerequisite for experiencing stereotype threat. Individuals from less readily detectable group categories, such as those based on socioeconomic status (Croizet & Claire, 1998; Harrison et al., 2006) or mental illness (Quinn, Kahng, & Crocker, 2004), and even from groups typically considered culturally dominant or in the majority, such as men (Koenig & Eagly, 2005) or Whites (Aronson et al., 1999; Stone, 2002), are susceptible to stereotype threat under certain circumstances. However, it is certainly the case that members from minority or low-status groups face a far higher prevalence of negative stereotypes, and thus run the risk of more regular exposure to conditions that are favorable to stereotype threat (Gonzalez, Blanton, & Williams, 2002; Shih et al., 1999).
Furthermore, this linkage suggests that individuals more likely to exhibit a positive self-group concept even when situational primes are ambiguous may be more susceptible to threat, a finding supported by research on the effects of group identification in threatening performance situations (Marx et al., 2005; Ployhart, Ziegert, & McFarland, 2003; Schmader, 2002).

The final link in the stereotype threat imbalance is the positive propositional relation between self-and-domain such that an individual’s self-concept is associated with doing well in that context due to expectations of success or a high motivation to achieve (Schmader et al., 2008). Personal stake or investment in a domain/outcome has been advanced as a critical precondition for the elicitation of stereotype threat (Steele, 1997; Steele & Aronson, 1995; Steele & Davies, 2003); generally, the more that individuals care about a given domain or doing well in it, the more susceptible they are to threat (Aronson et al., 1999; Cadinu, Maass, Frigerio, Impagliazzo, & Latinotti, 2003; Hess, Auman, Colcombe, & Rahhal, 2003; Keller, 2007; Levy, 1996; Leyens, Désert, Croizet, & Darcis, 2000; Spencer et al., 1999; Stone et al., 1999; Wout, Danso, Jackson, & Spencer, 2008; but see Nguyen & Ryan, 2008). This positive self-ability relation has been experimentally elicited primarily by either indicating to participants that an experimental task is challenging but within the scope of their abilities or by simply selecting participants with documented success in the ability or domain (e.g., Aronson et al., 1999; Brown & Pinel, 2003; Josephs, Newman, Brown, & Beer, 2003; Schmader & Johns, 2003; Spencer et al., 1999; Steele & Aronson, 1995).
For many, this link represents perhaps the most unfortunate dimension of stereotype threat as it implies that individuals who are the most highly motivated and driven to succeed are those at greatest risk of falling prey to the Sisyphean struggles engendered by the phenomenon (cf., Steele, 1997).

In sum, stereotype threat encapsulates a specific concoction and relational network of situational and intrapersonal characteristics. Although alternative conceptualizations of the phenomenon exist, the broader appeal and utility of Schmader et al.’s (2008) model is its emphasis on the nature of stereotype threat as an emergent situational phenomenon requiring multiple conditions be met in order for the phenomenon to occur. Though further research is needed to support the claim, a primary implication of this framework is that stereotype threat cannot (or is highly unlikely to) occur unless all portions of this cognitive imbalance are brought to bear in a situation. Additionally, this model suggests that it may not be possible (or, at minimum, plausible) to vary the “amount” of stereotype threat in a single performance instance in the same manner one might vary other characteristics such as time pressure or cognitive load; instead, the manifestation of threat is influenced by targeting the three core concepts of group, self, and ability and the formation of the conflicting propositional relations among them.¹ Thus, it is perhaps most conceptually accurate to treat stereotype threat (regardless of its application) as the confluence of a defined set of situational and individual characteristics that, when mixed or experienced together in a prescribed fashion, can lead to cognitively and affectively unsettling states/processes that are not conducive to productive psychological functioning. Precisely what those unsettling states/processes are is the topic of the following section.

Outcomes of stereotype threat.
To this point, the discussion has centered on the manner by which stereotype threat manifests as a cognitive imbalance. However, as with nearly all models of cognitive disruption or inconsistency (e.g., Baumeister & Vohs, 2004; Higgins, 1987; Festinger, 1957), the critical assumption of stereotype threat theory is that the discrepancy it produces prompts a state of unresolved tension within the stereotyped individual that he/she is motivated to dispel. In response, a variety of psychological resources and processes are engaged to aid the resolution effort. In and of itself, such a response is not inherently negative and serves an important homeostatic function for the individual (cf., Pressing, 1999); but, when one considers that this “brainpower” could otherwise be directed towards the demands imposed by the task domain and, more importantly, that the drain is disproportionately experienced by only certain group members, this imbalance and the subsequent chain of events it sets off are substantially more problematic.

Based on their review of the literature, Schmader et al. (2008) developed an integrated model of the process by which stereotype threat is believed to influence performance achievement (Figure 2). The authors devote a great deal of effort to precisely explicating and defending the empirical rationale/support for their framework, much of which is well beyond the scope of the present discussion. Consequently, only the major relations and their relevance to the present study will be highlighted. To begin, Schmader et al.
(2008) suggest that three interconnected responses are aroused in reaction to the cognitive imbalance and subsequent ruminations/appraisals generated by stereotype threat: a heightened state of physiological stress/anxiety (e.g., Murphy, Steele, & Gross, 2007; Blascovich, Spencer, Quinn, & Steele, 2001; Croizet, Després, Gauzins, Huguet, Leyens, & Méot, 2004); hyper-vigilant monitoring of both perceived personal performance and feedback/cues that indicate one is threatened or is being influenced by the negative stereotype (e.g., Beilock, Rydell, & McConnell, 2007; Ben-Zeev, Fein, & Inzlicht, 2005; Forbes, Schmader, & Allen, 2008; Johns, Inzlicht, & Schmader, 2007; Seibt & Förster, 2004); and thought suppression processes directed towards regulating negative cognitions and affect (e.g., Johns et al., 2007; von Hippel et al., 2005; Wraga, Helt, Jacobs, & Sullivan, 2007). In turn, each of these factors is believed to consume resources from an individual’s working memory (cf., Beilock, Jellison, Rydell, McConnell, & Carr, 2006; Matheson & Cole, 2004; Muraven & Baumeister, 2000; Smith & Henry, 1996; Wenzlaff & Wegner, 2000). Simply described, working memory can be conceptualized as the limited-capacity “cognitive workspace” that individuals employ to coordinate the storage and controlled attention of immediately relevant thoughts, operations, and information (Baddeley, 1986, 1997; Baddeley & Hitch, 1974; Engle, 2002; Kane, Conway, Hambrick, & Engle, 2007).

Figure 2. Integrated process model of stereotype threat effects on performance (Schmader, Johns, & Forbes, 2008)

Consequently, the siphoning and disruption of working memory resources by the experience of stereotype threat reduces the efficiency and capability with which one can effectively manage and complete cognitively demanding tasks (Beilock et al., 2006; Beilock et al., 2007; Schmader, 2010; Schmader, Forbes, Zhang, & Berry Mendes, 2009; Schmader & Johns, 2003), thereby inhibiting individuals from reaching their performance potential.

As is apparent from Figure 2, working memory is posited to be the most proximal mechanism through which experiences of stereotype threat impede functioning on cognitive-based tasks and thus warrants closer examination. In their seminal manuscript on the topic of working memory, Baddeley and Hitch (1974) summarized preliminary evidence for the existence of a cognitive processing system that operated similarly to, yet distinctly from, previous conceptualizations of short-term memory. Early treatments of these authors’ working memory hypothesis proposed a tripartite system composed of a superordinate central executive responsible for controlled processing and attention among two “slave systems,” the phonological loop and the visuospatial sketchpad, which coordinated the temporary storage (~1-2 seconds) and manipulation/rehearsal of auditory/speech-based and visuospatial information, respectively (Baddeley, 1986, 1992). Baddeley (2000) later amended this framework by incorporating a fourth system termed the episodic buffer that assumes some of the controlled processing functions from the central executive. Specifically, the episodic buffer is characterized as an interface for integrating information from the phonological loop and visuospatial sketchpad with information from long-term memory to form brief “episodic memories” over short periods of time, leaving the central executive primarily responsible for directing attentional efforts (Baddeley, 2001).
Although Baddeley’s model (1986, 2000) is widely cited as the preeminent conceptualization of working memory, research spearheaded by Engle, Kane and colleagues (e.g., Engle, 2002; Engle, Tuholski, Laughlin, & Conway, 1999; Kane & Engle, 2003; Kane et al., 2007; Kane, Hambrick, Tuholski, Wilhelm, Payne, & Engle, 2004) has adopted a slightly altered perspective on the functioning of this cognitive system that integrates well with stereotype threat theory. The primary emphasis of these researchers is to distinguish more discretely the “storage” functions of working memory from its role as a domain-general executive attention process. As Engle (2002) relates:

The term capacity, as used in discussions of short-term memory (STM), often conjures up images of a limited number of items or chunks that can be stored (e.g., 7 ± 2). However, my sense is that [working memory] WM capacity is not about individual differences in how many items can be stored per se but about differences in the ability to control attention to maintain information in an active, quickly retrievable state. Thus, WM capacity is just as important in retention of a single representation, such as the representation of a goal or of the status of a changing variable, as it is in determining how many representations can be maintained. WM capacity is not directly about memory—it is about using attention to maintain or suppress information. WM capacity is about memory only indirectly. Greater WM capacity does mean that more items can be maintained as active, but this is a result of greater ability to control attention, not a larger memory store. Thus, greater WM capacity also means greater ability to use attention to avoid distraction. (p.
20)

Additionally, this interpretation further implies that short-term memory is essentially a subset of working memory, with short-term memory performing functions akin to the phonological loop and visuospatial sketchpad and working memory coordinating controlled attention (Engle et al., 1999). Although still largely consistent with Baddeley’s framework (1986, 2000), empirical investigations based on this perspective seem to support the notion that short-term memory processes are domain-specific (that is, the chunking, rehearsing, coding, storing, etc. of information is specific to a particular domain) whereas the executive control of working memory is agnostic and functions more consistently across multiple domains in maintaining particular memory representations in active and easily accessible states (Engle & Kane, 2004; Kane & Engle, 2003; Kane et al., 2004).

Given the above depiction, it is possible to more precisely explicate the manner by which stereotype threat exerts its influence on meaningful outcomes of interest by considering its direct impact on working memory.² Working memory capacity has been implicated in a large variety of intellective activities, including reading comprehension (Daneman & Merikle, 1996), problem-solving based on listening comprehension (Adams & Hitch, 1997; Carpenter, Just, & Shell, 1990), advanced reasoning (Kyllonen & Christal, 1990), strategy adaptation (Schunn & Reder, 2001), multitasking (König, Bühner, & Mürling, 2005), Stroop color naming (Kane & Engle, 2003), and visuospatial reasoning (Kane et al., 2004), among others (see Feldman Barrett, Tugade, & Engle, 2004, for further review).
Additionally, many researchers consider working memory the primary processing component underlying general fluid intelligence (Cattell, 1943), or the ability to reason logically, solve novel problems, and adapt to new circumstances (Conway, Cowan, Bunting, Therriault, & Minkoff, 2002; Engle et al., 1999; Jaeggi, Buschkuehl, Jonides, & Perrig, 2008; Kyllonen, 1996; Kyllonen & Christal, 1990).

Common to all such heavily working memory-centric and fluid intelligence tasks is that they necessitate focused direction, effortful processing, and dynamic self-regulation of one’s controlled attention in order to be completed successfully, characteristics which leave them particularly susceptible to the added demands of stereotype threat. As described previously, the cognitive imbalance imparted through the presence of a negative domain-relevant stereotype elicits increased stress from threatened individuals and spurs them to adopt (consciously or not) added monitoring processes to gauge the extent to which their performance or actions are confirming the stereotype (e.g., Grand et al., 2011; Ployhart et al., 2003; Rydell, Rydell, & Boucher, 2010). These responses are believed to engender further internal appraisal processes through which the individual attempts to reconcile the contradictory state and that tend to elicit increased focus/attention towards negative thoughts and emotions (e.g., Forbes et al., 2007), leading to subsequent cognitive effort to subdue those responses. All the while, these added, task-irrelevant attentional demands are indiscriminately filtered through the working memory system. Working memory capacity facilitates functioning on intellective activities by enabling individuals both to maintain their attention on task-relevant facets of a situation and to suppress interference from non-relevant components (Rosen & Engle, 1998).
However, working memory has limits on its capacity (e.g., Engle, 2002; Kane et al., 2004), and such controlling and suppressing functions are cognitively demanding. Thus, to the extent that stereotype threat engenders more task-irrelevant foci to suppress, less of one’s limited working memory capacity can be allocated to rehearsing, integrating, and manipulating information relevant to achieving goal-directed objectives. In short, the primary harm introduced by stereotype threat is the contribution of irrelevant affective and cognitive stimuli that unnecessarily hijack an individual’s working memory resources, thereby leaving fewer cognitive resources to devote to task demands (Schmader et al., 2003; Schmader et al., 2008).

This proposition is also generally supported by the observed pattern of results concerning stereotype threat effects. For example, there is evidence to support the claim that domain activities must be both difficult and relatively complex for stereotype threat effects to manifest (e.g., Nguyen & Ryan, 2008; Quinn & Spencer, 2001; Steele & Aronson, 1995). Undertaking tasks that are simple and/or well-rehearsed seldom leads to underachievement by threatened individuals (e.g., Beilock et al., 2007; O’Brien & Crandall, 2003). From an attentional resource allocation perspective, participating in difficult and complex tasks presumably pushes the bounds of one’s working memory to its limits. When conditions conducive to stereotype threat are then made salient in such situations, the added cognitive demands simply overwhelm the capacity of stereotype threatened individuals and prevent them from focusing on required task demands.

Note that this rationale does not imply that the underachievement of stereotype threatened individuals necessarily results from reduced effort or motivation to succeed on their part.
In fact, and as would be predicted based on the working memory models detailed above, targeted individuals have been shown to exert equal amounts of (or, in many cases, more) effort and persist in those efforts longer than non-threatened individuals (Forbes et al., 2007; Jamieson & Harkins, 2007; Kray, Thompson, & Galinsky, 2001; O’Brien & Crandall, 2003; Rydell, Shiffrin, et al., 2010). However, this effort is inefficiently oriented towards managing task-irrelevant demands that are not present for non-threatened individuals, leading to quicker mental fatigue and exhausting more time and mental energy on things that do not contribute to task accomplishment, essentially forcing these individuals to do more while achieving less (cf., Grier et al., 2003; Steele & Aronson, 1995).

Criticisms of stereotype threat. It is pertinent to briefly address two broad concerns commonly leveled against stereotype threat theory that also bear on the proposed research. First, some researchers have advocated that stereotype threat is more appropriately considered through the lens of motivational states and goal-orientation primes rather than the working memory cognitive models described herein (e.g., Marx & Stapel, 2006a, 2006b; Wheeler & Petty, 2001). For example, Grimm, Markman, Maddox, and Baldwin (2009) suggest that stereotype threat can be interpreted as a mismatch in regulatory focus driven by differences in the reward structure of the task environment and the prevention/failure-avoidance states that are primed by the introduction of negative stereotypes (i.e., tasks in which “doing something better” implies greater success do not reward avoiding failure; Cadinu et al., 2005; Seibt & Förster, 2004). In their study, Grimm et al.
(2009) present convincing evidence that one can eliminate the performance discrepancies caused by stereotype threat for women on mathematics tests by simply changing the performance criteria of the task and instructing participants that their goal on a performance task is to avoid losing a certain number of points (rather than gaining a certain number of points), a reward structure presumably better aligned with the prevention/avoidance state held by negatively stereotyped women, which should thus facilitate higher achievement motivation.

However, the conceptual rationale underlying such effects does not necessarily contradict, supersede, or invalidate the propositions of the cognitive imbalance or working memory models. Instead, they largely complement one another. Grimm et al.’s (2009) findings are consistent with the notion that in the face of intrapersonal conflict about their own and their group’s abilities, stereotype threatened individuals engage in more effortful situational monitoring that taxes working memory resources (e.g., Beilock & Carr, 2005; Beilock et al., 2006; Beilock et al., 2007; Schmader et al., 2008) and that alterations to the structure of the environment can make this effortful process more or less beneficial to performance. For example, Forbes et al. (2007) and Jamieson and Harkins (2007) report that threatened individuals are typically more motivated to correct errors in performance contexts, a strategy that facilitates equal or better achievement compared to non-threatened individuals in situations where it is functional. However, when the performance context is less conducive to these strategies as a result of greater working memory demands (i.e., more difficult task, stricter time limit, less space for error correction), threatened individuals operating under such heightened motivational states tend to underperform (cf., Harkins, 2006; Nguyen & Ryan, 2008).
Such findings are supportive of the notion that threatened versus non-threatened individuals demonstrate differences in the processes/manner by which their performance outcomes are achieved (e.g., Beilock & Carr, 2005; Rydell, Shiffrin, et al., 2010). In short, the motivational approaches to stereotype threat offer valuable insights into the mechanisms of the phenomenon and are likely to supplement its theoretical underpinnings, but they do not invalidate cognitive/working memory accounts.

The final battery of criticisms typically voiced against stereotype threat theory bears less on the theory itself and more heavily on its application. For more than a decade, organizational researchers have noted a variety of difficulties with detecting the effects of stereotype threat in “real-world” assessment and performance situations, implying that its feasibility as a phenomenon of import may thus be limited (Cullen et al., 2004; Cullen et al., 2006; Good, Aronson, & Inzlicht, 2003; Sackett, Hardison, & Cullen, 2004; Sackett & Ryan, 2012; Sackett, Schmitt, Ellingson & Kabin, 2001; Schmidt, 2002; Stricker & Ward, 2004). In large part, these concerns can be summarized as follows:

1. The manner by which stereotype threat is introduced into a situation is too unrealistic and/or unethical and therefore the likelihood it would be elicited in an evaluative performance situation is virtually nonexistent.

2. Analyses of large datasets in which between-group performance differences attributable to stereotype threat might be expected have not supported the theory’s predictions.

3. For those who acknowledge that stereotype threat may be a legitimate concern in high stakes performance situations, the stereotype threat removal strategies offered by researchers as solutions to the issue (e.g., minimizing the diagnosticity/evaluative purposes of a test, priming alternative identities, etc.) are too impractical to be implemented.

While deconstructing and addressing each of these concerns is well beyond the scope of this paper (cf., Steele & Davies, 2003; Stricker & Ward, 2008), they do serve to highlight certain key points relevant to the present study. First, as noted previously, stereotype threat is the characterization of an emergent cognitive imbalance predicated on the activation of specific relations among a small set of situational and intrapersonal characteristics. There can be little argument that these restrictions dictate strict boundary conditions for when one might expect to observe stereotype threat effects and when it would be appropriate to conclude that stereotype threat is operating. However, both the priming of negative stereotype cues (e.g., Nguyen, O’Neal, & Ryan, 2003) and attempts to minimize/reverse stereotype threat effects (e.g., Grimm et al., 2009; Stricker & Ward, 2004) may occur through many subtly different paths, not all of which have been fully tested or documented. Furthermore, even if the base rate of stereotype threat occurrences is minimal in evaluative work or academic performance contexts, there are still other applications that have not yet been adequately examined, such as training or learning contexts, where stereotype threat effects may more readily manifest and could potentially influence important outcomes.

Second, all performance is not created equal. Although it is common to presume that similar between-group performance scores indicate that group members operate in functionally equivalent manners, the process by which performance outcomes are generated is a crucial consideration. For example, stereotype threat researchers have not typically drawn distinctions between maximal (i.e., assessments aimed at evaluating individuals’ highest predicted achievement) and typical (i.e., assessments aimed at evaluating individuals’ day-to-day/sustainable achievement) performance contexts when evaluating threat effects.
However, to the extent that how one performs a task is as important as what one is capable of performing, stereotype threat may influence behavior/achievement in ways not easily observable in contexts traditionally examined in stereotype threat studies (e.g., testing/assessment, selection).

In sum, the characterization of stereotype threat as a cognitive, behavioral, and affective experience stemming from an emergent and dynamic process sensitive to the situational characteristics in which an individual operates implies that its effects may extend beyond the organizational functions traditionally examined. The patterns of interference that characterize the expression of stereotype threat are generally undesirable features which could seemingly arise and influence outcomes in areas other than testing and performance assessment. It is this very possibility that serves as the impetus for the present research, and to which focus is now directed.

Stereotype Threat at Learning: Rationale, Applications, and Implications

As implied by Figure 1 and Schmader et al. (2008), the experience of stereotype threat can theoretically emerge in any instance where a negative domain-relevant stereotype exists capable of producing dissonance amongst a person’s concept of self, group, and ability that he/she is motivated to overcome. Although perhaps self-evident, the stereotypes which could trigger this imbalance (e.g., “Women struggle with mathematics,” “Whites are not as naturally athletic as Blacks,” “Lower income individuals are not very intelligent,” etc.) do not simply appear during performance episodes and then vanish; such stereotypes are persistent features of a domain and can influence virtually any related functional pursuit within its purview (e.g., Stangor, 2000; Stangor & Lange, 1994). By this rationale then, stereotype threat may manifest in numerous occasions and across myriad circumstances.
Among the many potential applications of the theory though, the implication of working memory efficiency as the primary gateway through which stereotype threat exerts influence suggests that learning efforts may be particularly sensitive to threat effects.

Conceptual background for stereotype threat at learning. Learning is often generically defined as a relatively permanent change in knowledge, skill, or behavior brought about through experience (e.g., Weiss, 1990; Wexley & Latham, 1991). This conceptualization, however, somewhat oversimplifies the nuanced and multifaceted nature of what learning “means”; for example, at different points in the history of psychology, learning has been approached from perspectives of classical and operant conditioning, observational and social learning/modeling, rote memorization of letter/digit strings, insight learning, and latent learning of complex procedures (Kosslyn & Rossberg, 2004). In an effort to better orient the diversity of such learning investigations, a number of researchers suggest that it is desirable to focus on the outcomes of the learning process that are of interest to the research question/domain as a means of better aligning empirical efforts and capturing the full spectrum of learning-relevant behaviors (cf., Gagne, 1984; Glaser, 1990; Messick, 1984). In the spirit of this recommendation, Kraiger et al. (1993) offer a relatively simple yet comprehensive categorization system useful for describing the different forms and specifications of learning outcomes one could assess during learning or training experiences (Figure 3). These authors’ classification scheme describes three broad classes of learning outcomes: cognitive (procurement and synthesis of information/knowledge), skill-based (development of procedural/behavioral routines or understanding), and affective (attitudinal, motivational, or dispositional changes).
Each of these three learning outcomes is also further divided into associated categories that characterize the processes, constructs, or targets associated with that outcome.

Figure 3. Classification system for learning outcomes (adapted from Kraiger, Ford, & Salas, 1993).
Cognitive (acquisition, organization, and application of knowledge). Categories: declarative knowledge; knowledge organization; cognitive strategies.
Skill-based (development of technical, procedural, or motor skills). Categories: compilation (proceduralization & composition); automaticity.
Affective (changes in attitude, motivations, goals, and/or values). Categories: attitudinal; motivational (disposition, self-efficacy, goal setting).

Kraiger et al.’s (1993) depiction of learning outcomes serves as a useful guiding framework for examining the potential impact of stereotype threat at learning. As noted previously, there is strong evidence to suggest that stereotype threat interferes with an individual’s cognitive functioning primarily by forcing him/her to allocate limited attentional resources from working memory towards suppressing task-irrelevant responses that stem from the presence of a negative domain stereotype (e.g., Beilock et al., 2006; Beilock et al., 2007; Schmader, 2010; Schmader et al., 2009; Schmader & Johns, 2003). To the extent that working memory capacity/processes are related to relevant cognitive, skill-based, and/or affective learning outcomes, there would be sufficient reason to believe that the generalizable experience of stereotype threat could negatively influence learning efforts; indeed, there is evidence supporting the relation between working memory and components of all three of these learning outcomes. For example, individuals with lower working memory capacity appear to be less proficient at maintaining and adhering to appropriate task goals (i.e., affective outcomes) in performance situations with greater task interference than individuals with higher working memory (Kane, Bleckley, Conway, & Engle, 2001; Kane & Engle, 2003; Rosen & Engle, 1997; see also Daily, Lovett, & Reder, 2001). Similarly, there is a large body of research demonstrating that working memory contributes heavily to both synthesizing declarative facts, concepts, and information into longer-term memory stores (i.e., cognitive outcomes) as well as integrating those pieces into procedural relations (i.e., skill-based outcomes) (e.g., Baddeley, 2001; Budd, Whitney, & Turley, 1995; Cantor & Engle, 1993; Just & Carpenter, 1992; Rosen & Engle, 1997; see also Feldman Barrett et al., 2004, for a review). In the domain of reading comprehension, for instance, Whitney, Ritchie, and Clark (1991) found that persons with less available working memory tend to draw more surface-level (versus deeper) interpretations of written text and do so much earlier while reading than individuals with higher working memory. Presumably this occurs because the former’s reduced working memory capacity does not enable them to maintain enough information in an active, readily accessible state for a long enough time to draw more comprehensive inferences (e.g., Engle, 2002). This form of reading comprehension, however, has long been demonstrated to be an ineffective and inefficient means of interpreting written information.
Research indicates that readers who employ more thematic or structural approaches to reading (i.e., attempt to deduce major themes and inferences from text) are better able to recall and make use of more information from narrative passages than readers who process that same text serially (i.e., sentence-by-sentence) (e.g., Loman & Mayer, 1983; Marshall & Glock, 1979; Meyer, Brandt, & Bluth, 1980; Meyer & Rice, 1982; Reder & Anderson, 1980). Thus, a primary implication of these research streams is that individuals with less available working memory, whether as a result of individual differences in capacity or “artificial” reductions through situational primes (e.g., Beilock & Carr, 2005), may be at a much larger disadvantage when it comes to achieving cognitive, skill-based, and/or affective learning outcomes in complex, self-directed learning environments (e.g., students taking online classes, workers learning complicated procedures from an instructional manual, decision-makers attempting to synthesize technical reports, etc.).

Although there is reasonable evidence to postulate that stereotype threat is detrimental to all three of the learning outcomes shown in Figure 3, the present study focuses on the attainment of cognitive learning outcomes as the primary domain of interest for three reasons. First, the centrality of working memory processes to the acquisition of declarative and procedural knowledge is a hallmark of nearly all models of higher-order cognition (e.g., Anderson et al., 2004; Baddeley, 2000; Just & Carpenter, 1992; Kieras & Meyer, 1997; Newell, 1990); thus, deficiencies in related cognitive learning outcomes represent the most likely victim of any decrement in working memory generated by stereotype threat. Second, despite its broader range of applicability, stereotype threat theory has gained its greatest traction in the area of cognitive ability and intelligence testing (cf., Nguyen & Ryan, 2008).
While there is some debate regarding the magnitude and empirical relations among measures of working memory and cognitive ability (see Ackerman, Beier, & Boyle, 2005, and responses from Kane, Hambrick, & Conway, 2005, and Oberauer, Schulze, Wilhelm, & Süß, 2005), there is general agreement that the concepts are highly interrelated (Beier & Ackerman, 2005; Carroll, 1993; Engle, 2002; Jensen, 1998; Kyllonen & Christal, 1990; Turner & Engle, 1989). The generalization of stereotype threat from cognitive performance outcomes to cognitive learning outcomes, therefore, represents a logical extension of the theory, and one which also carries implications for disentangling how stereotype threat’s effects at learning may influence stereotype threat at performance. Lastly, fluid intelligence/working memory is often considered among the most important factors in the success of professional/educational learning experiences, especially in situations that are complex or demanding (Deary, Strand, Smith, & Fernandes, 2007; Gottfredson, 1997; Jaeggi et al., 2008; Neisser et al., 1996; Rohde & Thompson, 2007; te Nijenhuis, van Vianen, & van der Flier, 2007). As such, person or situation factors which reduce working memory and influence fluid intelligence are likely to have a substantial impact on the acquisition and expression of domain-relevant information in the learning environment.

Within the subset of cognitive learning outcomes, Kraiger et al. (1993) describe three categories of learning constructs which could serve as potential targets for examining effects of stereotype threat at learning. The first, declarative knowledge, reflects the encoding of data “chunks” comprised of context-free facts, statements, or assertions or, as it is colloquially referenced, information about “what” (e.g., “3 + 4 = 7,” “Lincoln’s Gettysburg Address was delivered in 1863”; Anderson, 1996; Miller, 1956; Simon, 1974).
Learning outcomes of this type are widely regarded as foundational to higher-order cognitive skill development, and it is a generally accepted dictum that the acquisition of basic declarative knowledge is a necessary condition for the development of more sophisticated procedural knowledge (e.g., Ackerman, 1986, 1987; Anderson, 1982, 1993a, 1996). The second group of cognitive learning outcomes refers to knowledge organization, or the manner by which individuals represent relations/associations among concepts, facts, functions, and other knowledge objects relevant to a given task domain (e.g., Glaser, 1990; Jonassen, Beissner, & Yacci, 1993; Rowe, Cook, Hall, & Halgren, 1996; Schoenfeld & Herrmann, 1982). Synonymous with mental models, cognitive maps, schema, or conceptual frameworks (Dorsey, Campbell, Foster, & Miles, 1999; Schuelke et al., 2009), these knowledge structures serve as contextual organizers for the interpretation and acquisition of new knowledge as well as influence one’s ability to make use of existing knowledge to accomplish task requirements (Ausubel, 1963; Day, Winfred, & Gettman, 2001; Kozlowski, Gully, et al., 2001; Messick, 1984; Medin et al., 2006). For this reason, some researchers suggest that the organization of knowledge structures may be of equal or greater importance than the amount or type of knowledge one possesses (Johnson-Laird, 1983; Kraiger et al., 1993; Rouse & Morris, 1986).

The final cognitive learning outcome concerns the development of cognitive strategies, indicative of the internalized procedures and mental activities that individuals use to facilitate the synthesis of knowledge and its application to a given task space (Kraiger et al., 1993; Prawat, 1989).
The achievement of effective cognitive strategies signals that individuals have developed a deeper understanding of the relationship between their capabilities and the demands of the task environment, as well as metacognitive awareness of their thought processes and causal attributions relevant to task completion (Bereiter & Scardamalia, 1985; Kanfer & Ackerman, 1989; Pressley, Snyder, Levin, Murray, & Ghatala, 1987). In their framework, Kraiger et al. (1993) posit a sequential progression through these three cognitive learning outcomes as learners advance from early to later stages of knowledge acquisition. Individuals generally work to acquire basic declarative knowledge first, organize that knowledge into meaningful structures/mental models that provide context for drawing interpretations among relevant information, and then use that understanding to develop procedural approaches for accomplishing specific task goals.

A final important consideration for examining stereotype threat effects at learning is explicating the manner by which the learning environment is organized, as such structural aspects can hold significant influence over the attainment of desired learning outcomes (Bell & Kozlowski, 2008; Iran-Nejad, 1990; Schwartz & Bransford, 1998). The present study focuses on the effects of stereotype threat during episodes of exploratory learning. Exploratory learning (sometimes referred to as discovery learning) is a form of active learning in which individuals are encouraged to experiment with task content in order to infer the principles, rules, and mechanisms of a given operational domain (Frese et al., 1988; Kamouri, Kamouri, & Smith, 1986; McDaniel & Schlager, 1990).
As opposed to more traditional passive learning approaches (e.g., lectures, proceduralized instruction, etc.), exploratory learning provides near complete control over the instructional environment to the learner, who bears the brunt of the responsibility for making learning decisions (e.g., choosing what content to learn and when to learn it, monitoring learning progress and adjusting strategies as necessary, etc.). This requirement promotes an inductive learning frame that necessitates engaging in more effortful metacognitive activity on the part of the learner, a critical component in the acquisition and transfer of adaptive expertise and complex skills/knowledge (Bell & Kozlowski, 2002, 2008; Ford & Kraiger, 1995; Ford, Smith, Weissbein, Gully, & Salas, 1998; Frese et al., 1988; Ivancic & Hesketh, 2000). Researchers have noted that learners can easily be overwhelmed by purely exploratory approaches and may fail to ever come into contact with the instructional material (Debowski, Wood, & Bandura, 2001; Mayer, 2004); however, the provision of even minimal guidance can often be enough to stimulate the sense-making and metacognitive efforts essential to knowledge acquisition without undermining the self-directed efforts of learners (Bell & Kozlowski, 2008; Kozlowski, Toney, et al., 2001). As a result, such guided “constructivist” approaches to learning and training, in which learners are intimately involved in the comprehension and organization of domain/task concepts, principles, strategies, etc., have become increasingly popular instructional methods in both professional (e.g., de Freitas & Neumann, 2009; Grand & Kozlowski, in press; Rieman, 1996) and educational (Marshall, 1996; Phillips, 1998; Steffe & Gale, 1995) domains. Given the demands placed on learners in such approaches though, exploratory learning environments may be particularly susceptible to stereotype threat effects.
Many aspects of working memory have been implicated in the metacognitive functions which are central to successfully navigating exploratory learning paradigms (i.e., selective attention, error detection, inhibitory control; Fernandez-Duque, Baird, & Posner, 2000; Shimamura, 2000). To the extent that stereotype threat disrupts available working memory capacity by stimulating task-irrelevant thoughts and emotions (Schmader et al., 2008), exploratory learning paradigms may represent one of the more probable instances in which threat-based learning decrements are likely to be experienced. Based on its relative popularity, desirable learning outcomes, and cognitively demanding nature, this particular learning environment thus marks a reasonable point of departure for examining stereotype threat at learning with both empirical and practical implications.

Examinations of stereotype threat at learning. Rydell and colleagues (Rydell, Rydell, & Boucher, 2010; Rydell, Shiffrin, et al., 2010) have recently published a series of findings which mark the first documented effects of stereotype threat at learning. Table 1 presents a summary of these experiments outlining the basic rationale, methodological aspects, and results for each study.³ Additionally, an attempt was made to map the learning outcomes assessed in each of the studies back to the classification scheme of Kraiger et al. (1993) to illustrate the scope of their research findings. Taken together, these investigations provide an ambitious point of entry into the research domain. The accumulated evidence they present illustrates an important initial application of stereotype threat theory to outcomes specific to the learning process and furthermore offers a useful means for assessing a number of methodological and conceptual considerations for continued work in the area. Additionally, they reveal a number of theoretical and methodological concerns within this research stream in need of further refinement.
These issues bear direct relevance to the development of the current study and help elucidate its proposed rationale and added contribution; thus, these points of interest are highlighted below.

Table 1
Summary of Rydell and Colleagues’ Multi-study Experiments Examining Stereotype Threat Effects at Learning

Rydell, Rydell, & Boucher (2010)

Description
Study 1: Examined whether ST influences women’s ability to learn and perform novel mathematical rules.
Study 2: Examined whether ST influences women’s ability to learn rules from a novel math task (MA). Previous research has shown that ST has no influence on women when solving easy MA items. However, ST may influence the learning of MA rules, which should inhibit their ability to solve even easy MA problems.
Study 3: Examined whether ST influences women’s ability to learn abstract logic task that utilized math principles. Also examined whether ST at initial learning inhibits transfer of learning to novel domain and implicit learning.

Procedure/task
Study 1: Participants received tutorial on how to solve math problems based on 8 novel mathematical rules presented one at a time. After first 4 rules were presented, additional instructions were presented that introduced ST for half of participants. The remaining 4 rules were then presented in the same manner as before.
Study 2: 2 (ST: control, ST) x 2 (introduction of ST: before learning, after learning) factorial design. Self-paced procedural tutorials were presented to all participants, with presence & placement of instructional manipulation depending on experimental condition.
Study 3: Participants received ST or control instructions prior to initial learning. Focal & transfer learning tasks presented list of logic rules to learn (e.g., circle plus diamond equals flag, etc.); transfer task used new symbols, but followed same logic rules. An additional implicit learning task asked participants to indicate if a target symbol had been presented in the focal task after being primed with a new or old stimulus; individuals who learned focal task stimuli should take longer to respond to old-old than new-old pairings.

Learning measure
Study 1: Number of rules recalled correctly.
Study 2: Subjective ratings of participants’ written description of the steps for solving MA problems.
Study 3: Number of questions correctly answered on test applying logic rules (focal and transfer task); response time to identify target symbol (implicit learning task).

Performance measure
Study 1: Number of math problems requiring the novel learned rules answered correctly.
Study 2: Number of easy, moderate, and difficult MA items answered correctly.
Study 3: N/A

Main results
Study 1: No difference in pre-instruction rule learning for ST vs. control women, but ST women recalled fewer post-instruction rules. Performance worse on problems with post-, but not pre-, instruction rules for ST women.
Study 2: ST-before learning women provided less accurate descriptions and answered fewer easy problems than control; no difference on description ratings or performance for ST-after learning vs. control women.
Study 3: ST women scored lower on focal and transfer task than control, & the effect of ST on transfer test was greater than on focal test. Response times longer for control vs. ST women on implicit learning task.

Learning outcome
Study 1: Cognitive (Declarative knowledge)
Study 2: Cognitive (Declarative knowledge)
Study 3: Cognitive (Declarative knowledge)

Rydell, Shiffrin, et al. (2010)

Description
Study 4: Examined whether ST influences women’s ability to learn more efficient, automatized, and less effortful processing strategies for completing a visual search task.†
Study 5: Examined effects of ST introduced later in learning (control+ST) and effects of ST removed later (ST+release) on women’s ability to learn visual search task.
Study 6: Examined alternative indicator of ST’s ability to influence women’s ability to learn automatic visual search strategy by examining whether performance on an unrelated visual search task would be interfered with by presence of a familiar target stimulus.

Procedure/task
Study 4: Participants randomly assigned to receive ST vs. control instructions at beginning and at start of each trial block (6 blocks of 80 trials). Visual search task involved identifying whether one of five target Chinese characters was present or absent amongst either two or four displayed Chinese characters.
Study 5: Participants randomly assigned to control+ST, ST+release, and control conditions and completed 8 blocks of 80 comparison trials. After block 6, ST was introduced to the control+ST group and a self-affirmation manipulation meant to reduce ST was introduced to the ST+release group.
Study 6: Same learning task and design as Study 4. Following learning trials, participants presented with new visual search task where goal was to identify which of two same-color patches was more saturated. Superimposed on each patch was either a new Chinese character or one of the target characters from learning.

Learning measure
Studies 4 and 5: Response times (T) collected and deconstructed into separate measures of learning and performance. Measures were derived from T based on algorithmic model of serial self-terminating search in visual information processing (i.e., target items compared to display items one at a time successively until target is found or all display items are used). Learning = comparison time per character (C). Reduction in C over time implies individuals are learning/automating more efficient processing strategies for visual search.
Study 6: Response time required to identify saturated color swatch; longer time when a familiar vs. new Chinese character presented implies individuals had learned target characters, which interfered with color saturation task.

Performance measure
Studies 4 and 5: Performance = base time of responses other than those used to carry out visual search comparisons (B). Reduction in B over time implies improved performance for components of responding unrelated to learning (i.e., perception time, motor-response time, etc.).
Study 6: N/A

Main results
Study 4: C decreased (indicating learning) for control women while remaining relatively constant (no learning) for ST women across blocks. B decreased for ST women (indicating performance improvement on processing components not related to visual search).
Study 5: C decreased for women in control+ST until block 6, after which it increased; C remained constant for ST+release group for all blocks. B increased for women in control+ST group until block 6, after which it decreased; B decreased for women in ST+release group for all blocks.
Study 6: Control women took longer to select correct color patch than ST women when target Chinese character was presented; suggests automatic processing of target characters was learned by control women and was interfering with new visual search task but not ST women.

Learning outcome
Study 4: Skill-based (Compilation & Automaticity)
Study 5: Skill-based (Compilation & Automaticity)
Study 6: Skill-based (Compilation & Automaticity)

Note. ST = stereotype threat; MA = modular arithmetic (see Beilock et al., 2007).
† ST was elicited by instructing participants that the visual search task was diagnostic of why women tend to underperform on math tests.

Although their efforts mark only an initial foray into the research domain, the manner by which learning was conceptualized, implemented, and operationalized in both of the Rydell publications is somewhat questionable. With respect to the former two concerns, relatively little theoretical or methodological attention was directed towards the design, delivery, and/or development of the participant’s learning environment, a key determinant in the effectiveness of individuals’ learning experiences (e.g., Kozlowski, Toney, et al., 2001).
For example, the instructional delivery mechanism through which participants were expected to learn the novel mathematical principles/rules in Studies 1 and 2 consisted of short, text-based tutorials in which participants were passively exposed to the material. While there is nothing inherently wrong with empirically examining stereotype threat effects in such a learning system, no mention was made of the implications this environment held for the expected success of learning (regardless of threat) in this context, nor of whether the sparse training delivery system may have been more or less susceptible to threat effects than other more feature-rich, but potentially more cognitively demanding, systems (e.g., provision of feedback, active participation, advanced organizers; van Merriënboer & Sweller, 2005). Furthermore, the experience of learning is typically viewed as a dynamic, iterative process that develops over time, exposure, and rehearsal (e.g., Anderson et al., 2004; Goldstein & Ford, 2002). While Studies 4-6 integrated a longitudinal component in their design, the presentation of novel material in a single brief episode in Studies 1-3 may not have been a rich enough context for individuals to learn the material. For example, the average number of declarative math rules correctly recalled in Study 1 was only .83 and .74 out of 8 for women in the control and stereotype threat conditions, respectively, indicating that even those participants purportedly not influenced by stereotype threat had only learned approximately 10% of the presented material. At best then, these results suggest that there may be very minor (though statistically significant) differences in immediate “learning” as a result of stereotype threat, but this single exposure learning environment limits the extent to which inferences can be drawn about threat-based interferences on the overall process of cognitive knowledge acquisition.
Admittedly, the influence of threat relative to the design and characteristics of the learning environment was not a primary or even secondary focus of the Rydell studies. Thus, acknowledgement and consideration of the manner by which a learner’s engagement with the desired content material could impact the attainment of desired learning outcomes is a needed next step within this research stream. Rydell and colleagues are quick to note that the operationalization of learning and the manner by which one assesses that criterion with respect to stereotype threat is a crucial matter for research in the area. On multiple occasions, the authors note that a primary reason why stereotype threat research has not extended into the learning domain is because “learning is difficult to distinguish from performance” (Rydell, Rydell, & Boucher, 2010, p. 885). This sentiment suggests that a more clearly defined conceptual framework of the primary dependent variable would be of great benefit. Without such a theoretical foundation, issues related to construct validity become a significant concern in the exploration of stereotype threat effects during learning. For instance, the Rydell studies employ both judgments of participants’ procedural descriptions (Study 2) and response times (Studies 4-5) as indicators of learning— both of which are fairly subjective and can be easily confounded by external characteristics (e.g., verbal/writing skills of participants in Study 2, accuracy of computational model for describing visual search task in Studies 4-5, etc.) that do not accurately reflect underlying changes in the learning. Additionally, the assessments of learning in the focal and transfer learning tasks assessed in Studies 1 and 2 could have just as easily been considered measures of task performance rather than changes in learning, an issue explored further in the following section. 
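To make concrete how a response-time indicator of the kind summarized in Table 1 for Studies 4-5 depends on its underlying computational model, the following is a minimal sketch of a serial self-terminating search decomposition. It is not the model fitted by Rydell, Shiffrin, et al. (2010); it assumes a simplified form T = B + C × k, where k is the expected number of comparisons and depends only on the number of displayed items (on average (n + 1)/2 comparisons when the target is present, n when it is absent). The function names and the response times are hypothetical and serve illustration only.

```python
from statistics import mean


def expected_comparisons(display_size: int, target_present: bool) -> float:
    """Expected number of comparisons under serial self-terminating search.

    Assumption: search stops as soon as the target is found, so on average
    (n + 1) / 2 items are checked when the target is present and all n items
    are checked when it is absent.
    """
    if target_present:
        return (display_size + 1) / 2
    return float(display_size)


def fit_base_and_comparison_time(observations):
    """Least-squares fit of T = B + C * k for one block of trials.

    observations: list of (display_size, target_present, mean_rt_ms) tuples.
    Returns (B, C): base time and comparison time per character, in ms.
    """
    ks = [expected_comparisons(n, present) for n, present, _ in observations]
    ts = [rt for _, _, rt in observations]
    k_bar, t_bar = mean(ks), mean(ts)
    sxx = sum((k - k_bar) ** 2 for k in ks)
    sxy = sum((k - k_bar) * (t - t_bar) for k, t in zip(ks, ts))
    c = sxy / sxx          # slope: comparison time per character (C)
    b = t_bar - c * k_bar  # intercept: base time (B)
    return b, c


# Hypothetical condition means (ms) for display sizes of two and four items.
block = [(2, True, 475.0), (2, False, 500.0), (4, True, 525.0), (4, False, 600.0)]
base_time, comparison_time = fit_base_and_comparison_time(block)
```

Under this sketch, a decline in the estimated C across blocks would be read as learning and a decline in B as improvement unrelated to learning. Both estimates are only as trustworthy as the assumed comparison process, which is the confound noted above.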
Perhaps of greater import though, the learning indicators used in many of these studies were not often consistent with nor effective at demonstrating how the proposed mechanisms of stereotype threat impacted knowledge/skill acquisition efforts. For instance, Rydell, Rydell, and Boucher (2010) stated that stereotype threat affects mathematical learning “by reducing [a threatened individual’s] ability to encode mathematical information into memory, not by inhibiting the ability to retrieve mathematical information from memory.” Irrespective of research which suggests that working memory, and therefore impediments to working memory, do interfere with the process of retrieving/decoding information from memory (Rosen & Engle, 1997, 1998), the use of measures which require participants to explicitly recall learned material (e.g., Studies 1 and 2) most certainly taps into memory retrieval processes and thus dilutes observations of threat’s effects on encoding efficiency. Additionally, Rydell, Shiffrin, et al. (2010) hypothesize in Studies 4-6 that stereotype threat inhibited women’s perceptual learning because it impeded the development of more efficient visual search strategies (“popout” and character unitization, Shiffrin & Lightfoot, 1997; Shiffrin & Schneider, 1977). They later confess, though, that their choice to only model changes in response times as an indicator of learning (Studies 4-5) does not allow them to conclude whether learning of these visual search processes was impeded by the introduction of threat; instead, they can simply infer that “Whatever had been learned by women in the control group, it seems not to have been learned by women under [stereotype threat]” (p. 14046).

Research/practical implications of stereotype threat at learning.
In sum, the Rydell studies simultaneously provide an important proof of concept and demonstrate that the implementation and operationalization of learning experiences is an important consideration to the study of stereotype threat effects on knowledge/skill acquisition. Fortunately, there is well over 100 years’ worth of accumulated literature on learning behaviors, outcomes, and techniques available to inform investigations of threat at learning. The tripartite classification scheme developed by Kraiger et al. (1993) has been advanced here as one particularly useful theoretical perspective for organizing investigations of threat effects on the learning process due to its theoretically grounded and intuitive description of possible learning foci and constructs. As shown in Table 1, translating the learning outcomes assessed by Rydell and colleagues through this conceptual lens reveals that Studies 1-3 appeared to focus on threatened individuals’ ability to learn declarative knowledge (facts/statements/rules about mathematical or logical operators) while Studies 4-6 investigated the acquisition of skill-based learning outcomes and the ability for individuals to compile and automate new visual search strategies (popout and character unitization). Rydell and colleagues’ initial efforts therefore examine only a small portion of the possible construct space within which stereotype threat may influence learning (especially with respect to cognitive learning outcomes, the learning domain arguably most relevant to traditional applications of stereotype threat, e.g., Steele, 1997; Steele & Aronson, 1995). As such, integrating even basic frameworks of learning such as Kraiger et al. (1993) offers a useful contribution to future research efforts in this area and reveals there is still significant empirical ground to cover with respect to understanding stereotype threat at learning.
Rydell, Rydell, and Boucher’s (2010) investigation of stereotype threat effects on cognitive learning outcomes also exemplifies another unique and troubling methodological issue for continued research in this area. Specifically, although declarative knowledge acquisition serves as the foundation for the development of more advanced learning (Anderson, 1982, 1993a, 1996), the most common assessments of declarative knowledge acquisition (multiple-choice, true-false, or free recall tests, cf., Kirkpatrick, 1976, 1987; Kraiger et al., 1993) are identical to those used to demonstrate stereotype threat effects on cognitive performance outcomes (e.g., Grand et al., 2011; Nguyen et al., 2003; Ployhart et al., 2003; Spencer et al., 1999; Steele & Aronson, 1995, etc.). This is problematic given that (1) stereotype threat hijacks available cognitive resources from working memory (Schmader et al., 2008; Schmader, 2010), (2) working memory capacity influences both the encoding (Feldman Barrett et al., 2004) and retrieval (Rosen & Engle, 1997, 1998) of declarative knowledge to/from longer-term memory stores, and (3) virtually all self-report assessments of declarative knowledge involve memory retrieval mechanisms (cf., Kirkpatrick, 1976, 1987). Therefore, pinpointing the manner by which stereotype threat affects the acquisition of declarative knowledge through traditional “learning assessments” becomes analytically untenable, as such measurement approaches are ill-equipped to disentangle threat’s effects on cognitive encoding, storage, and synthesis mechanisms (characteristics associated with learning, e.g., Kraiger et al., 1993) from its effects on cognitive retrieval, integration, and manipulation mechanisms (characteristics associated with performance, e.g., Schmader et al., 2008).⁴
As Figure 3 highlights, though, knowledge structure organization and procedural strategy formulation represent two alternative possibilities for examining the influence of stereotype threat on the acquisition of cognitive learning outcomes. Although changes in these outcomes are posited to be more “advanced” consequences of one’s learning experiences (Kraiger et al., 1993) and will, to some extent, be influenced by the acquisition and retrieval of declarative knowledge (Anderson, 1982, 1993a, 1996; Anderson et al., 2004), examining these learning indicators holds certain advantages over more traditional measures of learning proficiency. For example, knowledge structures can be assessed by asking respondents to make relational ratings among relevant domain/task concepts that are presented to them, a procedure that does not necessarily require one to explicitly recall declarative facts (e.g., Schvaneveldt, 1990). As a result, one can lessen the “double-dipping” problem of stereotype threat effects on encoding and retrieval processes that would taint investigations focused only on declarative knowledge while still examining and interpreting the influence of stereotype threat on learning outcomes. Additionally, the development of effective knowledge structures and cognitive performance strategies holds important implications for a variety of performance outcomes as well. At a conceptual level, the relation between performance and knowledge has been widely acknowledged. Campbell, McCloy, Oppler, and Sager (1993) characterize job performance as a multiply determined, integrative function of domain-/goal-relevant declarative knowledge, procedural knowledge/skill, and motivation.
More generally, the ACT-R model of cognition (cf., Anderson, 1996; Anderson et al., 2004) posits that virtually all demonstrative instances of intellective performance result from the encoding of environmental stimuli into feature-rich information “chunks” (declarative knowledge) that feed production rules (i.e., procedural knowledge) and guide the generation of task-relevant outcomes. Inherent in both of these conceptualizations of performance, however, is the notion that the translation of basic knowledge to actionable knowledge (i.e., translating information about what to do into information about how to do it, when to do it, and why it is done in that manner) is central to performing any task. Empirical evidence of the relation between knowledge and task outcomes further supports the notion that knowledge structures and cognitive strategies are integral in determining the manner by which one contextualizes, approaches, and undertakes performance-relevant activities (e.g., Day et al., 2001; Kozlowski, Gully, et al., 2001; Medin et al., 2006; Royer, Tronsky, Chan, Jackson, & Marchant, 1999; Zentall, 1999).

The above also suggests that the influence of stereotype threat on the development of advanced cognitive learning outcomes could impact the performance potential of threatened individuals. For example, even if one were to find no significant performance differences between threatened and non-threatened individuals on a declarative knowledge-type assessment in a performance domain, threatened individuals may possess less well organized or proceduralized knowledge within that domain. As a result, they may take longer to do the same tasks, expend greater cognitive resources to achieve the same performance levels, or be less capable/require greater levels of investment to learn related tasks than non-threatened others.
Such consequences could potentially lead to a host of undesirable outcomes in the long run, such as quicker burnout within a domain, stagnated performance growth, and fewer opportunities for advancement/promotion, especially in areas with both prevalent group stereotypes and rapidly paced learning environments (e.g., STEM fields).

In sum, the importance of working memory to the acquisition/retention of task-relevant information, behaviors, and dispositions strongly implies that the learning efforts of threatened individuals may be undermined by the working memory decrements stemming from stereotype threat. The use of Kraiger et al.’s (1993) classification system in this context indicates a variety of possible learning outcomes which may be susceptible to threat effects and further serves as a useful organizing framework for systematically advancing research in this area beyond previous works. Although the present study focuses only on the acquisition of advanced cognitive learning outcomes, stereotype threat may impact equally important components related to skill-based (e.g., inability to adapt to effective behavioral routines, slower to automatize procedures and thus develop expertise) and affective (e.g., smaller self-efficacy improvements, greater resistance to error-based training, etc.) learning outcomes as well. Investigations of the effects of stereotype threat at learning thus potentially carry a number of methodological and practical implications, including a better understanding of how threat experienced at performance differs from or is similar to threat experienced during learning, how to improve training/learning paradigms to help targeted individuals learn and engage information from stereotyped content domains, and further explication of the boundary conditions, cognitive mechanisms, and alternative consequences of stereotype threat beyond intellective testing (cf., Nguyen & Ryan, 2008).
Research Hypotheses

The preceding description of the nature of stereotype threat and the theoretical rationale for the potential influence of stereotype threat at learning provides the bulk of the conceptual underpinnings for the hypotheses presented below. In the present study, the primary predictions of interest concern the effects of stereotype threat on participants’ knowledge organization, cognitive strategy formulation, and subsequent task performance. Given that the elicitation of stereotype threat is conditioned on the subgroup comparison and stereotyped domain of interest, it is pertinent to briefly make note of these aspects. In the stereotype threat conditions, the purpose of the study will be presented to participants as an examination of why some subgroups have greater difficulty on mathematical reasoning assessments. Males and females will serve as the subgroup comparison of interest, as stereotypes and empirical performance differences favoring men are commonly documented within this ability domain, especially amongst college-educated populations, and are commonly recognized by most individuals in Western cultures (Ackerman, Bowen, Beier, & Kanfer, 2001; Beilock et al., 2010; Halpern, 2000; Halpern et al., 2007; Hyde, Fennema, & Lamon, 1990). Furthermore, females’ performance within mathematical domains has been shown to be reactive to and produce patterns consistent with stereotype threat in previous research (e.g., Spencer et al., 1999), and such effects can be elicited even in instances where the task itself does not explicitly involve mathematical operations/content (e.g., Jamieson & Harkins, 2007; Rydell, Shiffrin, et al., 2010).

Stereotype threat and knowledge organization.
Knowledge organization represents the way in which individuals form and store relationships among declarative facts, concepts, propositions, data, and other objects within a given task domain (e.g., Jonassen et al., 1993; Koubek, Clarkston, & Chavez, 1994; Shavelson, 1972, 1974; Taber, 2000). It is generally believed that such knowledge structures reflect an individual’s deeper understanding of the manner by which task requirements are fulfilled as well as the knowledge, skills, and procedures necessary to achieve them (Glaser, 1990; Rowe et al., 1996; Schoenfeld & Herrmann, 1982). Assessments of knowledge structures have most commonly been used to examine between-person differences in knowledge/skill acquisition and as a means to differentiate between domain experts and novices (e.g., Chi, Glaser, & Farr, 1988; Day et al., 2001; Ford & Kraiger, 1995). However, knowledge structures can also be used to investigate within-person changes in information acquisition/synthesis as individuals accumulate experience in a given content domain (Rumelhart & Norman, 1978), although longitudinal examinations of this process are rare (Ifenthaler, Masduki, & Seel, 2011; Ifenthaler & Seel, 2005; Seel, 1999).

In order to characterize the hypothesized relationship between stereotype threat and knowledge organization, it is useful to describe the manner by which knowledge structures will be operationalized and constructed in the present study. Knowledge structures can be elicited from individuals in a variety of ways. Ifenthaler et al. (2011) broadly classify these different techniques into either natural language (e.g., thinking-out-loud protocols, word association, card sorting, etc.) or graphical (concept mapping, causal diagrams, etc.) approaches.
Each methodology possesses different strengths and weaknesses, though graphical data gathering and analytic approaches have become increasingly popular due to the ease with which they can be used to quantitatively and qualitatively represent structural knowledge formations (Goldsmith, Johnson, & Acton, 1991; Schuelke, 2009). Among these graphical approaches, the Pathfinder algorithm (Schvaneveldt, 1990) and its associated statistical software (Interlink, 2011) are among the most familiar and commonly used techniques in the learning and training literature and will be adopted for the present study. In brief, Pathfinder reconstructs structural networks amongst a set of concepts based on similarity or proximity ratings provided by an individual for all possible pairs of concepts in the set. Each rating is interpreted as how strongly a pair of concepts is related in a person’s memory (Nagy, 1984). The Pathfinder algorithm then identifies the most parsimonious relationships shared amongst all concepts in order to form a structural network composed of nodes (concepts), links (a single relationship between two concepts), and paths (an indirect relationship between any two concepts composed of x number of links). Analytically, the algorithm operates by first linking all concepts in the network together and then removing the direct link between any two concepts if a stronger indirect path (i.e., a path that passes through one or more other nodes lying between the target and destination nodes) exists. Two parameters can be manipulated in the network generation algorithm to control the production of links and paths among nodes (Dearholt & Schvaneveldt, 1990): r (from Minkowski’s distance formula), which determines how the distance between nodes not directly linked is computed, and q, which places a limit on the number of links that can exist in a path between any two nodes in the network.
Similar to previous studies employing Pathfinder (e.g., Day et al., 2001; Schuelke et al., 2009; Kozlowski, Gully, et al., 2001; Kraiger, Salas, & Cannon-Bowers, 1995), the present analyses evaluated structures in which r = ∞ (the weight of a path is equal to the maximum weight of any link in the path) and q = n-1, where n equals the number of concepts in the network. These parameter values tend to produce the most parsimonious structures while still allowing maximal interconnectivity among nodes in the network. Pathfinder networks can also be used to compute two sets of statistical indices useful for characterizing the organizational efficiency of observed knowledge structures. The first set comprises the similarity and correlation indices, which are used to evaluate comparative similarity across different knowledge structures (e.g., a structure derived from a novice versus one derived from a domain expert, etc.). Similarity reflects the extent to which two knowledge structures share the same pattern of linkages among concepts, whereas correlation indices represent the extent to which these associations share consistent rank-order priorities (Schuelke et al., 2009; Smith-Jentsch, Mathieu, & Kraiger, 2005). In addition to these referent-based indices, the second set of numeric indices provides information about the structural characteristics of any single knowledge map. Number of links yields information about the complexity/parsimony of a knowledge structure and can be used to evaluate the degree to which particular concepts are more/less related to others (Day et al., 2001). Alternatively, coherence is a measure of internal consistency depicting the extent to which concepts share logical relations based broadly on assumptions of transitivity (i.e., if two concepts share similar relations with other concepts, then those two concepts should be similar to each other, Interlink, 2011).
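As an illustrative aside, the link-pruning rule and the similarity index described above can be sketched in a few lines of Python. This is a minimal sketch and not the Pathfinder software itself: it assumes the relatedness ratings have already been converted to a symmetric matrix of distances (smaller values indicating more closely related concepts), it covers only the r = ∞, q = n-1 case, and the function names and the four-concept example data are hypothetical. The similarity function assumes the commonly used definition of the index as the proportion of shared links among all links appearing in either network.

```python
from itertools import combinations


def pathfinder_links(dist):
    """Links retained in a Pathfinder network with r = infinity and q = n - 1.

    dist is a symmetric n x n matrix (list of lists) of distances between
    concepts, with zeros on the diagonal.
    """
    n = len(dist)
    # With r = infinity, the weight of a path is its largest link weight.
    # A Floyd-Warshall-style pass finds, for every pair of concepts, the
    # smallest such path weight over paths of any length (up to n - 1 links).
    minimax = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # A direct link survives only if no indirect path is strictly shorter.
    return {
        (i, j)
        for i, j in combinations(range(n), 2)
        if dist[i][j] <= minimax[i][j]
    }


def link_similarity(links_a, links_b):
    """Shared links divided by all links found in either network."""
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 1.0


# Hypothetical distances among four concepts (indexed 0 to 3).
learner = [
    [0, 1, 4, 5],
    [1, 0, 2, 5],
    [4, 2, 0, 3],
    [5, 5, 3, 0],
]
learner_links = pathfinder_links(learner)
print(sorted(learner_links))  # [(0, 1), (1, 2), (2, 3)]

# Hypothetical referent (e.g., expert) network over the same concepts.
referent_links = {(0, 1), (1, 2), (1, 3)}
print(link_similarity(learner_links, referent_links))  # 0.5
```

In this toy example the direct links 0-2, 0-3, and 1-3 are pruned because a chain of shorter links connects each pair, leaving three links; two of those three also appear in the referent network, out of four distinct links overall.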
In addition to these numerical outputs, Pathfinder networks can be used to examine the manner by which concepts are clustered in individuals’ knowledge structures. The derivation of knowledge structures in Pathfinder’s software incorporates features similar to those used in hierarchical clustering analyses and multidimensional scaling techniques; as a result, the relative location of concepts in a visualized Pathfinder graph can also be used to deduce semantic categorization of knowledge (Dearholt & Schvaneveldt, 1990; Esposito, 1990). Thus, similarity, correlation, number of links, coherence, and clustering will be examined as the primary knowledge structure outputs of interest.

Based on the rationale that stereotype threat impedes working memory efficiency by hijacking available capacity for task-irrelevant demands (e.g., Beilock et al., 2006; Beilock et al., 2007; Schmader, 2010; Schmader et al., 2009; Schmader & Johns, 2003) and that working memory capacity is related to learning and related information encoding processes (cf., Feldman Barrett et al., 2004), stereotype threat should negatively influence knowledge structure formation by inhibiting threatened individuals’ ability to maintain enough task-relevant information in an activated/accessible state for the duration of time needed to develop an integrated mental representation of the content. Developing coherent and proficient knowledge structures is a cognitively demanding and effortful process that requires individuals to successfully coordinate information about features, meanings, and associations simultaneously across multiple concepts. To the extent that one’s capacity to actively engage and efficiently direct attentional resources towards these activities is inhibited, the formation of a comprehensive knowledge structure indicative of a deep understanding of the task domain should be affected. As indirect evidence of this proposition, MacDonald, Just, and Carpenter (1992) and Whitney et al.
(1991) both report evidence consistent with the claim that persons with lower working memory capacity are less able to maintain as many pieces of information in active attention long enough to derive alternative possible meanings for written prose or disambiguate multiple semantic interpretations, both of which are indicative that one has formed more complex associations among available information. More directly relevant to the relationship between working memory and knowledge structure formation, Cantor and Engle (1993) posited that individuals learn declarative knowledge by creating semantic memory networks containing the underlying relational propositions among those chunks (cf., Anderson, 1996). However, in order to form such mental representations, a significant portion of those information bits must remain in an activated state during learning in order for individuals to draw associations between their shared features (Johnson-Laird, 1983; Radvansky & Zacks, 1991). The development of these schemas facilitates task functioning in that they allow individuals to access the majority of information about a context by activating only a single mental representation containing all the underlying propositional relations among the information rather than actively retrieving each of those facts separately and independently (Reder & Anderson, 1980). Thus, well-formed knowledge structures enable one to retrieve more information faster and with less attentional demand. To the extent that working memory is related to the development of these mental models, one would expect to see differences in how readily individuals with differing levels of working memory capacity could learn and recall large amounts of rote declarative information (Johnson-Laird, 1983).
In a series of studies examining these predictions, Cantor and Engle (1993) revealed that individuals with lower working memory capacity were (A) slower at correctly identifying learned sentences as the number of common features they shared with other learned sentences increased and (B) slower at correctly identifying learned sentences as the number of sentences in the pool of related sentences to-be-learned increased. By comparison, individuals with higher working memory capacity were much quicker at completing (A) than their lower-memory counterparts and became faster at (B) as the learning pool increased. Although not a direct examination of mental model formation per se, these results are consistent with the claim that more comprehensive knowledge structures facilitate task functioning and that individuals with lower working memory capacities may have greater difficulty maintaining enough data in working memory during learning to efficiently integrate the semantic content of and relational associations among task-relevant concepts.

Thus, the inability to make use of one’s full reserve of working memory capacity should disproportionately influence a threatened individual’s ability to actively maintain enough information related to task goals/requirements for the duration of time needed to develop comprehensive associations among knowledge concepts indicative of advanced learning. Consequently, working memory decrements engendered by stereotype threat should exert a stronger influence on the development of integrated, sophisticated, and well-organized knowledge structures in threatened individuals compared to non-threatened individuals. In the context of the present study, men are not expected to be influenced by the stereotype threat manipulation and thus provide a reasonable control condition against which to examine differences in knowledge structure formations between threatened and non-threatened women.
Hypothesis 1: The knowledge structures of females who learn under conditions of stereotype threat will be less similar to those from top performers/men than the knowledge structures of females who learn under control conditions.

Hypothesis 2: The knowledge structures of females who learn under conditions of stereotype threat will be less correlated with those from top performers/men than the knowledge structures of females who learn under control conditions.

Hypothesis 3: The knowledge structures of females who learn under conditions of stereotype threat will be less coherent than the knowledge structures of females who learn under control conditions.

Hypothesis 4: The knowledge structures of females who learn under conditions of stereotype threat will have significantly more links (i.e., be less parsimonious) than the knowledge structures of females who learn under control conditions.

Hypothesis 5: The clustering of concepts in the knowledge structures of females who learn under conditions of stereotype threat will be significantly different from that for non-threatened women; specifically, the knowledge structures of females in the stereotype threat conditions will exhibit poorer integration of related task concepts (i.e., report fewer associations between task concepts whose meanings are mutually informative and/or relevant to task performance) than the knowledge structures of women in the control condition.

Given the inherently dynamic process through which the accumulation of learning experiences and the development of cognitive learning outcomes proceeds (Anderson et al., 2004; Goldstein & Ford, 2002; Kraiger et al., 1993), investigating changes in knowledge structure formation over time marks an important contribution to the understanding of stereotype threat effects on learning.
When attempting to learn a novel/complex domain task, adaptations to one’s knowledge structure should emerge as individuals receive and make use of subsequent learning opportunities to better understand domain concepts and their associations (e.g., Ifenthaler et al., 2011; Jonassen et al., 1993; Kraiger et al., 1993; Rumelhart & Norman, 1978). Furthermore, to the extent that practice and learning exposures enable individuals to improve their understanding of domain concepts and how to effectively complete tasks based on those concepts, knowledge structures should show some degree of convergence towards a singular “optimal” configuration (or small subset of optimal configurations, depending on the nature of the task or the manner by which it is learned, Medin et al., 2006) over time as individuals become knowledgeable, acquire expertise, and become better at performing domain tasks (e.g., Chi et al., 1988; Day et al., 2001). However, it is widely acknowledged that early stages of learning exert significant influence on subsequent learning efforts (e.g., Anderson et al., 2004; Bell & Kozlowski, 2002; Goldstein & Ford, 2002). To the extent, then, that threat-based working memory decrements interfere with the formation of knowledge structures early in the learning process, the learning difficulties of threatened individuals may compound over time, resulting in more stagnant structures that are much slower to converge, or that never converge, towards efficient, high-performing “expert” models. In the present study, the knowledge structures of individuals were evaluated on three consecutive days in an attempt to evaluate differences in learning growth between threatened versus non-threatened individuals.
In addition to the contributions that longitudinal examinations of knowledge structure development hold for learning researchers in general (cf., Ifenthaler et al., 2011), these results hold a number of possible implications for design and evaluation considerations of instructional systems in stereotyped domains (e.g., value added of additional training exposures, expected success/retention, etc.), especially in environments where individuals are expected to quickly gain expertise in one knowledge/skill area before moving to more advanced applications (e.g., mathematics education, technology training, etc.).

Hypothesis 6: The similarity between the knowledge structures of females who learn under conditions of stereotype threat with those from top performers/men will improve at a slower rate compared to females who learn under control conditions.

Hypothesis 7: The correlation between the knowledge structures of females who learn under conditions of stereotype threat with those from top performers/men will improve at a slower rate compared to females who learn under control conditions.

Hypothesis 8: The coherence of the knowledge structures of females who learn under conditions of stereotype threat will improve at a slower rate compared to females who learn under control conditions.

Hypothesis 9: The number of links in the knowledge structures of females who learn under conditions of stereotype threat will increase at a faster rate (i.e., structures will become less parsimonious) compared to females who learn under control conditions.

Hypothesis 10: The knowledge structures of females who learn under conditions of stereotype threat will demonstrate less integration of related task concepts over time (i.e., fewer and less efficient associations between related task concepts) compared to females who learn under control conditions.

Stereotype threat and cognitive strategy.
As a general learning outcome, cognitive strategy development describes the internalization and manifestation of cognitive/behavioral processes, procedures, and heuristics that direct one’s efforts towards accomplishing a given goal (Anderson, 1982; Kanfer & Ackerman, 1989; Prawat, 1989). Acquiring effective cognitive strategies is a relatively advanced outcome of the learning process that usually only develops after one has developed a deeper level of understanding about a task/domain and its requirements (Kraiger et al., 1993; Sweller, Mawer, & Ward, 1983). It is perhaps not surprising to note, then, that knowledge organization and cognitive strategies are somewhat interdependent. Knowledge structures provide broadly construed mental schema for deducing cause-effect relations within a domain space, which individuals then learn to effectively employ and integrate with task-specific conditions (i.e., rules, demands, criteria, etc.) to form cognitive strategies that direct future learning and performance efforts (e.g., Chi, Feltovich, & Glaser, 1981; Chi, Glaser, & Rees, 1982; Simon & Simon, 1978). As one’s cognitive strategies are vetted and feedback about their success is gathered, subsequent refinements about the relations and clustering among knowledge concepts may also occur.

Consistent with Kraiger et al.’s (1993) framework, stereotype threat conditions which adversely impact the formation of effective knowledge structures likely also impair the acquisition of effective cognitive strategies, as such strategies would be based on a less sophisticated or complete understanding of the content domain. However, as implied above, cognitive strategy acquisition also requires individuals to simultaneously attend to unique features of a given task in order to incorporate those task demands/needs with one’s conceptual understanding of the domain. Again, working memory functions are believed to play an integral role in this monitoring and integration procedure.
Johnson-Laird (1983) succinctly elucidates this intersection:

The effects of both number of [mental] models and figure [i.e., task requirements] arise from an inevitable bottleneck in the inferential machinery: the processing capacity of working memory, which must hold one representation in a store, while at the same time the relevant information from the current premise is substituted in it. (p. 115)

The development and selection of these inferential strategies is purportedly coordinated by the executive control and episodic buffer functions of one’s working memory (Baddeley, 1986; 2000). During such activities, individuals learn to activate representative knowledge structures and cognitive areas of functioning that assist in reconciling current conditions/needs in the task space (e.g., Johnson-Laird, 1983; Wraga et al., 2007). As individuals streamline this process and become more proficient at its application over time, the coordination between knowledge structure retrieval and task demand processing also becomes a less effortful undertaking (Smith, McEvoy, & Gevins, 1999), freeing working memory resources to monitor and respond to other stimuli in the learning environment. However, threatened individuals who may be employing sparsely developed knowledge structures to begin with while also relying on limited working memory capacities to coordinate these inferential processes should be less likely to see such effective strategies emerge and/or improve during learning. A threatened individual with diminished working memory capacity should therefore have greater difficulty holding learned knowledge structures in active awareness, learning/interpreting conditional features of a task space, and integrating those pieces into effective strategies that dictate how those representations should be applied to most effectively operate within a given context.
In support of this proposition, a number of studies report strong correlations between working memory capacity and cognitive strategy use/selection (e.g., Anderson, Reder, & Lebiere, 1996; Barrouillet, Bernardin, & Camos, 2004; Barrouillet & Lépine, 2005; Dunlosky & Kane, 2007; Espy et al., 2004; Gilhooly, Logie, Wetherick, & Wynn, 1993; McNamara & Scott, 2001). For example, research on arithmetic skill development in elementary school children reveals that those with lower working memory capacities tend to have more difficulty learning and employing advanced problem-solving tactics. Furthermore, individuals with less working memory capacity are less likely to adapt to using more effective strategies in response to increased complexity in task demands (Geary, Hoard, Byrd-Craven, & DeSoto, 2004; Imbo & Vandierendonck, 2007). Interestingly, these findings are consistent with the interpretations drawn by Rydell, Shiffrin, et al. (2010) that threatened individuals (whose working memory capacities are presumably diminished) tended to persist in suboptimal visual search strategies over the course of multiple experimental trials rather than acquiring new and more efficient approaches to task completion. As such, threatened women are predicted to develop less effective and advanced cognitive task strategies than non-threatened women.

Hypothesis 11: Females who learn under conditions of stereotype threat will exhibit poorer/more basic cognitive task strategies than females who learn under control conditions.

For many task domains, cognitive strategies also reflect one’s methods for identifying particular pieces of declarative knowledge and optimally interpreting that information according to procedural rules/knowledge. For example, developing expertise in a number of educational (e.g., solving mathematical word problems, completing verbal reasoning tasks, etc.) and practical (e.g., technology troubleshooting, providing task orders/direction in an emergency room, etc.)
applications occurs as individuals learn how to distinguish relevant from irrelevant information, interpret its relative value to the desired performance outcome, and correctly formulate interpretations based on that information (Sweller et al., 1983). Part of the learning process involved in such tasks, therefore, is the development of strategic heuristics that explicitly orient these information gathering and combination efforts in the most efficient manner. However, these cognitive tasks may be more difficult for threatened individuals as they have less working memory capacity available to assist in information screening and evaluation activities (Schmader et al., 2003; Schmader et al., 2008). As a result, these persons may be more likely to attend to irrelevant declarative knowledge facts during learning and incorporate this information into suboptimal performance strategies. It is therefore predicted that:

Hypothesis 12: Females who learn under conditions of stereotype threat will develop less optimal procedural decision strategies for task completion than females who learn under control conditions.

Stereotype threat and performance. The overwhelming majority of stereotype threat research has been directed towards identifying decrements in ability test performance (cf., Nguyen & Ryan, 2008). Though stereotype threat has been shown to influence outcomes from other sensorimotor and social tasks (e.g., Stone, 2002; Stone et al., 1999; Stone & McWhinnie, 2008; Kray et al., 2002), few investigations have examined threat effects on performance outcomes in computer-based simulation tasks. However, simulation use has become an exceedingly common method of assessment and training in educational and industry areas (Bell, Kanar, & Kozlowski, 2008) and thus represents an important extension and application of stereotype threat research.
Advances in technology have made the design, maintenance, and implementation of realistic job/educational simulations more affordable and accessible to decision-makers than ever before (Bell & Kozlowski, 2007), leading many industries to adopt these tools as part of their everyday repertoire. For example, one survey claims that upwards of 97.5% of business schools use some form of simulation gaming as part of their standard curricula (Faria, 1998). Furthermore, it was estimated that at least 75% of organizations in the United States with more than 1,000 employees use business simulations for hiring/training purposes (Faria & Nulsen, 1996) and that between $623 and $712 million in global revenue was generated by the simulation-based training industry in 2003 (Summers, 2004).

In the present study, a computer-based simulation task will serve as the learning and performance environment for participants. Many simulations are designed to teach, evaluate, and/or relate to skills and abilities comparable to those targeted by traditional media (e.g., spatial ability, mathematical ability, etc.); it is therefore plausible that individuals would believe that a given simulation is indicative of a particular capability if so informed even if, on its face, that simulation does not appear to be a direct measure of the ability in question (e.g., Rydell, Shiffrin, et al., 2010). Thus, the same performance discrepancies predicted by the broader stereotype threat literature on standard ability tests between threatened and non-threatened individuals are also expected in the present simulation-based task. Additionally, because stereotype threat is expected to produce learning difficulties within the task domain, it is also predicted that the performance trajectories across conditions of threat will differ over time.
Hypothesis 13: Females who learn under conditions of stereotype threat will demonstrate worse performance on the learned task than females who learn under control conditions.

Hypothesis 14: Females who learn under conditions of stereotype threat will improve their performance on the learned task at a slower rate than females who learn under control conditions.

METHOD

Participants

Participants were 198 undergraduate students (M age = 19.49, SD = 2.04) from psychology courses at a large Midwestern university. All individuals were informed that the experiment spanned three consecutive days and that interested persons should only volunteer for the experiment if they were able to complete the study in its entirety. Table 2 provides a breakdown of the number of participants who completed each day of the experiment by sex and experimental condition. Given the focus on negative stereotypes towards female achievement in mathematical tasks and the within-group nature of the stereotype threat theory/predictions, women were the primary group of analytic interest; as such, they were purposely overrepresented in the sample relative to men. Across the entire sample, 79.8% of participants completed all three days of the experiment, with a slightly higher attrition rate occurring between Days 2 and 3 relative to Days 1 and 2. A two-way analysis of variance (ANOVA) on number of days attended by participants in the study revealed no significant main effects for sex (F(1,198) = .228, ns) or condition (F(1,198) = .013, ns), nor was there a significant sex by condition interaction (F(1,198) = .820, ns). These results indicate that the observed attrition rates were not differentially influenced by the stereotype threat manipulation, nor did they differ for males and females in the experimental conditions.
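The attrition figures summarized above follow directly from the day-by-day counts reported in Table 2. The short Python sketch below recomputes them as an illustration; the dictionary layout and the function name `attrition` are assumptions made for this example and are not part of the study materials.

```python
# Participant counts from Table 2, keyed by (condition, sex).
# Each list holds the number present on Day 1, Day 2, and Day 3.
counts = {
    ("threat", "female"): [71, 65, 58],
    ("threat", "male"): [27, 22, 22],
    ("control", "female"): [74, 69, 56],
    ("control", "male"): [26, 24, 22],
}


def attrition(start, end):
    """Percentage of participants lost between two days (illustrative helper)."""
    return 100.0 * (start - end) / start


# Totals across all four design cells for each day.
totals = [sum(day) for day in zip(*counts.values())]

overall_day1_to_day3 = round(attrition(totals[0], totals[2]), 2)
completion_rate = round(100.0 * totals[2] / totals[0], 1)
threat_female_day2_to_day3 = round(attrition(65, 58), 2)

print(totals, overall_day1_to_day3, completion_rate, threat_female_day2_to_day3)
```

Under these counts, the overall completion rate works out to the 79.8% reported in the text.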
For their participation in the study, all individuals were compensated with course credit; as an additional incentive, participants were also informed that the top 10% of performers in the experiment would receive a $60 cash prize. Because women facing stereotype threat were expected to do more poorly in the study overall, the cash prizes were awarded to the top 10% of performers within each of the 2 (sex: male, female) x 2 (condition: stereotype threat, control) experimental design cells to ensure that all participants had a fair chance of earning the award. To be eligible for the cash prize, participants were required to attend all three of the scheduled sessions as described in the experimental procedure below.

Table 2
Total Sample Size and Attrition Rates across Days by Sex and Experimental Condition

                                   Stereotype Threat     Control
                                   Females    Males      Females    Males      Total
Day 1                              71         27         74         26         198
Day 2                              65         22         69         24         180
Day 3                              58         22         56         22         158
% Attrition from Day 1 to Day 2    8.45%      18.52%     6.76%      7.70%      9.09%
% Attrition from Day 2 to Day 3    10.77%     0%         18.84%     8.33%      12.22%
% Attrition from Day 1 to Day 3    18.31%     18.52%     24.32%     15.38%     20.20%

Experimental Task

A modified version of the Tactical Naval Decision Making system (TANDEM; Weaver, Bowers, Salas, & Cannon-Bowers, 1995) was used as the experimental platform. TANDEM is a complex, dynamic, information-processing and decision task set in the context of a low fidelity radar-tracking simulation. The TANDEM task paradigm has been used to study a variety of phenomena, and has a particularly rich history in investigations of learning and self-regulation at both the individual- and team-level (e.g., Bell & Kozlowski, 2002; 2008; DeShon, Kozlowski, Schmidt, Milner, & Wiechmann, 2004; Inzana, Driskell, Salas, & Johnston, 1996; Kozlowski, Gully, et al., 2001).
TANDEM is well suited for the purposes of the present study as it requires participants to learn basic declarative facts which share clearly identifiable relations as well as more advanced procedural strategies in order to effectively complete the simulation. Furthermore, the nature of the task environment and associated cognitive load creates a demanding operational environment for learners, characteristics which have been shown to exacerbate the salience of stereotype threat and thereby improve the likelihood of observing its effects on learning outcomes (cf., Steele, 1997; Steele & Davies, 2003).

The primary objective of TANDEM is to earn points by accurately and quickly identifying, evaluating, and making decisions regarding what action to take against targets that appear on one’s computer screen. Participants are presented with a circular radar display which shows multiple targets in motion around a central radar-tracking station (Figure 4). Targets are either present on the screen from the beginning of a trial or appear (“pop-up”) after some period of time elapses. Within the radar space are two defensive perimeters: an inner perimeter clearly marked on screen, and an outer perimeter that is not visible. In the current task design, the location of the outer perimeter could be approximated by expanding the display window (“zooming out”) to 256 NM (nautical miles) and locating a ring of six stationary targets designated as “markers.” The marker targets were identical in appearance to all other contacts except that they did not move and engaging them would neither earn nor lose points. Points were gained for every target correctly prosecuted and lost for every target incorrectly prosecuted or which crossed into one of the defensive perimeters.
Figure 4. TANDEM graphical user interface

To prosecute a contact and earn points, participants needed to (1) “hook” a target by selecting it with the mouse cursor, (2) view and interpret cues that provided information about various decision-relevant characteristics of that target (e.g., Speed, Direction of Origin, Countermeasures), (3) make three subdecisions based on those cue values, and (4) indicate a final engagement decision based on the selected subdecisions. To earn points, all three subdecisions and the final engagement decision needed to be correct; making even one of these decisions incorrectly resulted in a loss of points. As shown in Table 3, each subdecision (corresponding to a target’s Type, Class, and Intent) was informed by three cues whose values were uniquely associated with a single subdecision outcome. For example, when making the Type subdecision, participants could classify the target as an Aircraft, Surface, or Submarine vessel. A target whose Speed was 35 knots or greater, Altitude/Depth greater than zero feet, and Communication Time between one and forty seconds was classified as an Air vessel; similarly, a Speed between 25 and 34 knots, an Altitude/Depth of zero feet, and a Communication Time between 41 and 80 seconds were indicative of a Surface contact. The same protocol was used to identify a target’s Class and Intent, though each of these subdecisions possessed only two possible outcomes (Civilian or Military for the Class subdecision, Peaceful or Hostile for the Intent subdecision).

Table 3
Subdecision Outcomes and Relevant Identifying Information Cues/Values in TANDEM

Subdecision   Outcome     Identifying Cues & Cue Values
Type                      Speed              Altitude/Depth         Communication Time
              Air         ≥ 35 knots         > 0 feet               1-40s
              Surface     25-34 knots        0 feet                 41-80s
              Sub         0-24 knots         < 0 feet               81-120s
Class                     Countermeasures    Signal Strength        Maneuvering Pattern
              Civilian    None               Moderate               Code Foxtrot
              Unknown     Inactive           Indistinct             Code Echo
              Military    Jamming            Weak                   Code Delta
Intent                    Identification     Direction of Origin    Response
              Peaceful    Prince             Green Beach            Authorized
              Unknown     Golf               Blue Lagoon            Inaudible
              Hostile     Tango              Orange Bay             Invalid

Note. Some targets possessed cue values for the Class and Intent subdecisions which identified them as Unknown; however, participants were not permitted to classify a target’s Class or Intent as Unknown.

Once all three subdecisions were made, participants could make the final engagement decision for a target based on the “rules of engagement” shown in Table 4. The rules indicated how a target should be prosecuted according to its Type, Class, and Intent; thus, a target whose Type was Air, Class was Civilian, and Intent was Peaceful should be Warned, whereas a target that was a Sub, Military, Hostile should be Marked.

Table 4
Rules of Engagement for Determining Final Engagement Decisions

Clear                          Warn                           Mark
Air, Military, Peaceful        Air, Civilian, Hostile         Air, Military, Hostile
Surface, Civilian, Peaceful    Air, Civilian, Peaceful        Surface, Military, Hostile
Surface, Military, Peaceful    Surface, Civilian, Hostile     Sub, Civilian, Hostile
Sub, Military, Peaceful        Sub, Civilian, Peaceful        Sub, Military, Hostile

A comprehensive operations manual was available to participants prior to each trial that contained all the relevant information needed to operate TANDEM. The information in the manual could be categorized into three major topic areas: basic gameplay information, cue value interpretation, and task strategies. The sections describing basic gameplay included information on the computer functions needed to operate the task (e.g., how to hook targets, access cue menus), scoring rules, and the task objectives/background context (e.g., engaging contacts quickly and accurately).
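The decision sequence a participant had to learn (cues, then subdecisions, then a final engagement decision) can be illustrated with a short Python sketch. It covers the Type subdecision from Table 3 and the rules of engagement from Table 4, and resolves conflicting cues by majority, following the rule given in the task manual. All function and variable names are hypothetical, integer-valued cues are assumed, and the handling of a three-way disagreement among Type cues is an assumption, since the source does not describe that case. This is not the TANDEM software itself.

```python
from collections import Counter

# Rules of engagement transcribed from Table 4.
RULES = {
    "Clear": [("Air", "Military", "Peaceful"), ("Surface", "Civilian", "Peaceful"),
              ("Surface", "Military", "Peaceful"), ("Sub", "Military", "Peaceful")],
    "Warn": [("Air", "Civilian", "Hostile"), ("Air", "Civilian", "Peaceful"),
             ("Surface", "Civilian", "Hostile"), ("Sub", "Civilian", "Peaceful")],
    "Mark": [("Air", "Military", "Hostile"), ("Surface", "Military", "Hostile"),
             ("Sub", "Civilian", "Hostile"), ("Sub", "Military", "Hostile")],
}
# Invert to a lookup: (Type, Class, Intent) -> final engagement decision.
ENGAGEMENT = {combo: decision for decision, combos in RULES.items() for combo in combos}


def type_from_speed(knots):
    """Type implied by the Speed cue (Table 3 ranges)."""
    return "Air" if knots >= 35 else "Surface" if knots >= 25 else "Sub"


def type_from_altitude(feet):
    """Type implied by the Altitude/Depth cue."""
    return "Air" if feet > 0 else "Surface" if feet == 0 else "Sub"


def type_from_comm_time(seconds):
    """Type implied by the Communication Time cue."""
    return "Air" if seconds <= 40 else "Surface" if seconds <= 80 else "Sub"


def majority(votes):
    """Outcome supported by most cues; a three-way split is treated as an error (assumption)."""
    outcome, count = Counter(votes).most_common(1)[0]
    if count < 2:
        raise ValueError("no outcome is supported by a majority of cues")
    return outcome


def type_subdecision(knots, feet, seconds):
    """Combine the three Type cues using the majority rule."""
    return majority([type_from_speed(knots), type_from_altitude(feet),
                     type_from_comm_time(seconds)])


def final_decision(target_type, target_class, intent):
    """Look up the final engagement decision for a set of subdecisions."""
    return ENGAGEMENT[(target_type, target_class, intent)]


# A target with conflicting Type cues: two point to Air, one to Surface.
print(type_subdecision(40, 0, 20))
print(final_decision("Air", "Civilian", "Peaceful"))
```

The lookup table makes it easy to see that the twelve possible combinations of Type, Class, and Intent are split evenly across the three engagement decisions.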
Material pertaining to cue value interpretation included the cue values needed to make accurate subdecisions, the rules of engagement, and the order in which task decisions needed to be made. This section also indicated that the radar sometimes provided conflicting/ambiguous cue values for a target (i.e., two cue values supported one decision while the remaining cue value supported a different decision) and that, in such cases, the option supported by the majority of cues should be selected. Lastly, details about task strategies included advanced/tactical aspects of the task related to perimeter identification (using the zoom functions, locating/using marker targets to identify the invisible defensive perimeter) and target prioritization (gauging contact speed and location relative to perimeters, switching between defensive perimeters, cost of perimeter breach).

Procedure

Online signup. Individuals were recruited and registered to participate in the study through the Psychology Department subject pool website. In addition to basic information about the experiment, the online study description/recruitment materials described incentives for participation and indicated that participants would be eligible to receive a monetary award for completing all portions of the study. Approximately 15 participants were permitted to sign up for a single experimental session at a time, though efforts were made to ensure that each session contained both male and female participants. Each experimental session was assigned to either the control or stereotype threat condition with attempts to maintain a balanced sample size across both conditions. After signing up for the experiment, individuals completed an online consent form (Appendix A) as well as a short questionnaire containing a small number of background/demographic items and a survey on math domain identification. In total, the sign-up procedure and questionnaire lasted approximately 10-15 minutes.
Upon completing the online portion of the experiment, participants were directed to attend the computer lab where they would play the TANDEM simulation on the first day of their scheduled experimental session. A reminder e-mail was sent to all participants approximately one week before the date of their lab session which provided the dates, times, and room number in which the in-person portion of the experiment would be held.

Experimental sessions. An overview of the sequencing and timing of the experimental sessions is presented in Table 5. Each session took place over three consecutive days.

Table 5
Summary of Experimental Session Sequence and Timings

Day      Activity                       Time
Day 1    Informed consent               --
         Working memory assessment      20 mins
         Introductory presentation      8 mins
         Familiarization trial          3 mins
         Practice trials (6)            8 mins (48 mins)
         Performance trial              10 mins
         Post-trial measurement         25 mins
Day 2    Familiarization trial          3 mins
         Practice trials (6)            8 mins (48 mins)
         Performance trial              10 mins
         Post-trial measurement         25 mins
Day 3    Familiarization trial          3 mins
         Practice trials (6)            8 mins (48 mins)
         Performance trial              10 mins
         Post-trial measurement         25 mins

Note. Days 1-3 occurred consecutively.

At the beginning of Day 1, individuals completed an additional informed consent form containing information about the lab portion of the experiment and their rights as research participants (Appendix B). Following the consent procedure, participants completed a computerized assessment of working memory; once all participants finished the assessment, a short, automated training presentation was displayed. A brief familiarization trial followed which enabled participants to familiarize themselves with the TANDEM interface and the experimenter to explain the rules and procedures participants should follow for completing the experiment.
Next, participants completed six practice trials with TANDEM where they learned to engage targets using the radar interface; at set intervals between these trials, instructions containing both the experimental manipulations and guided exploratory learning recommendations were presented to participants. Once participants had completed the practice rounds, a single performance trial was completed; scores from the performance trials were later used to determine the winners of the monetary awards. Lastly, individuals were asked to complete a series of post-trial measures. At the end of Days 1 and 2, participants were provided with a reminder slip that indicated the date, time, and location of their next session. At the end of Day 3, participants were debriefed and the manner by which the monetary awards would be distributed was described (Appendix C). In its entirety, the lab portion of the experiment took approximately 5 hours to complete. The full experimental protocol for the experimental sessions is provided in Appendix D.

Task introduction and familiarization trial. The automated introductory presentation shown to all participants at the beginning of Day 1 was projected on a large screen at the front of the computer lab. The narrated video lasted approximately eight minutes and described the purpose of TANDEM, the sequence of events in the study, procedural rules for the lab, and how to operate the task interface and manual. The experimental manipulation instructions were also presented for the first time near the start of the presentation. Following the introductory training video, participants completed a short familiarization trial using the TANDEM task environment.
During this period, individuals were permitted access to the online instruction manual for 30 seconds and then completed a one-minute trial that enabled participants to practice using the manual, starting TANDEM, and selecting targets, viewing information, and inputting target decisions on the radar screen. Participants were informed that the purpose of the familiarization trial was simply to orient themselves with the computer equipment and how to perform these common task operations. No feedback was given regarding performance or activities during the familiarization trial.

Practice trials. Following the familiarization period, participants engaged in six practice trials each day, for a total of 18 trials across the course of the experimental session. Each practice trial followed a standard progression of two minutes for studying the task manual, five minutes for hands-on practice with the radar interface, and one minute to review post-trial feedback about performance during that trial (Appendix E). Scores for the practice trials were computed based on scoring algorithms made available to participants in the online manual; 100 points were earned for every target correctly identified and engaged (i.e., all three subdecisions and engagement decision correct) while 100 points were deducted for every misidentification/incorrect prosecution and every target that crossed into either the inner or outer defensive perimeters. Each practice trial consisted of a set of targets which comprised the task scenario. Participants were presented with the same scenario for all trials on a single day, though a different scenario was used across days. Thus, participants saw the same set of targets in all six practice trials on Day 1, with a different scenario used for the practice trials on each of Days 2 and 3.
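The practice-trial scoring rule described above amounts to simple arithmetic, sketched here in Python. The function name `trial_score` and its parameters are hypothetical, and the sketch assumes that each perimeter breach is counted once; the exact algorithm in the TANDEM manual may differ in detail.

```python
def trial_score(correct, incorrect, perimeter_breaches,
                reward=100, error_penalty=100, breach_penalty=100):
    """Net score for one trial under the practice-trial point values.

    correct: targets with all three subdecisions and the engagement decision right
    incorrect: targets misidentified or incorrectly prosecuted
    perimeter_breaches: targets that crossed the inner or outer perimeter
    """
    return (correct * reward
            - incorrect * error_penalty
            - perimeter_breaches * breach_penalty)


# Example: 12 correct prosecutions, 3 errors, and 2 perimeter breaches.
print(trial_score(12, 3, 2))
```

Keeping the point values as parameters allows the same function to represent trials in which the penalties are set differently.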
Each practice trial consisted of 21 valid targets (plus six marker targets) distributed at various locations on the radar screen, fourteen of which were visible at the beginning of the trial and seven of which were pop-up targets. The pop-up targets appeared one at a time every 27 seconds from the start of the trial. Although the specific cue values (cf., Table 3) for a given target were generated randomly, significant consideration was given to the distribution of target decision outcomes and the manner by which targets were programmed to behave. Specifically, the construction of each practice scenario was standardized to ensure that:

A. The total number of targets representing each of the possible Type, Class, and Intent outcomes was approximately equal across all three days
B. The total number of targets representing each of the final engagement decision outcomes as well as the unique combinations of Type, Class, and Intent outcomes (cf., Table 4) was approximately equal across all three days
C. The same number of targets crossed the inner (4) and outer (7) perimeters in each scenario
D. Five of the seven pop-up targets would cross a defensive perimeter if not prosecuted (3 crossed the outer, 2 crossed the inner) in each scenario
E. The speed and location of targets were such that participants could realistically prosecute all “high priority” targets (i.e., those that would cross a defensive perimeter and cost points) in a single scenario

Items (A) and (B) in the list above were particularly important both for analytic purposes and to ensure that, on average, participants could practice interpreting and making decisions using all possible cue values during the practice scenarios.
The middle column of Table 6 shows the distribution of subdecision and final engagement outcome characteristics across all targets in the three practice scenarios; for example, across all 63 viable targets constructed for the practice scenarios, 23 were designed to be Cleared, 21 to be Warned, and 19 to be Marked. Similarly balanced distributions were achieved for the Type, Class, and Intent subdecisions as well. In sum, across all the practice trials, participants were provided with equal opportunity to practice identifying and classifying all possible target characteristics.

Table 6
Distribution of Target Characteristics Across all Scenarios for Practice Trial Targets (n = 63) and Performance Trial Targets (n = 126)

Target Characteristics    Practice Trials    Performance Trials
Final Engagement
    Clear                 23                 36
    Warn                  21                 48
    Mark                  19                 42
Type
    Air                   19                 42
    Surface               22                 42
    Sub                   22                 42
Class
    Civilian              32                 72
    Military              31                 54
Intent
    Peaceful              33                 60
    Hostile               30                 66

Note. Cell values represent total number of targets possessing a given characteristic across all practice and performance trials, respectively.

Performance trial. At the completion of each day’s practice trials, participants engaged in a single performance trial; the performance trial was similar to the practice trials, though more difficult. Procedurally, the length of the trial was increased from five minutes to eight minutes and participants only received one minute to view the task manual prior to the scenario.
Additionally, the following changes were introduced to increase the complexity and cognitive demands of the task (Bell, 2002): (1) the number of prosecutable targets was increased (from 21 to 42); (2) the number of pop-up targets in the scenario was increased (from 7 to 13); (3) scoring was changed such that more points were deducted for targets crossing the inner and outer perimeters (from 100 points to 150 points); (4) more targets were created which crossed a defensive perimeter (from 11 to 18); and (5) the number of pop-up targets that appeared close to a defensive perimeter was increased (from 5 to 8). Participants received instructions describing many of these critical differences before beginning the performance trial. Of relevance to the present study, these changes were designed so that even if high-capacity threatened individuals were capable of “brute forcing” their way through the learning trials by memorizing the correct cue decisions for contacts, they would still experience difficulties during the performance trial because they would not have developed parsimonious and efficiently organized knowledge structures or well-rehearsed cognitive task strategies that would facilitate performance in a more challenging context.

Exploratory learning recommendations. During each day’s practice trials, participants were shown a set of exploratory learning recommendations at set intervals; Figure 5 depicts the sequencing of instructional delivery between practice trials for all three days of the experimental session. All instructional text was presented visually on-screen as well as audibly through headphones worn by each participant throughout the experiment. The duration for which the instructions were displayed was controlled by a timer that would automatically advance participants to the next stage of the experiment after it had elapsed.
The content for the exploratory learning recommendations was adapted from Bell (2002) and provided participants with reflective questions intended to stimulate learning and exploration of the procedures required to perform TANDEM. The recommendations focused on three key areas relevant to task completion: gathering and interpreting information, monitoring defensive perimeters, and prioritizing targets/maximizing score (full text of the recommendations is provided in Appendix F). During the initial presentation of the recommendations on Day 1, the oral instructions which accompanied the delivery of the recommendations described the intent of the learning recommendations and suggestions for how to focus participants’ learning efforts:

This page presents a list of questions that you can use to guide your learning activities within the task. You may find it useful to focus more of your early learning efforts in the radar control simulation on how to effectively and efficiently gather and interpret information in order to make accurate decisions. As you become more skilled at accurately processing targets, you may wish to shift your focus to learning how to monitor defensive perimeters and prioritize targets in order to maximize your score.

Figure 5. Sequencing of exploratory learning recommendations and manipulation instructions during daily practice rounds

For all subsequent presentations of the learning guidelines, the audio text was slightly altered to encourage participants to reflect on their performance and consider which areas may need greater attention during their practice with the task:

Take a moment to assess how you are currently doing on the radar control task. If you are still having difficulty correctly processing targets, you may find it useful to focus more time on learning how to interpret information to make accurate decisions.
If your accuracy is improving, you may wish to consider learning more about how to monitor your defensive perimeters and prioritize targets to maximize your score.

The exploratory learning recommendations were always presented prior to the delivery of the experimental manipulation instructions preceding practice trials 1 and 5 each day and were displayed on-screen for 50 seconds before terminating.

Experimental manipulation. The delivery of the experimental manipulation text was similar to that of the exploratory learning recommendations. A timer-controlled page containing the manipulation instructions was presented to participants at specified intervals. Participants were asked to read the text on screen and/or listen along with the instructions as they were narrated to them through their headphones. To facilitate the pace of the experiment and maintain participant engagement, two versions of the stereotype threat and control condition manipulation instructions were created. The longer, full-length version of the instructions was presented at the start of each day prior to the first practice trial, while a shorter, reduced-length version of the instructions was presented prior to trials 3 and 5 (the only exception to this sequence was on Day 1, in which the longer manipulation was presented during the introductory video and the shortened version presented prior to beginning practice trial 1). The instructions for the stereotype threat and control conditions are presented in Appendices G and H, respectively. The instructional text was based directly on experimental manipulations used in previous investigations of stereotype threat (e.g., Beilock et al., 2007; Rydell, Shiffrin, et al., 2010; Spencer et al., 1999).

The stereotype threat manipulation text incorporated a number of features shown to enhance the cognitive imbalance elicited by the presence of negative group-ability stereotypes.
First, participants in the stereotype threat condition were instructed that the purpose of the experiment was to examine possible explanations for why women tend to perform more poorly than men on math problems like those found on the SAT or ACT. Individuals were further informed that one reason for this finding was that women may have more difficulty distinguishing relevant information needed to solve a problem from irrelevant/distracting information, and that TANDEM was designed to examine how these skills develop differently in men and women. The intent of this instructional element was to logically connect the behaviors exhibited in TANDEM to the sex-stereotyped domain of math; similar instructional paradigms have been successfully used to induce stereotype threat effects even in experimental tasks that are not directly indicative of traditional math performance/ability (Beilock et al., 2007; Rydell, Shiffrin, et al., 2010). Second, the diagnosticity/normative value of the task was emphasized to participants in the stereotype threat condition by reiterating that the task was capable of detecting differences in the above stereotyped skills. Research by Steele and colleagues (Steele, 1997; Steele & Aronson, 1995; Steele & Davies, 2003) has found that such information can increase the perceived risk of failure and likelihood of adhering to the negative stereotype by threatened individuals. Third, prior to the performance trials, participants in the stereotype threat condition were reminded that their goal was to score as many points as possible and that the top performers on these trials would be eligible for monetary rewards.
Research has shown that similar performance approach perspectives are inconsistent with the failure avoidance mindset typically primed by negative stereotypes and can stimulate self-regulatory mismatches for participants in the stereotype threat condition that could contribute to poor learning/performance behaviors (e.g., Grimm et al., 2009; Jamieson & Harkins, 2007). Finally, all individuals were asked to input their sex into the computer after receiving the first on-screen presentation of the manipulation instructions and just prior to beginning the first practice trial each day. A number of researchers (Ambady et al., 2001; Danaher & Crandall, 2008; McGlone & Aronson, 2006; Shih et al., 1999; Shih et al., 2006; Steele & Aronson, 1995; Yopyk & Prentice, 2005) report that such group saliency reminders can heighten awareness of one’s group affiliation, thereby making it easier for threatened individuals to draw the negative group-ability propositional relation that triggers stereotype threat (but see Stricker & Ward, 2004).

Steele and Davies (2003) note it is imperative that cues which might otherwise elicit threat in the control/comparison condition be removed or minimized to the greatest extent possible in order to obtain an accurate test of threat effects. As such, participants in the control condition were informed that the purpose of the experiment was simply to examine individual differences in learning and problem-solving skills (e.g., Steele & Aronson, 1995; Rydell, Shiffrin, et al., 2010), and no mention was made of sex or sex differences. Furthermore, no indication of the diagnosticity of the task was provided. Lastly, control condition participants were informed that the performance trials should be viewed as opportunities to demonstrate their newly learned skills in a more challenging environment.
Measures

Over the course of the study, participants were asked to complete a number of individual difference measures either online (Appendix I) or in person (Appendix J). Additionally, several measures were computed from data directly recorded in the TANDEM program.

Demographics. Participants provided basic demographic information about their age, sex, and proficiency with English in an online questionnaire prior to arriving at the lab. Additionally, participants were asked to indicate their handedness and overall familiarity/experience with playing video games.

Cognitive ability. Given its documented relation to learning and performance within the TANDEM task paradigm (Kozlowski, Gully, et al., 2001), a measure of cognitive ability was gathered as a potential control variable in the online questionnaire as well. Participants were asked to report their highest score achieved on the SAT or ACT, which was verified through the university registrar. Research has demonstrated that scores on these tests typically possess a large g component and are generally internally consistent (e.g., Frey & Detterman, 2004; Koenig, Frey, & Detterman, 2008).

Math domain identification. Identification with a domain characterizes an individual’s perceptions of the attractiveness, importance, and relevance to self of one’s performance in a particular area of functioning (Steele, 1997). As noted previously, a number of researchers have suggested that the individuals most susceptible to stereotype threat effects are those who care about or are strongly invested in the domain/area where the stereotype threat applies (Crocker et al., 1998; Steele & Aronson, 1995; Steele & Davies, 2003). Although heightened domain identification may exacerbate the experience of stereotype threat, meta-analytic evidence suggests that stereotype threat effects can still emerge for threatened individuals who do not report being strongly identified with a given domain (Nguyen & Ryan, 2008).
Consequently, data were collected on participants’ math domain identification as a potential control variable using Smith and White’s (2001) nine-item Domain Identification Measure. The item content focused on respondents’ enjoyment, interest, and performance in mathematics (e.g., “I have always done well in Math”, “How much is Math to the sense of who you are?”); the internal consistency reliability of the scale was α = .91.

Working memory. Working memory capacity was also assessed for use as a possible control variable using a modified version of the automated Operation Span (OSPAN) task (Unsworth, Heitz, Schrock, & Engle, 2005). The automated OSPAN task was administered in person electronically to each participant using the E-Prime 2.0 software package (Psychology Software Tools, 2012; www.pstnet.com). OSPAN requires individuals to memorize and later recall a series of stimuli (letters, words, etc.) while solving simple mathematics problems (Turner & Engle, 1989). Prior to administration of the OSPAN measure, participants proceeded through a guided training exercise that provided instructions on the operational procedures of the task and opportunities to familiarize themselves with memorizing/recalling letters, solving the math problems, and then memorizing/recalling letters while solving math problems just as they would during actual administration of the memory task. Each set of items on the OSPAN began by presenting participants with a relatively simple mathematical operation (e.g., (9/3) + 2 = ?, (3*2) – 1 = ?) which they were to solve in their heads and then click on-screen once they knew the answer. The amount of time participants were given to answer each question was based on the average response time needed to solve the math items during the practice trials; if participants took longer than their average response time plus 2.5 SD, the task automatically advanced and counted that trial as an error.
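The individualized time limit described above (mean practice response time plus 2.5 SD) can be expressed in a few lines of Python. This is only an illustration: the function name is hypothetical, the response times are invented for the example, and the use of the sample standard deviation is an assumption, as the source does not state which estimator the task software used.

```python
from statistics import mean, stdev


def response_deadline(practice_times, n_sd=2.5):
    """Time limit for a math item: mean practice response time plus n_sd SDs.

    Uses the sample standard deviation (an assumption for this sketch).
    """
    return mean(practice_times) + n_sd * stdev(practice_times)


# Invented practice response times, in seconds.
practice_times = [2.0, 3.0, 4.0]
deadline = response_deadline(practice_times)

# A response slower than the deadline would be counted as a speed error.
is_speed_error = 6.1 > deadline
print(deadline, is_speed_error)
```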
Following the math operation, a single number was shown near the top of a new screen along with two buttons labeled true and false, which participants used to indicate whether that number was the correct answer to the math problem just viewed. After participants provided their answer, a random letter (e.g., “F”) was shown on-screen for 800 ms. This procedure repeated until three to seven math operation-letter presentation pairings were completed. Once all the pairings in a sequence were administered, a screen containing 12 letters arranged in a 3 × 4 grid was displayed; participants were asked to recall the letters in the same order in which they were presented during the pairings by selecting the appropriate letters and then clicking a button to submit their answer. In total, respondents completed three memorization sets from each of the five sizes of operation-letter pairings (i.e., three sets of three operation-letter pairings, three sets of four operation-letter pairings, etc.); thus, individuals saw a total of 75 letters and math problems. The specific math operations, letters, and order in which the set sizes were presented were randomized across participants.

The recommendations of Conway et al. (2005) were followed to compute respondents’ scores from the OSPAN task. First, participants were required to accurately answer at least 85% of the math problems they attempted and to have no more than 15% speed errors across all items. Second, to compute the final OSPAN scores, the number of letters correctly recalled in the correct serial position was summed across all 15 item sets (e.g., a participant who saw the sequence of letters RSTLNE, but recalled RETSNL, would receive a score of 3 for their memory span on the item set, as the letters R, T, and N were recalled in the correct serial position).
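The screening criteria and the serial-position scoring rule above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the treatment of the thresholds as inclusive (at least 85% accuracy, at most 15% speed errors) is an assumption about how the criteria were applied.

```python
def partial_credit(presented, recalled):
    """Count letters recalled in their correct serial position.

    Positions beyond the shorter of the two sequences earn no credit.
    """
    return sum(p == r for p, r in zip(presented, recalled))


def ospan_score(item_sets):
    """Sum partial-credit scores over (presented, recalled) item sets."""
    return sum(partial_credit(p, r) for p, r in item_sets)


def passes_screening(math_correct, math_attempted, speed_errors, total_items,
                     min_accuracy=0.85, max_speed_error_rate=0.15):
    """Apply the math-accuracy and speed-error criteria.

    Assumption: both thresholds are treated as inclusive.
    """
    accuracy = math_correct / math_attempted
    speed_error_rate = speed_errors / total_items
    return accuracy >= min_accuracy and speed_error_rate <= max_speed_error_rate


# Worked example from the text: three letters (R, T, N) are in position.
example_score = partial_credit("RSTLNE", "RETSNL")

# Three sets at each of the five set sizes, all recalled perfectly.
set_sizes = [3, 4, 5, 6, 7] * 3
perfect_sets = [("A" * size, "A" * size) for size in set_sizes]
max_score = ospan_score(perfect_sets)
```

Perfect recall across the 15 item sets yields the maximum possible score of 75, matching the total number of letters presented.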
Although some researchers advocate computing OSPAN scores based on whether the entire sequence of letters is reported in the correct order (e.g., a participant who saw the sequence of letters RSTLNE, but recalled RETSNL, would receive a score of 0 for their memory span on the item set, as the letters S, L, and E were not reported in the correct sequence; cf. Turner & Engle, 1989), previous research shows that this approach can lead to poorer construct/criterion validity and reliability of the working memory span measures (Conway et al., 2005). Thus, performance on the OSPAN was calculated using the partial credit/serial position algorithm described above; scores on the measure could vary from 0 (no letters correctly recalled in the correct location in any item set) to 75 (all letters correctly recalled in the correct location on all item sets).

Metacognitive activity. As an alternative measure of cognitive strategy development (Kraiger et al., 1993), individuals’ self-reported metacognition was measured at the end of each day using a 12-item measure developed by Ford et al. (1998) and adapted by Bell (2002) to fit the context of the TANDEM task paradigm. The questions were administered through an online survey system during the post-trial measurement period. Each question asked participants to indicate the extent to which they consciously reflected on their learning and performance activities during the task (e.g., “As I performed in the practice trials, I evaluated how well I was learning the skills of the simulation,” “When my methods were not successful, I experimented with different procedures for performing the task”), with responses given on a 5-point scale (1 = Never, 2 = Rarely, 3 = Sometimes, 4 = Frequently, 5 = Constantly).
Coefficient alphas for the measure were α = .86, .91, and .95 for the assessments on Days 1, 2, and 3, respectively; test-retest reliability between assessment periods was also reasonably strong (r = .68 between Days 1 and 2, r = .83 between Days 2 and 3, and r = .59 between Days 1 and 3).

Manipulation checks. Two separate scales were administered to participants through an online questionnaire at the end of Day 3 to assess the efficacy of the stereotype threat manipulation and participants’ belief that TANDEM was related to mathematical ability, respectively. As a check on the stereotype threat manipulation, a 7-item self-report measure of perceived stereotype threat adapted from Ployhart et al. (2003) was administered. The measure asked participants the extent to which they agreed with statements regarding negative perceptions/expectations about their gender’s performance in the experimental task (e.g., “A negative opinion exists about how members of my gender should perform on this type of task.”). Although Steele (1997) contends that individuals need not be consciously aware that they are under the influence of a negative stereotype to experience its effects, previous research has found that threatened individuals often do perceive this threat and that such self-report measures may be a useful means to assess the saliency of stereotype threat across experimental conditions (Grand et al., 2011). Lastly, a simple two-item measure was constructed to examine participants’ belief about the feasibility of TANDEM’s relatedness to mathematical ability (e.g., “The radar control task assesses skills related to mathematical ability.”). The coefficient alphas for the perceived stereotype threat and manipulation check measures were α = .67 and .74, respectively.

Declarative knowledge.
An 11-item, multiple-choice test of declarative knowledge pertaining to TANDEM adapted from Bell and Kozlowski (2002) was completed by participants during the post-trial measurement period at the end of each day. The test questions focused exclusively on basic content concerning the interpretation of cue values (e.g., “If a target’s characteristics are Speed = 35 knots and Altitude/Depth = 15 feet, which of the following actions should you take?”), and thus provided a measure of the extent to which individuals were able to learn and retrieve foundational knowledge about the task. In an attempt to minimize participants’ reliance on memory of past administrations when answering the items (e.g., Lievens, Reeve, & Heggestad, 2007), a different set of test items was given to participants each day. The additional items for Days 2 and 3 were constructed by altering the item stems and responses from the Day 1 test. A single item from the Day 1 knowledge test was removed during analyses due to an error in the question response options, leaving only 10 items for this assessment period. The final Cronbach’s alpha coefficients for the test versions administered on Day 1 (α = .60), Day 2 (α = .65), and Day 3 (α = .69) were all moderate and typical for a dichotomously scored assessment.

Knowledge structure assessment. To assess the development of knowledge structures, participants were asked at the end of each day to provide proximity ratings indicating perceived similarity between 16 concepts identified as critical to performance in TANDEM (Table 7). Participants provided their ratings using an online survey system. A detailed set of instructions (Appendix K) was provided to participants prior to beginning the rating task that described the purpose of the assessment and recommendations about how the rating task should be completed (cf., Goldsmith et al., 1991).

Table 7
TANDEM Knowledge Concepts with Descriptions

Focus | Concept | Description
Decision-making | 1. Identify contact Type as Air | Classifying a target as an Aircraft
Decision-making | 2. Identify contact Type as Surface | Classifying a target as a Surface vessel
Decision-making | 3. Identify contact Type as Submarine | Classifying a target as a Submarine
Decision-making | 4. Identify contact Class as Civilian | Classifying target as a Civilian craft
Decision-making | 5. Identify contact Class as Military | Classifying target as a Military craft
Decision-making | 6. Identify contact Intent as Peaceful | Classifying target as a Peaceful craft
Decision-making | 7. Identify contact Intent as Hostile | Classifying target as a Hostile craft
Decision-making | 8. Make decision to Clear contact | Making final engagement decision to Clear the target
Decision-making | 9. Make decision to Warn contact | Making final engagement decision to Warn the target
Decision-making | 10. Make decision to Mark contact | Making final engagement decision to Mark the target
Procedural/Strategic | 11. Gain/lose points | Objective indicator of task performance
Procedural/Strategic | 12. Zoom out/zoom in | Changing radar display resolution
Procedural/Strategic | 13. Monitor inner perimeter | Tracking potential boundary intrusions around the smaller visible defensive perimeter
Procedural/Strategic | 14. Monitor outer perimeter | Tracking potential boundary intrusions around the larger invisible defensive perimeter
Procedural/Strategic | 15. Find/engage pop-up targets | Identification/prosecution of new targets
Procedural/Strategic | 16. Prioritize targets (engage targets likely to cross perimeter first) | Identification/prosecution of high priority targets likely to cross a defensive perimeter

When providing ratings, participants were presented with two concepts and asked to indicate how related they were to one another using a 9-point scale (1 = not at all related to 9 = highly related). A proximity rating was provided for every unique pairwise combination of concepts, resulting in 16 × (16 − 1)/2 = 120 ratings per knowledge structure.⁶
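The count of unique concept pairs, and the symmetric proximity matrix that such ratings imply, can be sketched briefly. The names below are hypothetical and the ratings are placeholder values, not study data.

```python
from itertools import combinations

N_CONCEPTS = 16

# Every unordered pair of concept numbers (1..16), each rated once.
concept_pairs = list(combinations(range(1, N_CONCEPTS + 1), 2))


def build_proximity_matrix(ratings, n=N_CONCEPTS):
    """Turn {(i, j): rating} into a symmetric n x n matrix (0-based rows).

    The diagonal is left as None because a concept is never rated
    against itself.
    """
    matrix = [[None] * n for _ in range(n)]
    for (i, j), rating in ratings.items():
        matrix[i - 1][j - 1] = rating
        matrix[j - 1][i - 1] = rating
    return matrix


# Hypothetical responses: every pair rated 5 on the 9-point scale.
example_ratings = {pair: 5 for pair in concept_pairs}
proximity = build_proximity_matrix(example_ratings)
```

Enumerating the pairs this way reproduces the 120 ratings per knowledge structure given by 16 × 15 / 2.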
To ensure that participants did not see the same ordering of concept pairs across days, concept pairs were presented twelve at a time in random order on each survey page, and the order in which survey pages were presented was randomized.

The knowledge concepts rated by participants were adapted from those employed in previous research examining knowledge structures using the TANDEM task environment (Kozlowski, Gully, et al., 2001; Kraiger et al., 1995). Unlike previous investigations, however, an explicit focus of the present study was directed towards investigating differences in the manner by which individuals learned and formed mental representations of concepts/information relevant to decision-making in the task, as opposed to the acquisition of broader operational gameplay procedures (e.g., hooking contacts, gathering information, monitoring feedback, etc.). Consequently, the concepts shown in Table 7 were purposefully constructed to represent two distinct foci: those that dealt with outcomes related to decision-making in the task and those that concerned more procedural/strategic aspects.

An advantage of this stimulus set is that it permits a number of options for visually and statistically interpreting the structural relations reported by individuals. At the coarsest level, the extent to which distinctive decision-making and procedural/strategic concept clusters emerge in individuals’ knowledge structures provides insight into participants’ ability to distinguish these two aspects of task performance. However, more subtle distinctions can also be examined by considering the specific pattern of relations that emerge among the decision-making concepts.
As can be inferred from Table 3 presented earlier, similarity among decision-making concepts could be based on the degree to which concepts correspond to the same subdecision/decision outcome (e.g., Concepts 1-3 in Table 7 all refer to making the Type subdecision, while Concepts 4 and 5 correspond with the Class subdecision). A knowledge structure demonstrating this pattern of clustering would be indicative of learning based on feature similarity, in which decision-relevant information is related on the basis of decision class. Alternatively, a relational pattern among decision-making concepts based on the extent to which a given subdecision outcome (e.g., Air, Civilian, etc.) is indicative of a particular final engagement decision outcome (e.g., Clear, Warn, Mark) would be indicative of learning based on functional similarity.

To better clarify this structural pattern, consider the distribution of each Type, Class, and Intent outcome across the three possible final engagement decision outcomes (e.g., Table 4). Based on this distribution, the probability with which any given subdecision outcome is associated with a particular final engagement outcome can be calculated (Table 8). Functionally, these probabilities indicate the likelihood that one would Clear, Warn, or Mark a target given that the target has a particular Type, Class, or Intent. For example, if a target is identified as Civilian, there is a 67% chance that the correct final engagement decision for that target is Warn, regardless of its Type or Intent; similarly, if a target is identified as an Aircraft, there is only a 25% chance the correct final engagement decision would be Clear. The relative values of these probabilistic relations therefore represent the degree to which a subdecision outcome is functionally informative of the correct final engagement decision outcome.
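To make the notion of a "most probable" final engagement decision concrete, the sketch below looks up the largest probability in each row of Table 8. The values are transcribed from Table 8; the dictionary and function names are hypothetical, and the handling of ties (returning every tied decision) is an assumption made for illustration.

```python
# Probabilities transcribed from Table 8 (rows: subdecision outcomes).
TABLE_8 = {
    "Air":      {"Clear": 0.25, "Warn": 0.50, "Mark": 0.25},
    "Surface":  {"Clear": 0.50, "Warn": 0.25, "Mark": 0.25},
    "Sub":      {"Clear": 0.25, "Warn": 0.25, "Mark": 0.50},
    "Civilian": {"Clear": 0.17, "Warn": 0.67, "Mark": 0.17},
    "Military": {"Clear": 0.50, "Warn": 0.00, "Mark": 0.50},
    "Peaceful": {"Clear": 0.67, "Warn": 0.33, "Mark": 0.00},
    "Hostile":  {"Clear": 0.00, "Warn": 0.33, "Mark": 0.67},
}


def most_probable_decisions(row):
    """Return every final engagement decision tied for the top probability."""
    top = max(row.values())
    return [decision for decision, p in row.items() if p == top]


strongest = {outcome: most_probable_decisions(row)
             for outcome, row in TABLE_8.items()}
```

Under this reading, Civilian maps to Warn, Peaceful to Clear, and Hostile to Mark, while Military is split evenly between Clear and Mark and therefore has no single most probable decision.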
Consequently, participants who come to learn the rules of engagement effectively might generate knowledge structures which demonstrate a more functional organization of decision-critical information, such that each Type, Class, and Intent outcome is more strongly associated with (i.e., seen as more similar to) its most probable final engagement decision as opposed to concepts with which it shares similar features.

Table 8
Relative Probabilities Shared between Subdecision Outcomes and Final Engagement Decision Outcomes

Subdecision | Subdecision Outcomes | Final Engagement Outcomes: Clear | Warn | Mark
Type | Air | .25 | .5 | .25
Type | Surface | .5 | .25 | .25
Type | Sub | .25 | .25 | .5
Class | Civilian | .17 | .67 | .17
Class | Military | .5 | 0 | .5
Intent | Peaceful | .67 | .33 | 0
Intent | Hostile | 0 | .33 | .67

Note. Values in a single row should add to 1 (within rounding error). The value within each cell can be interpreted as the probability with which a single subdecision outcome is associated with a final engagement decision outcome based on the rules of engagement established for participants (Table 4).

In addition to these qualitative descriptors, the Pathfinder algorithm and software (Interlink, 2012; http://www.interlinkinc.net/) were used to compute the four quantitative metrics of network structural quality noted earlier; two indices (similarity and correlation) provide information about the relatedness among different network structures, while the remaining two indices (coherence and number of links) provide descriptive information about a single network’s composition. With respect to the relatedness metrics, similarity is a measure of the correspondence in links between networks; it is formally computed as the number of links held in common between any two networks divided by the total number of unique links in the structures (Goldsmith & Davenport, 1990). Alternatively, the correlation index measures the degree to which the concept-pair ratings in two networks/proximity matrices covary (Schuelke et al., 2009).
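The two relatedness indices can be sketched from their verbal definitions. This is a minimal illustration under stated assumptions, not the Pathfinder software's implementation: links are treated as undirected, the correlation is an ordinary Pearson coefficient over the concept pairs present in both sets of ratings, and all names and example values are hypothetical.

```python
from math import sqrt


def link_similarity(links_a, links_b):
    """Links in common divided by the number of unique links overall."""
    a = {frozenset(link) for link in links_a}   # undirected links
    b = {frozenset(link) for link in links_b}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    syy = sum((q - mean_y) ** 2 for q in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return sxy / sqrt(sxx * syy)


def rating_correlation(ratings_a, ratings_b):
    """Correlate two {(i, j): rating} dicts over their shared pairs."""
    shared = sorted(set(ratings_a) & set(ratings_b))
    return pearson([ratings_a[k] for k in shared],
                   [ratings_b[k] for k in shared])


# Hypothetical participant and referent networks.
participant_links = [(1, 2), (2, 3), (3, 4)]
referent_links = [(2, 1), (3, 4), (4, 5)]
sim = link_similarity(participant_links, referent_links)

participant_ratings = {(1, 2): 9, (1, 3): 2, (2, 3): 7}
referent_ratings = {(1, 2): 8, (1, 3): 1, (2, 3): 6}
corr = rating_correlation(participant_ratings, referent_ratings)
```

In the example, two of the four unique links are shared, giving a similarity of .50, while the ratings differ only by a constant and so correlate perfectly. The contrast illustrates why the two indices can diverge for the same pair of structures.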
These operationalizations have led some researchers to characterize similarity as a measure of agreement and correlation as a measure of consistency between two networks (e.g., Webber et al., 2000). Both structural similarity and correlation range from 0 to 1, with higher values indicating that the two contrasted structures share more links in common (similarity) or similar strengths of relations among concepts (correlation).

With respect to the descriptive indices, measures of structural coherence are based on the assumption that relatedness between a pair of concepts can be predicted by the relations of those concepts to other concepts in the network (Interlink, 2012). Specifically, coherence is calculated by first correlating, for each pair of concepts, the proximities between those concepts and all others in the network/proximity matrix. This “indirect” measure of relatedness is then correlated with the original proximity data to produce the coherence measure. Coherence values range from 0 to 1, with higher values indicating that the raw proximity/relatedness data are consistent with the indirect relatedness data inferred from the proximity matrix. The final measure, number of links in a network, is simply a count of the number of links retained in the knowledge structure following application of the Pathfinder network algorithm.

A meaningful referent network was required in order to compute both the similarity and correlation indices described above. As described in Hypotheses 1, 2, 6, and 7, the referent structures of interest in the present study were those of males and top scoring performers on the TANDEM task. To compose these referent networks, the proximity ratings provided by males on each day were averaged together to form three separate proximity matrices and, subsequently, three separate networks representing the average male knowledge structure at the end of each day.
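The coherence calculation and the averaging of proximity matrices into a referent structure can be sketched as follows. This is an interpretation of the verbal description rather than the Pathfinder software's code: excluding the two focal concepts from each indirect correlation and returning 0 for constant vectors are assumptions, and the six-concept matrix is hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input has no variance
    (an assumption made here so the sketch never divides by zero)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    sxx = sum((p - mean_x) ** 2 for p in x)
    syy = sum((q - mean_y) ** 2 for q in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coherence(matrix):
    """Correlate direct ratings with indirect relatedness for all pairs."""
    n = len(matrix)
    direct, indirect = [], []
    for i, j in combinations(range(n), 2):
        # Assumption: the two focal concepts are left out of the profiles.
        others = [k for k in range(n) if k not in (i, j)]
        direct.append(matrix[i][j])
        indirect.append(pearson([matrix[i][k] for k in others],
                                [matrix[j][k] for k in others]))
    return pearson(direct, indirect)


def average_matrices(matrices):
    """Cell-by-cell mean of several proximity matrices (referent structure)."""
    n = len(matrices[0])
    count = len(matrices)
    return [[sum(m[i][j] for m in matrices) / count for j in range(n)]
            for i in range(n)]


# Hypothetical 6-concept ratings forming two tight clusters (0-2 and 3-5).
example = [
    [0, 9, 8, 1, 2, 1],
    [9, 0, 9, 2, 1, 2],
    [8, 9, 0, 1, 1, 2],
    [1, 2, 1, 0, 9, 8],
    [2, 1, 1, 9, 0, 9],
    [1, 2, 2, 8, 9, 0],
]
example_coherence = coherence(example)
```

Because concepts within each cluster relate to the remaining concepts in nearly the same way, the indirect relatedness closely tracks the direct ratings and the resulting coherence is high.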
A similar process was followed using the proximity ratings provided by the top 15 highest-scoring participants across all three performance trials.⁷ The observed similarity and correlation indices at Day 1 were then computed by comparing the Day 1 knowledge structure of each participant to the averaged Day 1 knowledge structures of males and top performers separately; likewise, the observed indices for Days 2 and 3 were computed by comparing participants’ Day 2 structure to the averaged males/top performers’ Day 2 structures and participants’ Day 3 structure to the averaged males/top performers’ Day 3 structures, respectively. Consequently, the relatedness metrics computed in the present study reflect the extent to which participants’ knowledge structures were more or less similar to/correlated with those of males and top performers at the same point in time.

Lastly, network structure metrics based on traditional graph theory analytic techniques were also computed for exploratory analyses (see Watts, 1999). These measures included the average shortest path length between all pairs of nodes (L), network diameter (D; the longest of the shortest path lengths between any pair of nodes), and the clustering coefficient (C; the probability that two neighbors of a randomly chosen node will themselves be neighbors, indicative of a network’s “clumpiness,” Watts & Strogatz, 1998). Two additional indices were also computed in an attempt to quantify the extent to which structures exhibited feature similarity and functional similarity as described previously. For the former, the shortest path lengths between all concepts within a given subdecision were first computed (e.g., L computed separately for only Type concepts, only Class concepts, and only Intent concepts) and then averaged together to provide a measure of the average shortest path lengths among feature concepts (Lfeature).
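The graph indices defined so far can be illustrated with a small standard-library sketch. It is not the MATLAB code used in the study: it assumes an unweighted, undirected, connected network in which path length is counted in links, it scores nodes with fewer than two neighbors as having zero clustering, and the five-node network and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_adjacency(nodes, links):
    """Undirected adjacency sets from a list of (a, b) links."""
    adjacency = {node: set() for node in nodes}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def distances_from(adjacency, source):
    """Breadth-first search: number of links from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def pair_lengths(adjacency, nodes=None):
    """Shortest path length for every pair among the given nodes."""
    nodes = sorted(adjacency) if nodes is None else list(nodes)
    return [distances_from(adjacency, a)[b] for a, b in combinations(nodes, 2)]


def average_path_length(adjacency, nodes=None):
    lengths = pair_lengths(adjacency, nodes)
    return sum(lengths) / len(lengths)


def diameter(adjacency):
    return max(pair_lengths(adjacency))


def clustering_coefficient(adjacency):
    """Mean share of each node's neighbor pairs that are themselves linked."""
    scores = []
    for node, neighbors in adjacency.items():
        pairs = list(combinations(sorted(neighbors), 2))
        linked = sum(1 for a, b in pairs if b in adjacency[a])
        scores.append(linked / len(pairs) if pairs else 0.0)
    return sum(scores) / len(scores)


def average_over_groups(adjacency, groups):
    """Average path length within each concept group, then across groups."""
    per_group = [average_path_length(adjacency, group) for group in groups]
    return sum(per_group) / len(per_group)


# Hypothetical five-concept network.
network = build_adjacency(range(1, 6),
                          [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)])
L = average_path_length(network)
D = diameter(network)
C = clustering_coefficient(network)
L_groups = average_over_groups(network, [[1, 2, 3], [4, 5]])
```

For this toy network the average path length is 1.7 links and the diameter is 3. The group-wise helper mirrors the feature-based index by computing path lengths separately within each set of concepts before averaging.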
For the latter, the shortest path lengths between each subdecision outcome and its most strongly associated final engagement decision were computed (e.g., L between Civilian and Warn, L between Peaceful and Clear, etc.) and then averaged together to provide a measure of the average shortest path lengths among function concepts (Lfunction). All graph theoretic computations were performed in MATLAB version 7.14 (MathWorks, 2012) using the MatlabBGL library (http://dgleich.github.com/matlab-bgl/).

Strategic learning behaviors. Task strategy was operationalized using two sets of variables. First, the amount of time participants spent looking at the manual pages containing basic gameplay, cue value, and task strategy information was recorded for every trial. The amount of time spent reading materials on each of these task manual sections provides an indication of how participants structured their learning efforts during information acquisition phases and thus the approach they took to learning the task space. In general, individuals should be expected to spend less time on the basic gameplay manual pages and more time on the cue value and task strategy pages as more experience is gained with the task across trials.

In addition to information acquisition, the manner by which individuals interacted with the radar interface and performed certain operations in TANDEM was examined as an indicator of task comprehension/strategic learning. As participants gain experience and learn how various task mechanisms interact with the rules of engagement (i.e., how points are gained/lost) to dictate performance, they should begin to formulate implicit heuristics which guide their selection of targets and gameplay behaviors in TANDEM. The online task manual presented to participants contains information which describes the most effective strategy for prioritizing targets in order to maximize task performance; this strategy is depicted in graphical form in Figure 6.
In short, this strategic approach suggests that individuals start by first locating (but not engaging) the marker targets which outline the invisible outer perimeter. Once this critical boundary is identified, individuals can begin to prioritize target engagement by attempting to clear those targets most likely to breach this outer perimeter or the visible inner perimeter. While working to clear these targets, individuals should continually be on the lookout for pop-up and other high priority targets which threaten to cross either perimeter. Consequently, a number of behavioral indicators were measured which reflected participants’ adherence to these advanced task strategies, including the number of marker targets engaged (fewer engaged is indicative of better strategic task performance), the number of times an individual zoomed the radar screen in/out (more is indicative of better strategic task performance), and the number of high priority targets processed (i.e., pop-up targets and targets which would cross a defensive perimeter if not engaged; more engaged is indicative of better strategic task performance).

Figure 6. Cognitive strategy heuristic for TANDEM performance

Decision-making strategy. At its core, TANDEM is a multiple-cue decision-making task. Participants are provided with a number of different pieces of information (e.g., the identifying cues and cue values shown in Table 3) that must be interpreted and integrated in order to make a series of decisions about the identity of and appropriate course of action against a target (cf., Table 4). Consequently, each cue viewed by a participant contributes some unique informational value to those decisions; sampled across multiple targets and combinations of cue values, these decision “weights” can be reconstructed empirically and used to draw inferences about the manner by which individuals combine information in order to prosecute targets in TANDEM.
Furthermore, given that any specific cue value is directly associated with a known and veridical decision outcome (e.g., a Speed value of 115 knots is indicative of a specified Type outcome, etc.), these decision weights can be interpreted as the extent to which participants have correctly learned to distinguish, evaluate, and integrate task-relevant information in order to make accurate task decisions. Two pieces of data were extracted from the TANDEM game files in order to evaluate participants’ decision-making heuristics. First, the content and associated meaning of the informational cues examined by participants for each processed target was collected; thus, data on which cues were viewed and what information those cues conveyed were gathered for every target processed by each participant. Second, the specific outcomes selected for each of the Type, Class, Intent, and Final Engagement decisions for every target processed were recorded.

Task performance. Performance scores in TANDEM are a combination of both effective procedural decision-making and strategic target selection. Consequently, task performance was broken into these unique components and analyzed separately to provide a more accurate picture of how participants were performing on the task. The number of targets engaged correctly and incorrectly and the number of targets that crossed the inner and outer defensive perimeters were gathered for each trial in the game; additionally, performance scores based on the algorithms denoted previously were computed as an overall indicator of task effectiveness.

RESULTS

Descriptive Statistics and Data Cleaning

Means, standard deviations, and intercorrelations for all study variables are presented in Table 9. Prior to performing any analyses, the integrity of participants’ OSPAN and knowledge structure data was evaluated to improve the quality of the dataset.
As described previously, participants’ performance on the mathematics portion of the OSPAN was assessed to screen out participants who may not have been adequately attending to the processing component of the task and/or were using that time to rehearse the letters rather than solve the math problems (Turner & Engle, 1989; Unsworth et al., 2005). Seven and five participants failed to reach the 85% accuracy and speed error criteria, respectively; additionally, a computer recording error resulted in complete loss of OSPAN data for one participant. Consequently, OSPAN scores for these 13 participants (6.6% of the total sample) were not included in the dataset.

The number of links produced in participants’ knowledge structures was also examined for each day to identify individuals who were likely not taking the proximity rating task seriously. Specifically, any network containing 120 links was not included in subsequent knowledge structure analyses; a network with 120 links reflects a structure in which each concept is linked to all other concepts and, by extension, a participant who provided the same numeric rating for all 120 pairwise concept comparisons during the rating task. This procedure resulted in the removal of seven participants (4.4% of the total 3-day sample), six of whom were women (four from the stereotype threat condition and two from the control condition). A computer error resulted in the loss of knowledge structure data at Day 2 for 11 participants in the stereotype threat condition (8 females, 3 males), though data from Days 1 and 3 for these participants were used in subsequent analyses.

Table 9
Means, Standard Deviations and Intercorrelations for Study Variables
Variable M SD 1 2 a .27 .44 — 1. Sex b 2. Condition 3. ACT 4. OSPAN 5. Video game experience 6. Math domain identification 7. Perceived stereotype threat 8. Metacognitive activity (T1) 9. Metacognitive activity (T2) 10. Metacognitive activity (T3) 11. Knowledge test (T1) 12.
Knowledge test (T2) 13. Knowledge test (T3) 14. Total Points (P1) 15. Total Points (P2) 16. Total Points (P3) 17. Number targets correct (P1) 18. Number targets correct (P2) 19. Number targets correct (P3) 20. Number targets incorrect (P1) 21. Number targets incorrect (P2) 22. Number targets incorrect (P3) 23. Total perimeter intrusions (P1) 24. Total perimeter intrusions (P2) 25. Total perimeter intrusions (P3) 26. Avg basic manual time (L1) 27. Avg basic manual time (L2) 28. Avg basic manual time (L3) .49 23.55 56.59 2.47 2.99 2.92 3.82 3.87 3.86 .70 .78 .76 -2186 -1411 -780 6.56 9.93 13.00 10.93 8.05 6.35 17.48 15.99 14.45 21.86 18.62 21.43 .50 3.63 12.54 1.32 .87 .61 .57 .68 .79 .21 .18 .20 835 1283 1468 3.99 6.04 7.03 5.03 6.20 6.07 2.46 3.26 4.20 18.77 15.79 17.97 .02 .07 .19 .42 .21 -.46 .19 .13 .11 .05 .12 .11 .28 .24 .26 .21 .19 .22 -.24 -.22 -.23 -.11 -.16 -.21 -.09 -.06 -.12 — .14 -.08 .03 -.12 .27 -.12 -.20 -.24 .03 -.17 -.22 -.02 -.03 -.13 .03 -.04 -.12 .01 .06 .19 .10 -.05 -.03 -.12 .10 -.04 90 3 4 5 6 7 8 9 — .39 .11 .21 -.04 .28 .12 .12 .31 .19 .21 .43 .48 .49 .42 .46 .47 -.28 -.37 -.35 -.21 -.32 -.45 -.34 -.16 -.06 — .16 .15 -.12 .23 .11 .10 .25 .23 .22 .18 .32 .34 .22 .31 .30 -.09 -.28 -.34 -.07 -.13 -.22 -.20 .02 -.03 — .18 -.14 .27 .29 .20 .14 .10 .16 .25 .24 .31 .26 .22 .31 -.16 -.17 -.24 -.10 -.21 -.23 -.17 .00 .00 — -.03 .17 .18 .13 .13 .11 .21 .08 .25 .25 .07 .21 .22 -.08 -.23 -.22 -.01 -.14 -.19 .04 -.14 -.02 — -.05 .03 .00 -.02 -.01 .03 -.18 -.09 -.12 -.10 -.08 -.10 .11 .05 .12 .19 .09 .07 -.06 .00 .09 — .68 .60 .33 .27 .33 .32 .27 .31 .40 .25 .29 -.11 -.21 -.25 -.19 -.22 -.23 -.35 -.04 -.01 — .83 .25 .34 .40 .26 .31 .31 .28 .29 .30 -.14 -.25 -.27 -.16 -.21 -.22 -.31 -.22 -.03 Table 9 (cont’d) Variable 29. Avg cue manual time (L1) 30. Avg cue manual time (L2) 31. Avg cue manual time (L3) 32. Avg strategy manual time (L1) 33. Avg strategy manual time (L2) 34. Avg strategy manual time (L3) 35. Coherence (T1) 36. Coherence (T2) 37. 
Coherence (T3) 38. Number of links (T1) 39. Number of links (T2) 40. Number of links (T3) c 41. Correlation (T1) M 84.42 81.15 74.62 41.33 51.03 47.38 .29 .34 .32 29.38 31.66 35.05 .38 SD 14.45 23.01 34.55 20.24 24.32 29.73 .27 .31 .38 13.46 17.47 19.61 .24 1 -.02 -.05 -.03 -.04 -.02 -.18 .00 -.04 -.17 .01 -.01 -.03 .02 2 .04 .01 -.31 .04 -.06 -.08 -.02 -.08 -.02 .03 .05 -.09 -.02 3 -.19 -.24 -.28 .24 .02 .01 .33 .23 .17 -.03 .07 .11 .51 4 .01 -.18 -.13 .11 .07 -.13 .13 .21 .10 -.01 .12 .16 .23 5 .06 -.12 -.06 .02 .03 -.06 .17 .13 .04 -.01 -.06 -.11 .22 6 -.01 -.03 -.03 .08 .11 -.06 .09 .20 .05 .02 .01 .05 .16 7 .00 .10 -.04 .02 .02 .11 -.06 -.02 .10 -.01 -.09 -.09 -.03 8 .12 -.10 -.12 .23 .05 -.04 .24 .21 .20 -.05 .09 .13 .27 9 .06 .01 -.08 .11 .02 -.04 .15 .17 .20 .01 .00 .10 .20 42. Correlation (T2) c .45 .27 .04 -.09 .44 .24 .19 .26 -.05 c .21 .23 43. Correlation (T3) .46 .29 .00 -.05 .38 .25 .16 .17 .08 .20 .18 44. Similarity (T1) c .16 .07 .10 -.01 .39 .34 .17 .09 -.02 c .25 .15 .20 .10 .16 -.09 .26 .20 .27 .19 -.04 c .18 .20 .22 .15 .17 .19 4.77 4.50 4.16 2.32 2.27 2.12 .14 .09 .10 .12 1.37 1.54 1.62 .46 .53 .51 .21 .05 .05 .04 -.05 -.16 -.12 -.03 -.11 -.05 -.01 -.03 .01 .00 -.03 .02 -.02 .01 .03 .03 .26 .09 .23 .33 .01 -.02 -.19 .05 -.02 -.13 .20 .09 .26 .18 .04 -.12 -.16 .05 -.12 -.17 .36 .06 .14 .03 -.01 -.15 -.02 -.03 -.11 .03 .17 .10 .10 .10 .01 -.05 -.09 -.01 .00 -.01 -.01 -.06 -.06 -.10 -.08 .11 .05 -.08 .11 .07 .25 .14 .28 .24 -.11 -.17 -.16 -.08 -.17 -.18 .21 .17 .17 .15 -.12 -.05 -.07 -.11 -.02 -.08 45. Similarity (T2) 46. Similarity (T3) 47. Clustering coefficient (T1) 48. Clustering coefficient (T2) 49. Clustering coefficient (T3) 50. Diameter (T1) 51. Diameter (T2) 52. Diameter (T3) 53. Avg. path length (T1) 54. Avg. path length (T2) 55. Avg. path length (T3) 91 Table 9 (cont’d) Variable M SD 56. Avg. feature path length (T1) 2.14 .78 57. Avg. feature path length (T2) 2.07 .75 58. Avg. feature path length (T3) 1.95 .63 59. Avg. 
function path length (T1) 2.09 .48 60. Avg. function path length (T2) 2.01 .64 61. Avg. function path length (T3) 1.89 .60 Correlations in bold are significant at p < .05 a Dummy-coded variable (Female = 0, Male = 1) 1 .05 -.03 .09 .03 -.05 .11 2 -.06 .06 .08 .08 .17 .02 3 -.09 -.03 -.13 -.07 -.04 -.07 4 -.07 -.02 -.08 -.10 -.16 -.10 5 .04 -.07 .04 .03 -.11 .07 6 -.03 -.05 .01 .02 -.08 .12 7 .04 .05 .05 .03 .13 -.06 8 -.06 -.08 -.07 -.10 -.13 -.15 9 .00 -.07 -.10 -.08 -.08 -.11 b c Dummy-coded variable (Control = 0, Stereotype threat = 1) Referent knowledge structure for computations was the averaged knowledge structure of the top 15 highest-scoring performers at the same point in time Note. The code in parentheses following each variable name indicates when measurement was taken. Specifically, T1-T3 refer to the end of Days 1-3; P1-P3 refer to performance trials on Days 1-3; and L1-L3 refer to learning trials on Days 1-3. 92 Table 9 (cont’d) Variable 1. Sex 2. Condition 3. ACT 4. OSPAN 5. Video game experience 6. Math domain identification 7. Perceived stereotype threat 8. Metacognitive activity (T1) 9. Metacognitive activity (T2) 10. Metacognitive activity (T3) 11. Knowledge test (T1) 12. Knowledge test (T2) 13. Knowledge test (T3) 14. Total Points (P1) 15. Total Points (P2) 16. Total Points (P3) 17. Number targets correct (P1) 18. Number targets correct (P2) 19. Number targets correct (P3) 20. Number targets incorrect (P1) 21. Number targets incorrect (P2) 22. Number targets incorrect (P3) 23. Total perimeter intrusions (P1) 24. Total perimeter intrusions (P2) 25. Total perimeter intrusions (P3) 26. Avg basic manual time (L1) 27. Avg basic manual time (L2) 28. 
Avg basic manual time (L3) 10 11 12 13 14 15 16 17 18 19 20 — .19 .23 .38 .19 .21 .30 .20 .18 .28 -.08 -.17 -.29 -.14 -.16 -.16 -.27 -.21 -.01 — .47 .46 .45 .51 .47 .50 .50 .47 -.30 -.45 -.38 -.11 -.25 -.30 -.22 -.29 .03 — .64 .38 .58 .62 .33 .57 .58 -.29 -.55 -.57 -.17 -.19 -.37 -.24 -.37 -.05 — .40 .49 .59 .35 .44 .56 -.34 -.50 -.55 -.13 -.19 -.31 -.11 -.30 .01 — .67 .60 .87 .66 .58 -.83 -.57 -.49 -.26 -.36 -.43 -.23 -.30 -.07 — .82 .59 .95 .78 -.55 -.88 -.69 -.21 -.51 -.56 -.22 -.36 .07 — .53 .79 .94 -.47 -.71 -.87 -.24 -.43 -.65 -.25 -.35 -.06 — .62 .55 -.59 -.43 -.38 -.16 -.37 -.37 -.29 -.25 -.04 — .82 -.49 -.78 -.65 -.22 -.40 -.46 -.22 -.29 .07 — -.43 -.63 -.78 -.21 -.38 -.50 -.23 -.32 -.06 — .56 .44 -.17 .18 .28 .02 .21 .08 93 Table 9 (cont’d) Variable 29. Avg cue manual time (L1) 30. Avg cue manual time (L2) 31. Avg cue manual time (L3) 32. Avg strategy manual time (L1) 33. Avg strategy manual time (L2) 34. Avg strategy manual time (L3) 35. Coherence (T1) 36. Coherence (T2) 37. Coherence (T3) 38. Number of links (T1) 39. Number of links (T2) 40. Number of links (T3) 41. Correlation (T1) 42. Correlation (T2) 43. Correlation (T3) 44. Similarity (T1) 45. Similarity (T2) 46. Similarity (T3) 47. Clustering coefficient (T1) 48. Clustering coefficient (T2) 49. Clustering coefficient (T3) 50. Diameter (T1) 51. Diameter (T2) 52. Diameter (T3) 53. Avg. path length (T1) 54. Avg. path length (T2) 55. Avg. 
path length (T3) 10 .07 -.02 -.05 .11 .06 -.07 .14 .11 .14 .02 -.03 .14 .16 .12 .11 .12 .06 .13 .18 .07 .14 -.16 .05 -.06 -.16 .08 -.10 11 .15 -.23 -.28 .05 .15 .02 .11 .17 .24 .03 .04 .12 .31 .28 .33 .33 .22 .22 .14 .11 .20 -.12 -.03 -.09 -.08 -.02 -.07 12 .04 -.08 -.23 .12 .07 .12 .18 .30 .27 .07 .04 .03 .26 .33 .46 .25 .29 .35 .16 .15 .13 -.09 -.03 -.08 -.07 .00 -.01 13 -.06 -.14 -.18 .20 .19 -.04 .29 .17 .30 .02 .01 .02 .31 .32 .41 .30 .29 .31 .27 .15 .15 -.04 .00 -.04 -.03 .04 .00 94 14 .08 -.33 -.24 .01 .24 -.01 .15 .20 .12 .06 .07 .07 .29 .30 .32 .29 .25 .33 .15 .18 .17 -.13 -.22 -.15 -.06 -.18 -.12 15 -.04 -.21 -.38 .16 .13 -.04 .30 .34 .29 .05 .02 .00 .43 .41 .47 .40 .35 .41 .24 .23 .22 -.12 -.15 -.10 -.06 -.08 -.02 16 -.06 -.20 -.32 .15 .05 -.03 .34 .31 .29 .06 -.01 .05 .47 .49 .48 .39 .45 .42 .25 .22 .23 -.16 -.09 -.06 -.11 -.01 -.01 17 .21 -.31 -.20 -.03 .26 -.01 .12 .19 .16 .05 .12 .09 .29 .28 .30 .30 .25 .28 .15 .21 .16 -.15 -.25 -.14 -.09 -.21 -.12 18 .03 -.18 -.35 .12 .09 .00 .29 .35 .32 .03 .04 .01 .44 .42 .49 .38 .33 .40 .17 .22 .23 -.11 -.13 -.05 -.04 -.06 .02 19 -.02 -.18 -.31 .15 .00 -.03 .35 .32 .34 .00 -.02 .03 .50 .50 .49 .39 .43 .41 .20 .20 .20 -.10 -.08 .03 -.05 .00 .06 20 .07 .31 .15 .00 -.21 -.12 -.10 -.17 -.04 -.01 .02 .07 -.15 -.21 -.19 -.17 -.21 -.28 -.05 -.08 -.02 .02 .14 .07 -.03 .09 .01 Table 9 (cont’d) Variable 56. Avg. feature path length (T1) 57. Avg. feature path length (T2) 58. Avg. feature path length (T3) 59. Avg. function path length (T1) 60. Avg. function path length (T2) 61. Avg. function path length (T3) 10 -.04 -.04 -.06 -.12 -.03 -.09 11 -.05 -.05 -.12 -.23 -.10 -.02 12 -.13 -.11 -.13 -.19 -.15 .02 13 -.09 -.11 -.12 -.19 -.16 .01 95 14 -.03 -.12 -.08 -.15 -.16 -.03 15 -.06 -.17 -.13 -.16 -.15 .01 16 -.08 -.13 -.09 -.21 -.15 -.03 17 -.06 -.12 -.12 -.17 -.17 -.04 18 -.05 -.16 -.16 -.16 -.15 -.01 19 -.07 -.16 -.09 -.17 -.13 -.01 20 -.02 .07 -.04 .05 .07 -.01 Table 9 (cont’d) Variable 1. Sex 2. Condition 3. ACT 4. OSPAN 5. 
Video game experience 6. Math domain identification 7. Perceived stereotype threat 8. Metacognitive activity (T1) 9. Metacognitive activity (T2) 10. Metacognitive activity (T3) 11. Knowledge test (T1) 12. Knowledge test (T2) 13. Knowledge test (T3) 14. Total Points (P1) 15. Total Points (P2) 16. Total Points (P3) 17. Number targets correct (P1) 18. Number targets correct (P2) 19. Number targets correct (P3) 20. Number targets incorrect (P1) 21. Number targets incorrect (P2) 22. Number targets incorrect (P3) 23. Total perimeter intrusions (P1) 24. Total perimeter intrusions (P2) 25. Total perimeter intrusions (P3) 26. Avg basic manual time (L1) 27. Avg basic manual time (L2) 28. Avg basic manual time (L3) 21 22 23 24 25 26 27 28 — .72 .09 .13 .40 .12 .38 -.10 — .16 .17 .31 .13 .36 .05 — .26 .26 .27 .16 .00 — .62 .22 .13 .03 — .33 .15 .05 — .04 .07 — .16 — 96 29 30 31 Table 9 (cont’d) Variable 29. Avg cue manual time (L1) 30. Avg cue manual time (L2) 31. Avg cue manual time (L3) 32. Avg strategy manual time (L1) 33. Avg strategy manual time (L2) 34. Avg strategy manual time (L3) 35. Coherence (T1) 36. Coherence (T2) 37. Coherence (T3) 38. Number of links (T1) 39. Number of links (T2) 40. Number of links (T3) 41. Correlation (T1) 42. Correlation (T2) 43. Correlation (T3) 44. Similarity (T1) 45. Similarity (T2) 46. Similarity (T3) 47. Clustering coefficient (T1) 48. Clustering coefficient (T2) 49. Clustering coefficient (T3) 50. Diameter (T1) 51. Diameter (T2) 52. Diameter (T3) 53. Avg. path length (T1) 54. Avg. path length (T2) 55. Avg. 
path length (T3) 21 .05 .18 .36 -.11 -.13 -.02 -.24 -.32 -.27 -.07 .00 .02 -.33 -.35 -.40 -.33 -.28 -.34 -.23 -.18 -.18 .11 .07 .09 .06 .02 .01 22 -.02 .15 .18 -.04 -.03 .00 -.22 -.25 -.23 -.07 .02 -.05 -.33 -.38 -.38 -.28 -.34 -.33 -.21 -.17 -.18 .13 .05 .08 .12 -.02 .04 23 -.07 -.03 .18 -.08 .03 .23 -.13 -.04 -.06 -.11 -.08 -.22 -.20 -.16 -.22 -.13 -.03 -.08 -.16 -.11 -.25 .14 .07 .14 .13 .08 .18 24 .12 .14 .18 -.20 -.11 .17 -.18 -.08 -.05 -.02 -.01 -.03 -.24 -.17 -.20 -.23 -.23 -.23 -.19 -.16 -.13 .07 .18 .13 .02 .14 .09 97 25 .20 .19 .33 -.22 -.11 .06 -.28 -.19 -.12 -.09 -.02 -.06 -.35 -.33 -.31 -.31 -.34 -.29 -.23 -.19 -.20 .18 .10 .13 .13 .06 .07 26 -.16 .16 .13 -.10 .06 -.08 -.11 -.09 -.06 -.08 -.02 -.02 -.19 -.18 -.19 -.15 -.18 -.15 -.15 -.12 -.12 .09 .06 .04 .07 .03 .03 27 -.06 -.13 .12 .02 -.05 -.06 .04 -.14 -.15 .04 .03 -.09 -.13 -.16 -.23 -.16 -.09 -.08 -.05 -.05 -.12 -.08 -.03 .12 -.07 -.07 .07 28 .10 -.05 -.25 -.09 .04 .15 -.15 -.12 -.10 .21 .13 -.04 -.13 -.11 -.03 -.04 -.01 .16 .07 .07 -.01 -.24 -.18 -.05 -.28 -.17 -.02 29 — .14 .02 -.38 .06 -.10 -.13 .04 .06 .08 .11 .05 -.08 -.03 -.03 -.08 -.05 -.05 .02 .12 .02 -.13 -.11 .05 -.09 -.10 -.03 30 31 — .29 -.04 -.32 -.13 -.15 -.10 .11 .09 .13 .16 -.18 -.16 -.16 -.21 -.21 -.33 -.01 -.01 .02 .03 .14 .09 .00 .09 .04 — -.21 -.03 .05 -.14 -.07 -.07 .07 .17 .11 -.22 -.24 -.24 -.28 -.26 -.33 -.03 .04 -.08 .12 -.08 .15 .05 -.12 .06 Table 9 (cont’d) Variable 56. Avg. feature path length (T1) 57. Avg. feature path length (T2) 58. Avg. feature path length (T3) 59. Avg. function path length (T1) 60. Avg. function path length (T2) 61. Avg. function path length (T3) 21 .11 .14 .07 .14 .10 -.01 22 .05 .06 .04 .25 .15 .05 23 .05 .07 .15 .12 .14 .07 24 -.04 .09 .08 .05 .11 -.06 98 25 .10 .11 .13 .08 .08 .02 26 .10 .00 .02 .09 .04 .04 27 -.15 -.05 .04 -.03 .03 -.01 28 -.13 -.12 -.11 -.13 -.06 .05 29 -.06 -.08 .05 -.06 -.11 .00 30 .06 .04 -.03 .04 .05 -.03 31 .07 -.10 -.05 .07 -.12 .04 Table 9 (cont’d) Variable 29. 
Avg cue manual time (L1) 30. Avg cue manual time (L2) 31. Avg cue manual time (L3) 32. Avg strategy manual time (L1) 33. Avg strategy manual time (L2) 34. Avg strategy manual time (L3) 35. Coherence (T1) 36. Coherence (T2) 37. Coherence (T3) 38. Number of links (T1) 39. Number of links (T2) 40. Number of links (T3) 41. Correlation (T1) 42. Correlation (T2) 43. Correlation (T3) 44. Similarity (T1) 45. Similarity (T2) 46. Similarity (T3) 47. Clustering coefficient (T1) 48. Clustering coefficient (T2) 49. Clustering coefficient (T3) 50. Diameter (T1) 51. Diameter (T2) 52. Diameter (T3) 53. Avg. path length (T1) 54. Avg. path length (T2) 55. Avg. path length (T3) 32 33 34 35 36 37 38 39 40 41 42 — .08 .01 .28 .16 .14 -.10 -.11 -.13 .27 .22 .16 .20 .12 .14 .09 .01 -.03 .06 .09 .04 .06 .10 .09 — .34 -.06 -.05 -.20 .00 -.11 -.11 .02 -.03 .00 .11 .00 .14 .03 -.08 -.17 -.05 -.11 -.07 -.06 -.05 .01 — .08 .18 .02 .03 -.15 -.11 .06 .07 .03 .08 .01 .04 .01 -.06 -.03 .00 .12 .09 -.02 .19 .13 — .59 .54 -.08 .12 .15 .72 .56 .50 .43 .27 .23 .21 .35 .38 .16 -.05 .00 .21 -.03 -.01 — .75 .04 .17 .21 .59 .70 .63 .40 .25 .24 .16 .39 .38 .16 -.04 .04 .20 .01 .03 — .02 .17 .22 .50 .56 .58 .33 .21 .13 .18 .32 .39 .10 .08 .11 .14 .10 .08 — .28 .46 -.08 .01 .02 -.12 -.12 -.11 .72 .15 .29 -.58 .00 -.24 -.70 -.07 -.33 — .51 .06 .02 .06 -.03 -.20 -.18 .24 .74 .43 -.17 -.58 -.18 -.15 -.71 -.30 — .12 .07 .07 .04 -.26 -.29 .31 .35 .71 -.15 -.10 -.54 -.21 -.15 -.71 — .78 .75 .64 .44 .48 .18 .32 .37 .06 -.10 -.12 .10 -.04 -.07 — .89 .50 .57 .57 .21 .33 .38 .06 -.10 -.07 .10 -.02 -.01 99 Table 9 (cont’d) Variable 56. Avg. feature path length (T1) 57. Avg. feature path length (T2) 58. Avg. feature path length (T3) 59. Avg. function path length (T1) 60. Avg. function path length (T2) 61. Avg. 
function path length (T3) 32 -.09 .09 -.03 .02 .01 .04 33 .09 .11 -.01 -.05 -.01 .09 34 .06 .27 .20 -.01 -.05 -.11 35 -.17 -.30 -.27 .00 -.19 -.13 100 36 -.18 -.34 -.28 -.02 -.18 -.17 37 -.23 -.38 -.29 -.04 -.12 -.20 38 -.36 -.19 -.15 -.50 -.10 -.15 39 -.07 -.40 -.23 -.15 -.50 -.30 40 -.10 -.20 -.38 -.21 -.16 -.55 41 -.14 -.23 -.24 -.12 -.14 -.10 42 -.16 -.28 -.23 -.13 -.18 -.10 Table 9 (cont’d) Variable 29. Avg cue manual time (L1) 30. Avg cue manual time (L2) 31. Avg cue manual time (L3) 32. Avg strategy manual time (L1) 33. Avg strategy manual time (L2) 34. Avg strategy manual time (L3) 35. Coherence (T1) 36. Coherence (T2) 37. Coherence (T3) 38. Number of links (T1) 39. Number of links (T2) 40. Number of links (T3) 41. Correlation (T1) 42. Correlation (T2) 43. Correlation (T3) 44. Similarity (T1) 45. Similarity (T2) 46. Similarity (T3) 47. Clustering coefficient (T1) 48. Clustering coefficient (T2) 49. Clustering coefficient (T3) 50. Diameter (T1) 51. Diameter (T2) 52. Diameter (T3) 53. Avg. path length (T1) 54. Avg. path length (T2) 55. Avg. path length (T3) 43 44 45 46 47 48 49 50 51 52 53 — .49 .50 .63 .18 .34 .38 .05 -.08 -.13 .06 .00 -.06 — .27 .30 .20 .23 .28 .16 -.08 -.09 .18 -.04 -.05 — .77 -.12 .14 .09 -.03 -.20 -.14 .01 -.11 .02 — .09 .18 .11 -.10 -.26 -.24 -.05 -.14 -.06 — .37 .36 -.56 -.19 -.21 -.60 -.19 -.26 — .53 -.14 -.59 -.23 -.07 -.66 -.25 — -.09 -.20 -.53 -.11 -.22 -.60 — .22 .19 .92 .24 .21 — .33 .17 .94 .30 — .26 .32 .93 — .19 .29 101 Table 9 (cont’d) Variable 56. Avg. feature path length (T1) 57. Avg. feature path length (T2) 58. Avg. feature path length (T3) 59. Avg. function path length (T1) 60. Avg. function path length (T2) 61. Avg. 
function path length (T3)
43 -.14 -.31 -.24 -.10 -.13 -.16
44 -.05 -.16 -.20 -.18 -.18 -.07
45 .01 -.01 .00 -.09 -.24 -.02
46 -.02 -.10 -.03 -.03 -.12 -.04
47 -.47 -.33 -.18 -.49 -.19 -.08
48 -.07 -.47 -.29 -.13 -.51 -.27
49 -.21 -.38 -.50 -.15 -.22 -.53
50 .40 .04 -.01 .52 .03 .05
51 -.03 .31 .04 .12 .61 .16
52 -.06 -.03 .22 .14 .17 .56
53 .39 .02 -.05 .56 .03 .09

Table 9 (cont’d)
Variable 29. Avg cue manual time (L1) 30. Avg cue manual time (L2) 31. Avg cue manual time (L3) 32. Avg strategy manual time (L1) 33. Avg strategy manual time (L2) 34. Avg strategy manual time (L3) 35. Coherence (T1) 36. Coherence (T2) 37. Coherence (T3) 38. Number of links (T1) 39. Number of links (T2) 40. Number of links (T3) 41. Correlation (T1) 42. Correlation (T2) 43. Correlation (T3) 44. Similarity (T1) 45. Similarity (T2) 46. Similarity (T3) 47. Clustering coefficient (T1) 48. Clustering coefficient (T2) 49. Clustering coefficient (T3) 50. Diameter (T1) 51. Diameter (T2) 52. Diameter (T3) 53. Avg. path length (T1) 54. Avg. path length (T2) 55. Avg. path length (T3)
54 — .32
55 —

Table 9 (cont’d)
Variable 56. Avg. feature path length (T1) 57. Avg. feature path length (T2) 58. Avg. feature path length (T3) 59. Avg. function path length (T1) 60. Avg. function path length (T2) 61. Avg. function path length (T3)
54 -.04 .32 .03 .13 .64 .18
55 -.01 .01 .24 .19 .18 .63
56 — .37 .27 .32 -.09 -.02
57 — .54 .09 .23 .03
58 — .05 -.01 .20
59 — .23 .19
60 — .31
61 —

Manipulation Check

As noted in the description of the experimental manipulation, participants in the stereotype threat condition were told that the purpose of the study was to examine sex differences in information acquisition skills and that such skills may be a reason for male-female performance discrepancies on mathematical assessments (Beilock et al., 2007; Rydell, Shiffrin et al., 2010).
Two sets of items were included at the end of Day 3 to examine whether this manipulation led female participants to believe that mathematical ability was related to performance in TANDEM and/or whether perceptions of felt stereotype threat differed across study conditions. A one-sample t-test revealed that females’ overall response to the items asking about the relevance of mathematical ability to performance in TANDEM was significantly below the mid-point of the scale (M = 2.76, SD = .85; t(157) = 3.19, p < .01). Furthermore, responses to this check did not differ significantly between females in the control (M = 2.86, SD = .84) and experimental (M = 2.66, SD = .86) conditions (t(112) = 1.22, ns), suggesting that women generally did not believe that mathematical ability was important to the experimental task. Nevertheless, females in the stereotype threat condition (M = 3.29, SD = .50) did report significantly higher levels of perceived stereotype threat than females in the control condition (M = 2.89, SD = .51) (t(112) = 4.18, p < .001, d = .79). Thus, although females were not generally convinced that TANDEM tapped skills related to mathematical ability, the stereotype threat manipulation did appear to produce the desired reaction in the targeted group.

Knowledge Structure Analyses

Hypotheses 1 through 10 investigated the influence of stereotype threat on the development of knowledge structures; specifically, Hypotheses 1-5 examined the main effects of stereotype threat on the similarity, correlation, coherence, number of links, and clustering patterns of female knowledge structures, while Hypotheses 6-10 examined changes in these indices across each day (i.e., interactive effect of condition and day on knowledge structure development).
The structure of the data was such that multiple (three) knowledge structure observations were nested within subjects; consequently, multilevel random coefficient modeling (MRCM) was used to examine the predicted effects for each set of knowledge structure indices. MRCM offers a number of advantages for analyses involving longitudinal/nested data, such as the ability to produce growth estimates when data are missing without the need for imputation and the flexibility to allow different residual covariance patterns in the data to account for nonindependence among repeated measures (Bryk & Raudenbush, 1987). Unless otherwise noted, the following procedures were followed to test Hypotheses 1-10 (cf., Bliese & Ployhart, 2002). First, a model (Model 1) in which only the time variable was included was fit to the data:

$$DV_{ti} = \pi_{0i} + \pi_{1i}(\mathrm{Day}_{ti}) + r_{ti} \qquad (1)$$
$$\pi_{0i} = \beta_{00} + u_{0i}$$
$$\pi_{1i} = \beta_{10} + u_{1i}$$

where $DV_{ti}$ is the dependent variable of interest at time $t$ for individual $i$; $\pi_{0i}$ is individual $i$’s overall mean on the DV across observations; $\pi_{1i}$ is the amount by which the DV changes for individual $i$ at each time point $t$; $r_{ti}$ is residual variance in the DV for individual $i$ at time $t$; $\beta_{00}$ is the overall mean of the sample on the DV; $u_{0i}$ estimates residual variance in individual $i$’s standing on the DV relative to the sample mean; $\beta_{10}$ represents the average change in the DV over time for the sample; and $u_{1i}$ estimates residual variance in individual $i$’s change in the DV over time relative to the sample. The results of Model 1 (specifically the variance estimates for $r_{ti}$, $u_{0i}$, and $u_{1i}$) can be used to provide an overall estimate of the proportion of variance attributable to within-person ($\sigma^2$) versus between-person ($\tau$) sources through the calculation of the intraclass correlation coefficient ($\mathrm{ICC} = \tau / (\tau + \sigma^2)$; Bryk & Raudenbush, 1992).
In the present model, between-person variance in both slopes (change in the DV over time, $\tau_{11}$) and intercepts (mean level of the DV controlling for slope variation, $\tau_{00}$) is computed, and thus the relative proportion of variance attributable to within-person and between-person sources can be examined for both of these parameters. Note that the categorical time variable (Day) was mean-centered prior to entry into the Level-1 MRCM equation; thus, the value of the $\beta_{00}$ intercept reflects the overall sample mean for the DV estimated from the regression model collapsed over time. In most applications of MRCM to longitudinal analyses, the repeated measures variable is coded such that the model intercept term(s) reflect the sample’s mean standing on the DV at the first observation (e.g., code Day 1 as 0, Day 2 as 1, Day 3 as 2, etc.). However, the significance test of the model intercept term when the time variable has been mean-centered is equivalent to the statistical test of a between-subjects factor in a repeated measures ANOVA, which is desirable in the present study given that many of the hypotheses propose simple main effects/between-group differences in the DV of interest. Next, the main and interaction effects of the stereotype threat manipulation were modeled by adding the Condition variable to both Level-2 equations (Model 2):

$$DV_{ti} = \pi_{0i} + \pi_{1i}(\mathrm{Day}_{ti}) + r_{ti} \qquad (2)$$
$$\pi_{0i} = \beta_{00} + \beta_{01}(\mathrm{Condition}_i) + u_{0i}$$
$$\pi_{1i} = \beta_{10} + \beta_{11}(\mathrm{Condition}_i) + u_{1i}$$

Here $\beta_{01}$ represents the main effect of the experimental condition manipulation on the DV for individual $i$, while $\beta_{11}$ models the interaction effect of the experimental manipulation on changes in the DV over time for individual $i$. The categorical Condition variable was dummy-coded such that the control condition (coded as 0) served as the reference category for comparison with the stereotype threat condition (coded as 1).
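The effect of the two Day codings on the intercept can be illustrated with a minimal sketch. It fits an ordinary least-squares line to one hypothetical learner's three daily scores (the scores are invented for illustration, and a single-person OLS fit is only a stand-in for the Level-1 equation, not the MRCM estimation itself). Under mean-centering the intercept equals the learner's average across days, whereas coding Day 1 as 0 makes the intercept the fitted value at Day 1; the slope is the same in both cases.

```python
from statistics import mean


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical scores for one learner at the end of Days 1-3 (not study data).
scores = [0.30, 0.42, 0.45]

day_first_zero = [0, 1, 2]   # Day 1 coded as 0: intercept = fitted value at Day 1
day_centered = [-1, 0, 1]    # Day mean-centered: intercept = mean across days

a_zero, b_zero = ols_line(day_first_zero, scores)
a_cent, b_cent = ols_line(day_centered, scores)

print(f"Day 1 = 0 coding: intercept = {a_zero:.3f}, slope = {b_zero:.3f}")
print(f"Centered coding:  intercept = {a_cent:.3f}, slope = {b_cent:.3f}")
```

Because the centered intercept is the average over time, adding Condition as a predictor of that intercept tests a between-group difference collapsed across days, which is the comparison the main-effect hypotheses call for.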
Consequently, the value and direction of $\beta_{01}$ indicate the extent to which individuals in the stereotype threat condition differed from individuals in the control condition averaged across time, whereas the $\beta_{11}$ coefficient reflects the average degree by which individuals in the stereotype threat condition changed over time relative to individuals in the control condition. If the $\beta_{01}$ or $\beta_{11}$ coefficient achieved significance in Model 2, an additional model (Model 2A) was run in which control variables were added to the appropriate Level-2 equation(s) to determine whether the effects of stereotype threat remained significant after accounting for relevant between-person predictors. Based on the zero-order correlations presented in Table 9 and the conceptual rationale outlined previously, cognitive ability (ACT scores), working memory (OSPAN scores), and video game experience were included as control variables. The interaction between condition and math domain identification was also considered for inclusion as a Level-2 control variable, as previous researchers have argued that the influence of stereotype threat on performance outcomes is generally greatest for individuals who are highly identified with the domain of interest (Crocker et al., 1998; Steele & Aronson, 1995; Steele & Davies, 2003). However, because participants generally did not perceive mathematical ability as relevant to completing the objectives of TANDEM, the meaning/significance of this interaction in the present study is questionable.
Nevertheless, the Model 2A MRCM equations were defined as:

$$DV_{ti} = \pi_{0i} + \pi_{1i}(\mathrm{Day}_{ti}) + r_{ti} \qquad (3)$$
$$\pi_{0i} = \beta_{00} + \beta_{01}(\mathrm{Condition}_i) + \beta_{02}(\mathrm{ACT}_i) + \beta_{03}(\mathrm{OSPAN}_i) + \beta_{04}(\mathrm{Games}_i) + \beta_{05}(\mathrm{MathID}_i \times \mathrm{Condition}_i) + u_{0i}$$
$$\pi_{1i} = \beta_{10} + \beta_{11}(\mathrm{Condition}_i) + \beta_{12}(\mathrm{ACT}_i) + \beta_{13}(\mathrm{OSPAN}_i) + \beta_{14}(\mathrm{Games}_i) + \beta_{15}(\mathrm{MathID}_i \times \mathrm{Condition}_i) + u_{1i}$$

Note that because all study hypotheses only concerned differences in the knowledge structures of female participants (as they were the intended target of the stereotype threat manipulation), the above analyses only include data from female participants. As such, the continuous control variables for Model 2A were mean-centered based on the female sample prior to entry into the MRCM equations to facilitate interpretation of their regression coefficients. In sum, the analytic approach for testing the effects of stereotype threat on knowledge structure formation was as follows:

• Model 1: Fit a model including only the time variable at Level-1 to examine the proportion of variance in slopes and intercepts attributable to between- and within-person sources.
• Model 2: Add the condition effect to both Level-2 equations; evaluate the direction and significance of the $\beta_{01}$ coefficient to examine the main effect of stereotype threat on the outcome of interest (e.g., DV differed between individuals in the stereotype threat versus control conditions on average) and the direction and significance of the $\beta_{11}$ coefficient to determine whether changes in the DV over time differed between conditions.
• Model 2A: If either $\beta_{01}$ or $\beta_{11}$ is significant, add control variables to the Level-2 model to determine whether the influence of stereotype threat contributed above and beyond relevant between-person predictors.

All MRCM models were computed in R version 2.15 (R Development Core Team, 2012) using the lme4 package (Bates, Maechler, & Bolker, 2011).
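To make the roles of the four Model 2 fixed effects concrete, the sketch below converts them into predicted group means for each day. It is an illustration only: it uses the Model 2 estimates for number of links reported in Table 13, sets the random effects and residual to zero, and assumes that Days 1-3 enter the model as -1, 0, and 1 after mean-centering. The function and variable names are hypothetical, and this is plain arithmetic rather than the lme4 estimation used in the study.

```python
def predicted_mean(b00, b01, b10, b11, condition, day_centered):
    """Fixed-effects prediction from Model 2 (random effects and residual set to 0)."""
    intercept = b00 + b01 * condition   # group mean collapsed across days
    slope = b10 + b11 * condition       # group change per day
    return intercept + slope * day_centered


# Model 2 estimates for number of links, as reported in Table 13.
B00, B01, B10, B11 = 31.01, 3.12, 2.96, 0.90

# Assumption: Days 1-3 were equally spaced and mean-centered to -1, 0, 1.
days_centered = {1: -1, 2: 0, 3: 1}

predictions = {
    label: [
        round(predicted_mean(B00, B01, B10, B11, cond, d), 2)
        for d in days_centered.values()
    ]
    for label, cond in (("control", 0), ("stereotype threat", 1))
}

for label, values in predictions.items():
    print(label, values)
```

With Condition dummy-coded, the control group's trajectory depends only on $\beta_{00}$ and $\beta_{10}$, while $\beta_{01}$ and $\beta_{11}$ shift the intercept and slope for the stereotype threat group. For this particular outcome neither shift was statistically significant, so the gap between the two predicted trajectories should not be read as a reliable group difference.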
Since the tests for the predicted main and interactive effects were conducted within the same single MRCM model (Model 2 and/or Model 2A), the results below are organized by knowledge structure outcome rather than hypothesis ordering for convenience.

Knowledge structure similarity. Hypotheses 1 and 6 examined differences in the similarity of knowledge structures produced by threatened versus non-threatened females to those produced by males and top performers⁸; Table 10 summarizes the results of the MRCM models for these hypotheses. With respect to similarity with male knowledge structures, Model 1 indicated that 53% of the variance in mean similarity and 43% of the variance in similarity change over time could be attributed to between-person differences. However, the results of Model 2 indicated that neither the main effect ($\beta_{01}$ = -.01, ns) nor interaction effect ($\beta_{11}$ = -.01, ns) of stereotype threat on structural similarity was significant, suggesting that the knowledge structures for females in both experimental conditions were equally similar to male knowledge structures on average and that the rate of change in similarity did not significantly differ across conditions of stereotype threat. With respect to the similarity with the top performers’ knowledge
With respect to the similarity with the top performers’ knowledge 110 Table 10 MRCM Parameter Estimates for Female’s Knowledge Structure Similarity with Males and the Top 15 Performers (Hypotheses 1 & 6) Parameter estimates Referent Model 2 Structure β β β β σ τ τ 00 01 10 11 00 11 .18 — .01 — .005 .006 .004 .19 -.01 .02 -.01 .005 .006 .004 .18 — .02 — .003 .005 .002 .19 -.01 .02 -.01 .003 .005 .002 Model 1 Similarityti = π0i + π1i(Dayti) + rti π0i = β00 + u0i Males π1i = β10 + u1i Model 2 Similarityti = π0i + π1i(Dayti) + rti π0i = β00 + β01(Conditioni) + u0i π1i = β10 + β11(Conditioni) + u1i Model 1 Similarityti = π0i + π1i(Dayti) + rti π0i = β00 + u0i Top 15 π1i = β10 + u1i Model 2 Similarityti = π0i + π1i(Dayti) + rti π0i = β00 + β01(Conditioni) + u0i π1i = β10 + β11(Conditioni) + u1i Coefficient estimates in bold are significant at p < .05 structures, approximately 61% of the variance in mean similarity and 41% of the variance in similarity change over time was attributable to between-person differences. The results from Model 2 revealed that although females’ knowledge structures tended to become more similar to top performers over time on average (β10 = .02, p < .05), there were no differences in knowledge structure similarity across condition (β01 = -.01, ns) nor did changes in similarity to top performers over time differ between conditions (β11 = -.01, ns). In sum, neither Hypothesis 1 nor Hypothesis 6 was supported. 111 Knowledge structure correlation. Hypothesis 2 proposed that the knowledge structures of threatened females relative to those produced by males and top performers would be less correlated on average than the knowledge structures of non-threatened females, while Hypothesis 7 postulated that this correlation would increase more slowly over time for threatened females than for non-threatened females. The results of the MRCM models for these hypotheses are summarized in Table 11. 
Using male knowledge structures as the referent, Model 1 revealed that a substantial portion of the variance in females’ mean correlation index was attributable to between-person factors (84%), while the amount of change in knowledge structure correlations across days tended to vary less across individuals (31% of variance in slopes attributable to between-person variables). Results from Model 2 revealed that, on average, the knowledge structures of females and males did become more strongly correlated over time ($\beta_{10}$ = .03, p < .05); however, the knowledge structures of threatened females were not significantly less correlated on average ($\beta_{01}$ = -.03, ns) nor did they converge towards male knowledge structures at a significantly slower rate ($\beta_{11}$ = -.02, ns). Using top performers as the referent again indicated that average knowledge structure correlations varied substantially across females (ICC = .83), with rates of change tending to be less variable (ICC = .32). Similar to the previous set of analyses, on average female knowledge structures tended to become more strongly correlated with top performers over time ($\beta_{10}$ = .04, p < .05); again, however, the main ($\beta_{01}$ = -.02, ns) and interaction effects ($\beta_{11}$ = -.01, ns) for Condition failed to reach significance, indicating that stereotype threat did not influence the average structural correlation with top performers or changes in correlation across days. Consequently, the pattern of results failed to support either Hypothesis 2 or Hypothesis 7.
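The ICC values quoted for the similarity and correlation indices can be approximately reproduced from the Model 1 variance components in Tables 10 and 11 using $\mathrm{ICC} = \tau / (\tau + \sigma^2)$. The sketch below is a rough consistency check, not part of the original analysis; it assumes the last three columns of each table are $\sigma^2$, $\tau_{00}$, and $\tau_{11}$ in that order, and the dictionary keys are hypothetical labels.

```python
def icc(tau, sigma2):
    """Share of variance that lies between persons: tau / (tau + sigma^2)."""
    return tau / (tau + sigma2)


# (sigma^2, tau00, tau11) from the Model 1 rows of Tables 10 and 11.
# Assumption: the flattened table columns are read in this order.
components = {
    "similarity, males": (0.005, 0.006, 0.004),
    "similarity, top 15": (0.003, 0.005, 0.002),
    "correlation, males": (0.010, 0.052, 0.005),
    "correlation, top 15": (0.011, 0.054, 0.005),
}

results = {
    name: (icc(tau00, sigma2), icc(tau11, sigma2))
    for name, (sigma2, tau00, tau11) in components.items()
}

for name, (icc_intercept, icc_slope) in results.items():
    print(f"{name}: intercept ICC = {icc_intercept:.2f}, slope ICC = {icc_slope:.2f}")
```

The recomputed values land within a few hundredths of the percentages reported in the text; the small gaps are what one would expect if the published ICCs were calculated from unrounded variance components.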
Table 11
MRCM Parameter Estimates for Females’ Knowledge Structure Correlation with Males and the Top 15 Performers (Hypotheses 2 & 7)

Referent structure   Model     β00    β01    β10    β11    σ²     τ00    τ11
Males                Model 1   .42    —      .02    —      .010   .052   .005
Males                Model 2   .43    -.03   .03    -.02   .010   .053   .005
Top 15               Model 1   .40    —      .04    —      .011   .054   .005
Top 15               Model 2   .41    -.02   .04    -.01   .011   .055   .005

Model 1: $\mathrm{Corr}_{ti} = \pi_{0i} + \pi_{1i}(\mathrm{Day}_{ti}) + r_{ti}$; $\pi_{0i} = \beta_{00} + u_{0i}$; $\pi_{1i} = \beta_{10} + u_{1i}$
Model 2: $\mathrm{Corr}_{ti} = \pi_{0i} + \pi_{1i}(\mathrm{Day}_{ti}) + r_{ti}$; $\pi_{0i} = \beta_{00} + \beta_{01}(\mathrm{Condition}_i) + u_{0i}$; $\pi_{1i} = \beta_{10} + \beta_{11}(\mathrm{Condition}_i) + u_{1i}$
Coefficient estimates in bold are significant at p < .05

Knowledge structure coherence. Hypothesis 3 and Hypothesis 8 examined differences in the coherence of knowledge structures produced by females in the stereotype threat versus control conditions as well as differences in changes to knowledge structure coherence over time (Table 12). Computation of the ICCs revealed that 73% of the variance in females’ knowledge structure coherence and 23% of the variance in coherence growth rates was attributable to differences at the individual level.
The addition of the between-subject Condition variable in Model 2 revealed that in general, the coherence of females’ knowledge structures did not significantly improve over time ($\beta_{10}$ = .03, ns); furthermore, no significant differences in mean levels of coherence ($\beta_{01}$ = -.05, ns) or in changes in coherence over time ($\beta_{11}$ = -.02, ns) were observed between conditions of stereotype threat, thus failing to support Hypotheses 3 and 8.

Table 12
MRCM Parameter Estimates for Females’ Knowledge Structure Coherence (Hypotheses 3 & 8)

Model     β00    β01    β10    β11    σ²     τ00    τ11
Model 1   .32    —      .02    —      .026   .070   .008
Model 2   .34    -.05   .03    -.02   .026   .070   .008

Model 1: $\mathrm{Coherence}_{ti} = \pi_{0i} + \pi_{1i}(\mathrm{Day}_{ti}) + r_{ti}$; $\pi_{0i} = \beta_{00} + u_{0i}$; $\pi_{1i} = \beta_{10} + u_{1i}$
Model 2: $\mathrm{Coherence}_{ti} = \pi_{0i} + \pi_{1i}(\mathrm{Day}_{ti}) + r_{ti}$; $\pi_{0i} = \beta_{00} + \beta_{01}(\mathrm{Condition}_i) + u_{0i}$; $\pi_{1i} = \beta_{10} + \beta_{11}(\mathrm{Condition}_i) + u_{1i}$
Coefficient estimates in bold are significant at p < .05

Number of knowledge structure links. The next set of predictions examined differences in the number of links present in the knowledge structures of threatened versus control females (Hypothesis 4) and the manner by which the number of network links changed over time across condition (Hypothesis 9). Table 13 presents the results of the MRCM analyses for this outcome. Overall, neither the average number of links in females’ knowledge structures (ICC = .44) nor the rate of change in number of links across days (ICC = .08) varied dramatically across individuals. The final Model 2 analyses revealed that, on average, female participants’ knowledge structures tended to become more interconnected over time, growing by approximately 3 links each day ($\beta_{10}$ = 2.96, p < .05).
However, the average number of links in the knowledge structures of females under stereotype threat did not significantly differ from that of females in the control condition ($\beta_{01}$ = 3.12, ns); furthermore, there were no differences between conditions in the number of links added per day ($\beta_{11}$ = .90, ns). In sum, no differences in the average number of knowledge structure links nor in the number of knowledge structure links added over time were observed for females between conditions of stereotype threat, therefore failing to support Hypotheses 4 and 9.

Table 13
MRCM Parameter Estimates for Number of Links in Females’ Knowledge Structures (Hypotheses 4 & 9)

Model     β00     β01    β10    β11    σ²      τ00      τ11
Model 1   32.52   —      3.39   —      162.2   129.4    14.97
Model 2   31.01   3.12   2.96   .90    162.1   130.05   15.31

Model 1: $\mathrm{Links}_{ti} = \pi_{0i} + \pi_{1i}(\mathrm{Day}_{ti}) + r_{ti}$; $\pi_{0i} = \beta_{00} + u_{0i}$; $\pi_{1i} = \beta_{10} + u_{1i}$
Model 2: $\mathrm{Links}_{ti} = \pi_{0i} + \pi_{1i}(\mathrm{Day}_{ti}) + r_{ti}$; $\pi_{0i} = \beta_{00} + \beta_{01}(\mathrm{Condition}_i) + u_{0i}$; $\pi_{1i} = \beta_{10} + \beta_{11}(\mathrm{Condition}_i) + u_{1i}$
Coefficient estimates in bold are significant at p < .05

Knowledge structure clustering. Hypotheses 5 and 10 predicted that differences in the average compositional form of knowledge structures as well as the development of functional relations among knowledge structure concepts over time would emerge between females learning TANDEM under conditions of stereotype threat versus females in the control condition. To examine differences among clustering patterns, two sets of visual representations for participants’ knowledge structure were computed using the Pathfinder software and qualitative comparisons of their form were performed. To investigate whether the experimental manipulation exerted an effect on the development of structural relationships amongst concepts on average (Hypothesis 5), a single proximity matrix composed of the average pair-wise similarity ratings amongst concepts collapsed across days was computed for females in each of the experimental conditions.
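The averaging step can be sketched as an element-wise mean over individual proximity matrices. The example below is a simplified illustration: the concept labels are taken from the figures, but the ratings, the number of matrices, and the function name are hypothetical, and the subsequent derivation of a network from the averaged matrix (done in the study with the Pathfinder software) is not shown.

```python
def average_proximity(matrices):
    """Element-wise mean of equally sized square matrices (lists of lists)."""
    n = len(matrices[0])
    count = len(matrices)
    return [
        [sum(m[i][j] for m in matrices) / count for j in range(n)]
        for i in range(n)
    ]


# Concept labels as they appear in the figures; all ratings below are made up.
concepts = ["Points", "MonInn", "MonOut"]

# Hypothetical pair-wise similarity ratings from two participant-by-day observations.
ratings_a = [[0, 6, 2],
             [6, 0, 4],
             [2, 4, 0]]
ratings_b = [[0, 4, 4],
             [4, 0, 2],
             [4, 2, 0]]

avg = average_proximity([ratings_a, ratings_b])

for label, row in zip(concepts, avg):
    print(label, row)
```

Pooling every matrix from one condition across all three days gives the collapsed structure for that condition, while pooling within a single day gives a day-specific structure.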
This procedure resulted in the creation of two networks representing the knowledge structures for females in each condition averaged across all time points (Figure 7). To investigate changes in knowledge structure development over time, an averaged proximity matrix was also computed for female participants in each condition at each day, resulting in six knowledge structures depicting the average network at each day separately for females in the stereotype threat and control conditions (Figures 8-10).

In large part, Figure 7 illustrates that the knowledge structures of females in the stereotype threat and control conditions were remarkably alike when averaged across time. Both groups appeared to draw a distinction between the decision-making and procedural/strategic concepts, with gaining/losing points (Points) generally serving as the logical connection between these two sets of clusters. Control condition females appeared to more strongly associate monitoring of the inner perimeter (MonInn) with scoring points in the task, whereas threatened females generally associated monitoring of the outer perimeter (MonOut) with this aspect of the task. Control females also seemed to associate the prioritization of critical targets (Priority) primarily with monitoring the inner defensive perimeter, while stereotype threat females associated prioritization with finding/engaging pop-up targets (PopUp), perhaps suggesting that stereotype threat females were more actively seeking out pop-up targets and/or willing to immediately prosecute new targets rather than focusing their attention on protecting a particular defensive perimeter. Evidence for average differences in knowledge organization on the basis of feature and functional similarity was mixed. Females in both conditions seemed to draw equally distinctive functional relationships with respect to the Intent subdecisions; the classification of a target as

(Figure panels: Female Stereotype Threat; Female Control)
Figure 7.
Knowledge structures for female participants in the stereotype threat and control conditions averaged across days

(Figure panels: Female Control; Female Stereotype Threat)
Figure 8. Average knowledge structures for female participants in the stereotype threat and control conditions at end of Day 1

(Figure panels: Female Control; Female Stereotype Threat)
Figure 9. Average knowledge structures for female participants in the stereotype threat and control conditions at end of Day 2

(Figure panels: Female Control; Female Stereotype Threat)
Figure 10. Average knowledge structures for female participants in the stereotype threat and control conditions at end of Day 3

Peaceful (idPeac) or Hostile (idHost) was in each case related to its most probable Final Engagement outcome (Clear and Mark, respectively; see Table 8). The relations amongst many of the other decision-making concepts, though, were more ambiguous. This pattern of results was perhaps not entirely surprising given that the outcomes of the Intent subdecision are among the most diagnostic/informative regarding how to apply the rules of engagement (Table 4) in order to make the correct Final Engagement decision. As shown in Table 8, the Intent subdecision contained only two possible outcomes (Peaceful or Hostile), each of which was only associated with two (as opposed to all three) Final Engagement decision outcomes; further, one of those Final Engagement options was always twice as likely to be correct (e.g., Clearing a Peaceful contact was likely to be correct 67% of the time, while Warning that target was likely to be correct only 33% of the time). As such, the classification of a target as either Peaceful or Hostile carried with it a relatively high degree of certainty/informative value regarding the correct Final Engagement decision compared with the Type and Class subdecisions, suggesting that a strong functional association among the Intent and Final Engagement concepts may have been easier for participants to infer.
Regardless, the lack of noticeable differences in the aggregate structural relations of females between conditions does not lend support to the predictions of Hypothesis 5.

Although comparison of the aggregated knowledge structures provides a broad overview of the manner by which females in both conditions perceived relations among task-relevant concepts and information, it does not account for the fact that individuals’ knowledge structures were likely to change as they gained more experience within the task domain. Hypothesis 10 therefore sought to examine whether the development and growth in the knowledge structures of females under stereotype threat differed from that of females in the control condition. To this end, Figures 8-10 reveal a number of intriguing differences in the pattern of knowledge structure development between these two groups of female learners. By the end of Day 3, a distinctive pattern had developed in the knowledge structure of female participants in the control condition that was not present in the structure of stereotype threat females. Namely, the acquisition of points emerged as a central hub in the network of control females (similar to the knowledge structures of male participants), which was subsequently linked to all three Final Engagement decision outcomes (Clear, Warn, Mark). In turn, each of these Final Engagement outcomes was linked to its single most probable Class and Intent subdecision outcome (e.g., Mark related to the identification of a Hostile target, Warn related to the identification of a Civilian target, etc.; see Table 8). Lastly, although the outcomes related to the Type subdecision (idAir, idSurf, and idSub) were not associated with a particular Final Engagement decision, they were each most strongly/directly related to gaining/losing points.
One plausible interpretation of this structural pattern is that, by the end of Day 3, control females had developed an efficient/simplified heuristic for making Final Engagement decisions that they used for earning points in the task. Before describing the specific form of this heuristic, it is helpful to again consider Table 8 and the relative probabilities between each subdecision outcome and the three Final Engagement outcomes in order to understand why it is an efficient and effective means by which to make decisions in the present version of TANDEM. As was detailed above, the Intent subdecision was arguably the most easily interpretable subdecision for helping an individual determine how to apply the rules of engagement in order to produce the correct Final Engagement decision. After this, the Class subdecision was likely the next most informative subdecision. Similar to the Intent subdecision, there were only two possible Class outcomes from which to choose (Civilian and Military), and one of these outcomes (Civilian) was highly diagnostic of the correct Final Engagement decision. In contrast, the Type subdecision possessed three possible outcomes (Air, Surface, Sub), each of which could be associated with any of the three Final Engagement decisions with probabilities that were not vastly different from one another (e.g., 50% of Air targets were likely to be Warned, 25% Cleared, and 25% Marked); consequently, integrating the Type subdecision into one’s Final Engagement choice was likely the most difficult part of the decision procedure. Returning then to the heuristic implied by the control condition females’ knowledge structures: by Day 3, learners in this group appeared to have made the diagnostic functional connections between the Class and Intent subdecisions for a target and the Final Engagement decision which those pieces of information suggested was most likely correct and which would earn them points in the task.
Once these pieces of information about a target are known, the correct Final Engagement decision becomes substantially easier and, more often than not, can be made correctly regardless of whether an individual has learned the relatively more difficult functional relationships for the Type subdecision (note that the person must still make the Type subdecision correctly in order to earn points, but in many cases they do not need to functionally integrate that information in order to make the correct Final Engagement decision). More specifically, the information presented in Tables 4 and 8 indicates that if a person follows the heuristic:

1. If a target is Civilian, Warn it; if target is Military, go to Step 2
2. Choose whatever Final Engagement decision is most probable based on the target’s Intent,

then the correct Final Engagement decision will be made, on average, 84% of the time without even needing to consider the target’s Type (Step 1 will lead to the correct Final Engagement decision in 67% of occasions, whereas Step 2 will lead to the correct Final Engagement decision in 100% of occasions). While simply examining the knowledge structures for control condition females does not indicate that these participants were following this decision-making heuristic (the results of Hypothesis 12 present a more detailed examination of this possibility), the observed changes in the pattern of network concepts over time are consistent with learners in this group acquiring this highly efficient/effective decision process. Control females appeared to have learned the relatively easier functional relationships for the Intent subdecisions by the end of Day 1, though they had yet to fully make sense of the remaining subdecision outcomes. By Day 2, the importance of the three Final Engagement outcomes in relation to scoring points had been established and learners seemed to be drawing more clearly interpretable associations between the Final Engagement and Class outcomes.
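The two-step heuristic described above can be sketched in code. This is a minimal illustration rather than anything used in the study: the function name, the string labels, and the lookup table are assumptions, with the Intent mapping (Peaceful to Clear, Hostile to Mark) taken from the most probable outcomes described in the text. The averaging at the end additionally assumes that the two steps apply equally often, which is one way to arrive at the reported figure of roughly 84%.

```python
# Hypothetical sketch of the two-step decision heuristic described in the text.
# All names and labels here are illustrative assumptions.

# Most probable Final Engagement outcome for each Intent (per the text's
# description of Table 8: Peaceful -> Clear, Hostile -> Mark).
MOST_PROBABLE_BY_INTENT = {"Peaceful": "Clear", "Hostile": "Mark"}


def heuristic_engagement(target_class: str, intent: str) -> str:
    """Choose a Final Engagement decision without using the target's Type."""
    if target_class == "Civilian":
        # Step 1: Civilian targets are Warned.
        return "Warn"
    # Step 2: Military targets follow the most probable outcome for their Intent.
    return MOST_PROBABLE_BY_INTENT[intent]


# Accuracy of each step as reported in the text.
STEP1_ACCURACY = 0.67
STEP2_ACCURACY = 1.00

# Assumption: both steps are used equally often, so the overall accuracy is
# the simple mean of the two step accuracies (approximately 0.835).
average_accuracy = (STEP1_ACCURACY + STEP2_ACCURACY) / 2
```

Note that the function never inspects the target's Type, which is the point of the heuristic: the Type subdecision must still be made correctly to earn points, but it is not needed to choose the engagement action.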
Finally, at the end of Day 3, the structural relations consistent with the decision heuristic outlined above had been achieved. Also of note, an organized structure amongst the procedural/strategic task concepts likewise did not emerge until Day 3, perhaps indicating that control condition females delayed learning/practicing these functions until they had developed more expertise with the foundational decision-making components of the TANDEM task environment.

In contrast, the pattern of concept relations in the knowledge structures of females in the stereotype threat condition differed rather substantially from that described above. Unlike that of control condition females, the average daily knowledge structure of stereotype threat females never exhibited a pattern in which gaining/losing points was associated with all three Final Engagement decisions, which in turn were then associated with their single most diagnostic Class/Intent outcomes. Furthermore, stereotype threat females’ knowledge structures were the only networks in which gaining/losing points was not always the most interconnected node in the knowledge structure on any given day. In fact, at Day 1, gaining/losing points was most strongly related to monitoring the outer perimeter, perhaps implying that females under threat were too concerned early on with gaining/losing points as a result of procedural/strategic aspects of the task (e.g., ensuring targets did not cross the invisible outer perimeter) rather than the manner by which the more fundamental target engagement/decision-making processes affected task performance. A greater focus on these advanced task concepts is further evidenced by the fact that relations among the procedural/strategic task concepts had already begun to exhibit a logical structure by the end of Day 2, which was earlier than what was observed for control condition females.
Of final interest, it did appear that threatened females were generally inferring appropriate functional relationships between the various subdecision and Final Engagement outcome concepts; however, the pattern of structural relations suggests that this group of learners may have been doing so in a less efficient manner. More specifically, the knowledge structures of stereotype threat learners were more likely to possess relational patterns in which a single subdecision outcome (e.g., identification of target as Hostile) was related to multiple Final Engagement outcomes (e.g., Warn and Mark). As noted above in the description of the efficient decision heuristic, the key interpretative inference that is needed when an individual identifies a target as possessing a particular Class or Intent is “What is the single most likely Final Engagement decision for the target based on its classification?,” not “What are all the possible Final Engagement decision outcomes for the target based on its classification?” That is, if an individual identifies a target as Hostile, then it is relatively more efficient/diagnostic to know that the most probable engagement action is to Mark that contact rather than knowing that a Hostile contact could be either Warned or Marked. The clustering pattern in which multiple engagement outcomes were linked to a single subdecision outcome was observed once in the Day 2 knowledge structure (identification of Hostile targets) and twice in the Day 3 knowledge structure (identification of Hostile targets and identification of Military targets) of threatened females. Although these functional relations are not necessarily “wrong,” they do suggest that threatened female learners may have been focusing their learning efforts on memorizing the entire distribution of final engagement decisions rather than attempting to learn the seemingly more efficient heuristic approach noted above.
To summarize, a number of differences were observed in the progression of females’ knowledge structure development over time across conditions of stereotype threat. In general, females who did not experience the stereotype threat manipulation appeared to organize information related to decision-making in TANDEM in a manner consistent with an efficient and reasonably effective decision heuristic for scoring points in the task. Furthermore, there was tentative evidence that these female learners may have also delayed learning/practicing more advanced strategic aspects until these more fundamental processes were learned. Alternatively, the knowledge structures of females experiencing stereotype threat during learning appeared to be organized in a less efficient manner, and were instead consistent with an approach in which individuals attempted to learn through brute memorization rather than identifying the most informative/diagnostic relations among concepts. Threatened females may have also been somewhat more likely to attempt learning the more advanced procedures of task performance earlier during learning activities, well before they had effectively learned more basic task concepts. On the basis of this evidence, the predictions of Hypothesis 10 were largely supported.

Exploratory analyses using the graph theoretic metrics were also conducted to examine mean differences in the composition of female knowledge structures between conditions and across time. The results of these MRCM analyses are presented in Table 14. Overall, the pattern of results revealed no significant main effects or interaction effects between groups. On the whole, the results revealed that the knowledge structures of all female learners tended to become more tightly interconnected over time (e.g., average shortest path length between node pairs decreased, diameter of network decreased, and clustering coefficient increased).
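To make the three graph theoretic metrics named above concrete, the following sketch computes them for small toy networks. It is an illustration only: the helper names and the two example graphs are assumptions, the study's own software and exact metric definitions are not reproduced here, and the clustering coefficient is taken to be the mean local clustering coefficient (nodes with fewer than two neighbors contribute zero).

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Shortest path lengths (in links) from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def network_metrics(adj):
    """Return (average shortest path length, diameter, mean clustering).

    Assumes a connected, undirected graph given as {node: set_of_neighbors}.
    """
    pair_lengths = [bfs_distances(adj, a)[b] for a, b in combinations(adj, 2)]
    avg_path = sum(pair_lengths) / len(pair_lengths)
    diameter = max(pair_lengths)

    local = []
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            local.append(0.0)
            continue
        # Count links that exist among this node's neighbors.
        links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
        local.append(2 * links / (k * (k - 1)))
    return avg_path, diameter, sum(local) / len(local)


# Toy example: the same four concepts arranged as a chain versus a hub.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
hub = {"Points": {"Clear", "Warn", "Mark"},
       "Clear": {"Points"}, "Warn": {"Points"}, "Mark": {"Points"}}

chain_metrics = network_metrics(chain)
hub_metrics = network_metrics(hub)
```

In this toy comparison the hub arrangement has a shorter average path length and a smaller diameter than the chain, which is the sense in which a network with a central node is "more tightly interconnected".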
Additionally, this trend appeared to equally influence the distances among both feature (links between concepts from the same subdecision) and functional (links between subdecision concepts and most probable Final Engagement outcome) network relations.

Table 14
MRCM Parameter Estimates for Graph Theoretic Metrics (Exploratory Analyses)

Dependent variable          Model     β00    β01    β10    β11    σ²     τ00    τ11
Avg. Shortest Path Length   Model 1   2.25   --     -.10   --     .190   .070   .002
                            Model 2   2.28   -.04   -.08   -.04   .190   .071   .003
Network Diameter            Model 1   4.56   --     -.27   --     1.77   .548   .034
                            Model 2   4.64   -.16   -.24   -.07   1.77   .554   .035
Cluster Coeff.              Model 1   .17    --     .02    --     .006   .004   .000
                            Model 2   .16    .01    .02    .01    .006   .004   .000
Avg. Feature Path Length    Model 1   2.04   --     -.10   --     .281   .226   .046
                            Model 2   2.03   .02    -.12   .05    .283   .227   .046
Avg. Function Path Length   Model 1   1.98   --     -.12   --     .242   .100   .011
                            Model 2   1.95   .07    -.08   -.06   .222   .055   .012

Model 1: Y_ti = π0i + π1i(Day_ti) + r_ti; π0i = β00 + u0i; π1i = β10 + u1i
Model 2: Y_ti = π0i + π1i(Day_ti) + r_ti; π0i = β00 + β01(Condition_i) + u0i; π1i = β10 + β11(Condition_i) + u1i
Y denotes the dependent variable in each row (L, D, C, L-Feat, and L-Func, respectively). A "--" indicates a parameter that is not included in the model.
Coefficient estimates in bold are significant at p < .05. † Significant at p < .10

Cognitive Strategy Analyses

Strategic learning behaviors. Hypothesis 11 proposed that females learning under stereotype threat would exhibit poorer/more basic task strategies than females in the control condition. The analytic approach used to evaluate this prediction was essentially identical to that outlined for the knowledge structure analyses. As noted in the Methods section, a number of variables were recorded and analyzed to evaluate the strategic learning/performance demonstrated by individuals in the task. Consequently, the results in this section are organized into three areas. The first, knowledge acquisition behaviors, presents analyses from participants’ use and study of the online task manual prior to each TANDEM trial. The second section, task practice behaviors, examines data from participants’ actions within the game that are representative of strategic learning and performance; these data include the number of marker targets engaged, the number of times participants zoomed their radar screen in/out to help monitor defensive perimeters, the number of high priority targets engaged, and the total number of targets engaged. Lastly, the section on self-regulation presents results from the self-reported metacognitive activity scale completed at the end of each day. The same progression of MRCM equations used for the knowledge structure analyses (Equations 1 through 3) was fit for each of these dependent variables, and the direction and significance of the main effect (β01) and cross-level interaction (β11) terms for the Condition variable were examined.
Given that the primary focus of these analyses was to determine whether significant differences between the experimental conditions for females were observed during learning activities, data from the practice trials (6-18 observations per person) rather than the performance trials (1-3 observations) were used in the analyses for the task manual and practice behavior variables listed above.

Knowledge acquisition behaviors. To facilitate interpretation of how participants spent their time studying the online task manual prior to each learning trial, pages within the manual were coded into categories according to the type of information each page conveyed (basic gameplay, cue value, and task strategy information). An additional category (null) was defined that included time spent on the introductory menu page and/or the task exit screen. To provide an overall perspective on the manner by which individuals organized their task manual study time, the cumulative average and average amount of time spent on the various manual sections were plotted for the control and stereotype threat female learners for each trial (Figures 11 and 12, respectively). Figure 11 clearly shows that the study patterns of females in both conditions were fairly similar throughout the Day 1 and Day 2 learning trials; however, a marked decrease was observed in the average amount of time stereotype threat females spent studying the manual during the final Day 3 learning trials. Of additional interest, Figure 12 indicated that the relationship between time/trial and the amount of time spent studying certain sections of the task manual was likely not a strict linear function. Consequently, a quadratic time term was added as a random effect to the Level-1 equation of the subsequent MRCM analyses to better model these observations. Table 15 presents the results from the MRCM analyses for each of the manual sections as well as the overall time spent studying the manual.
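For reference, the quadratic growth model described above can be written out in full. The level-1 and level-2 equations follow the model specifications listed in Table 15; the generic outcome symbol Y and the combined (mixed) form in the last line are added here as an aid to reading the coefficients and are not taken from the original.

```latex
\begin{aligned}
\text{Level 1:}\quad & Y_{ti} = \pi_{0i} + \pi_{1i}(\mathrm{Trial}_{ti}) + \pi_{2i}(\mathrm{Trial}_{ti})^{2} + r_{ti} \\
\text{Level 2:}\quad & \pi_{0i} = \beta_{00} + \beta_{01}(\mathrm{Condition}_{i}) + u_{0i} \\
                     & \pi_{1i} = \beta_{10} + \beta_{11}(\mathrm{Condition}_{i}) + u_{1i} \\
                     & \pi_{2i} = \beta_{20} + \beta_{21}(\mathrm{Condition}_{i}) + u_{2i} \\
\text{Combined:}\quad & Y_{ti} = \beta_{00} + \beta_{01}\mathrm{Condition}_{i}
  + \beta_{10}\mathrm{Trial}_{ti} + \beta_{11}\mathrm{Condition}_{i}\,\mathrm{Trial}_{ti} \\
                     & \qquad + \beta_{20}\mathrm{Trial}_{ti}^{2} + \beta_{21}\mathrm{Condition}_{i}\,\mathrm{Trial}_{ti}^{2}
  + u_{0i} + u_{1i}\mathrm{Trial}_{ti} + u_{2i}\mathrm{Trial}_{ti}^{2} + r_{ti}
\end{aligned}
```

Read this way, β01 is the condition difference in the intercept, while β11 and β21 are the cross-level interactions of condition with the linear and quadratic trial terms.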
Figure 11. Cumulative average time spent viewing manual pages during learning trials

Figure 12. Average time spent viewing manual pages during learning trials

Table 15
MRCM Parameter Estimates for Time Spent on Task Manual Sections (Hypothesis 11)

Dependent variable                        Model       β00     β01     β10     β11     β20    β21
Time spent on basic manual pages          Model 1     4.04    --      -.60    --      .09    --
                                          Model 2A a  3.02    2.79    -.60    .12     .13    -.11
Time spent on cue value manual pages      Model 1     68.59   --      -2.28   --      -.10   --
                                          Model 2A a  71.49   -8.07   -1.10   -2.59   -.05   -.10
Time spent on strategy manual pages       Model 1     29.50   --      -.82    --      -.27   --
                                          Model 2A a  32.27   -4.99   -.39    -.87    -.28   -.01
Time spent on null manual pages           Model 1     10.43   --      .82     --      .01    --
                                          Model 2A a  10.04   -.17    .99     -.47    .04    -.07
Total time spent on manual pages          Model 1     112.6   --      -2.78   --      -.27   --
                                          Model 2A a  117.2   -10.9   -1.08   -3.80   -.15   -.29

Model 1: Y_ti = π0i + π1i(Trial_ti) + π2i(Trial_ti)² + r_ti; π0i = β00 + u0i; π1i = β10 + u1i; π2i = β20 + u2i
Model 2A: Y_ti = π0i + π1i(Trial_ti) + π2i(Trial_ti)² + r_ti; π0i = β00 + β01(Condition_i) + u0i; π1i = β10 + β11(Condition_i) + u1i; π2i = β20 + β21(Condition_i) + u2i
Y denotes the dependent variable in each row (Basic, Cue, Strat, Null, and Total, respectively). A "--" indicates a parameter that is not included in the model.
Coefficient estimates in bold are significant at p < .05. † Significant at p < .10
a Model includes control variables (coefficients not printed for ease of presentation)

As expected, a comparison of the Model 1 β00 coefficients across each of the manual sections revealed that on average, learners spent just over half of their available study time (57%) on the cue value manual pages and an additional one quarter (25%) of their time reading the task strategy pages. An examination of the linear main effect for trial (β10) on total time spent studying the manual revealed that, on average, all female learners tended to spend less time reading the manual over time; this was true for each manual section except the null pages, which demonstrated a slight increase over time. Consistent with the trends shown in Figure 11, the significant negative coefficient for the quadratic trial variable (β20) on total study time and task strategy study indicated that the decrease in the amount of time spent viewing the manual tended to become more exaggerated during the later study trials. Analysis of the main effects of the stereotype threat manipulation revealed that stereotype threatened female participants spent significantly more time on the basic manual pages (β01 = 2.79, p < .05), less time on the cue value pages (β01 = -8.07, p < .05), and less total time studying overall (β01 = -10.9, p < .05).
Interaction effects between stereotype threat and the linear time variable were also observed for time spent on the cue value pages (β11 = -2.59, p < .05), task strategy pages (β11 = -.87, p < .05), and total time spent studying (β11 = -3.80, p < .05); in all cases, the direction of the effect indicated that stereotype threatened females tended to spend less time studying these pages at each trial than control condition females. Lastly, a significant interaction between stereotype threat and the quadratic trial variable was observed for both the amount of time spent studying the basic gameplay section (β21 = -.11, p < .05) as well as overall study time (β21 = -.29, p < .05). In the former case, Figure 12 shows that control condition females tended to spend slightly longer on the basic gameplay pages early on in the task, which tapered away throughout the trials; alternatively, stereotype threatened females spent less time on these pages early on and did not tend to revisit them later in the study. The significant quadratic interaction for overall study time reflects the noted drop-off in study time observed for stereotype threat participants following Day 2 relative to control condition females (Figure 11).

In sum, analysis of the task manual data largely supported the predictions of Hypothesis 11. The significant mean differences found for both the overall time spent studying and time spent studying the critical cue value/task strategy portions of the manuals reflected poorer learning behaviors on the part of stereotype threat individuals. Although not predicted as such, the unique pattern of variation in study time over the course of the learning trials for stereotype threat participants was also consistent with predictions from stereotype threat theory, though perhaps more so with its influence on motivation rather than cognition.

Task practice behaviors.
Results from the MRCM analyses and graphs contrasting control and stereotype threatened females on the four focal task practice behaviors are presented in Table 16 and Figure 13, respectively. ICCs for variance in the intercept terms were relatively moderate across the four task practice behaviors, ranging from .39 to .55; however, variation in slopes was virtually nonexistent (ICCs ranging from .01 to .02), indicating that changes over time for the modeled variables were highly similar across all females. In general, the MRCM analyses revealed that the number of marker targets engaged (β10 = .02, p = .08), zoom activities performed (β10 = .42, p < .05), high priority targets engaged (β10 = .15, p < .05), and total number of targets engaged (β10 = .30, p < .05) tended to increase across task trials for all females. However, no main or interaction effect of stereotype threat achieved significance for any of the task practice variables, indicating that females in both groups were typically engaging in similar practice behaviors during the learning trials. As can be seen in Figure 13, it appeared that the number of marker targets engaged over time may have been changing at a different rate for the different groups, suggesting that modeling a quadratic time variable might improve model parameter estimates. Although the quadratic time coefficient achieved significance in this model (β20 = -.01, p < .05, indicating that the number of marker targets engaged initially increased rapidly and then decreased in later trials), the main effect and both the linear and quadratic interaction effects of stereotype threat failed to achieve significance.

In sum, analyses of the task practice behaviors did not support the predictions of Hypothesis 11. Threatened females were not more likely to engage marker targets, ignore defensive perimeters, or fail to prosecute high priority targets than control condition females.
Contrary to the results observed with participants’ use of the task manual, both groups also appeared to be exerting equivalent levels of effort in their practice behaviors given that no significant mean or longitudinal differences in the total number of targets engaged were found. In short, stereotype threat and control condition females appeared to engage in highly similar practice activities relative to the advanced/strategic aspects of the task.

Table 16
MRCM Parameter Estimates for Task Practice Behaviors (Hypothesis 11)

Dependent variable                          Model     β00    β01    β10     β11    σ²     τ00    τ11
Number of Marker Targets Engaged            Model 1   .65    --     .02     --     .945   .641   .006
                                            Model 2   .55    .21    .02 †   .00    .945   .635   .006
Number of Zoom Actions                      Model 1   9.49   --     .48     --     62.7   40.6   .57
                                            Model 2   8.97   1.06   .42     .11    62.7   40.7   .57
Number of High Priority Targets Engaged     Model 1   4.89   --     .16     --     1.15   1.44   .027
                                            Model 2   4.81   .16    .15     .01    1.15   1.45   .027
Total Number of Targets Engaged             Model 1   10.6   --     .28     --     4.64   3.63   .036
                                            Model 2   10.3   .55    .30     -.03   4.64   3.59   .036

Model 1: Y_ti = π0i + π1i(Trial_ti) + r_ti; π0i = β00 + u0i; π1i = β10 + u1i
Model 2: Y_ti = π0i + π1i(Trial_ti) + r_ti; π0i = β00 + β01(Condition_i) + u0i; π1i = β10 + β11(Condition_i) + u1i
Y denotes the dependent variable in each row (Marker, Zoom, HiPrior, and TotEng, respectively). A "--" indicates a parameter that is not included in the model.
Coefficient estimates in bold are significant at p < .05. † Significant at p < .10

Self-regulation. Kraiger et al.
(1993) propose that heightened metacognitive awareness is a hallmark of advanced cognitive learning. As a final investigation of female learners’ strategic task learning then, results from the self-reported metacognitive activity measure assessed at the end of each day were analyzed (Table 17). Computation of the ICCs revealed that approximately 81% of the variance in mean metacognitive activity and 47% of the variance in change in metacognitive activity over time was attributable to between-person factors. Adding the experimental condition variable to the Level-2 equation revealed a significant main (β01 = -.20, p < .05) and interaction effect (β11 = -.11, p < .05) of stereotype threat on metacognitive activity. The direction of these coefficients indicated that females in the stereotype threat group tended to report lower levels of metacognitive activity overall and that this average tended to decrease each day relative to control condition females (though the addition of the control variables negated this significant interaction). Thus, this pattern of results generally supports the predictions of Hypothesis 11 as well.

Figure 13. Females’ average task practice behaviors across learning trials

Table 17
MRCM Parameter Estimates for Females’ Metacognitive Activity (Hypothesis 11)

Model      β00    β01    β10   β11    σ²     τ00   τ11
Model 1    3.77   --     .01   --     .075   .32   .066
Model 2    3.87   -.20   .06   -.11   .075   .31   .063
Model 2A   3.84   -.21   .02   -.08   .074   .27   .050

Model 1: Metacog_ti = π0i + π1i(Day_ti) + r_ti; π0i = β00 + u0i; π1i = β10 + u1i
Model 2: Metacog_ti = π0i + π1i(Day_ti) + r_ti; π0i = β00 + β01(Condition_i) + u0i; π1i = β10 + β11(Condition_i) + u1i
A "--" indicates a parameter that is not included in the model.
Coefficient estimates in bold are significant at p < .05
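The ICC percentages quoted for metacognitive activity can be checked against the Model 1 variance components in Table 17. The sketch below assumes the ICC is computed as the between-person variance component divided by the sum of that component and the Level-1 residual variance; this formula is an assumption (the computation is not spelled out in this section), although it does reproduce the reported 81% and 47%. The function and variable names are illustrative.

```python
def icc(tau: float, sigma2: float) -> float:
    """Share of variance attributable to between-person factors.

    Assumed formula: tau / (tau + sigma2), where tau is a Level-2 variance
    component and sigma2 is the Level-1 residual variance.
    """
    return tau / (tau + sigma2)


# Model 1 variance components for metacognitive activity (Table 17).
SIGMA2 = 0.075   # Level-1 residual variance
TAU_00 = 0.32    # between-person variance in intercepts
TAU_11 = 0.066   # between-person variance in slopes over days

icc_intercept = icc(TAU_00, SIGMA2)  # about .81
icc_slope = icc(TAU_11, SIGMA2)      # about .47
```

Applying the same assumed formula to the Model 1 rows of Table 16 gives intercept values of roughly .39 to .56 and slope values of roughly .01 to .02, in line with the ranges reported for the task practice behaviors.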
Summarizing across the analyses above, stereotype threat appeared to exert a demonstrable influence on female learners’ knowledge acquisition strategies and the manner by which these individuals organized/focused their efforts during study time with the task manual. However, this apparent lack of engagement during the knowledge acquisition phase exerted little influence on the task practice behaviors of threatened females relative to their control counterparts. The experimental manipulation did appear to exert a negative influence on females’ self-regulatory metacognitive activities though, which was compounded over time. In aggregate, the accumulated evidence was largely supportive of the predictions advanced in Hypothesis 11.

Decision-making strategy. Hypothesis 12 focused directly on decision-making strategy and predicted that women in the stereotype threat condition learned/relied on less optimal procedural decision strategies/heuristics for task completion than non-threatened women. To examine this hypothesis, a policy capturing approach (Aiman-Smith, Scullen, & Barr, 2002; Karren & Barringer, 2002) was employed. Policy capturing is a regression-based procedure which assesses how individuals or groups of individuals differentially weigh the importance of relevant informational cues when making an evaluation or decision. Within the organizational research literature, the methodology has been used to examine the extent to which individuals differentially value information about fit on perceptions of satisfaction (Kristof-Brown, Jansen, & Colbert, 2002), work behaviors on ratings of overall job performance (Rotundo & Sackett, 2002), and compensation packages on job pursuit intentions (Cable & Judge, 1994), among other applications. Policy capture studies require respondents to make a series of preference ratings/choices based on various combinations of decision-relevant information that are presented. For example, Kristof-Brown et al.
(2002) asked participants to provide evaluations of perceived work satisfaction given information about the degree of person-job (PJ: low, medium, high), person-group (PG: low, medium, high), and person-organization (PO: low, medium, high) fit they would experience in a given organization. For that study, individuals provided ratings of perceived work satisfaction for 27 scenarios constructed by crossing all combinations of cues and cue values (3 cues with 3 values each). Analytically, individuals’ decisions/responses given a set of cues for a scenario are then regressed onto the specific cue values provided for that scenario; once aggregated across all decisions, regression coefficients for each informational cue are generated which reflect the relative importance of a particular piece of information/cue to an individual’s choices. This information can be used in an idiographic manner to interpret the decision processes of one person in particular or combined across people to draw nomothetic conclusions about individuals’ general information processing tendencies (Aiman-Smith et al., 2002). Within the latter approach, cluster analytic techniques may be employed to empirically group individuals into categories of like decision-makers (e.g., managers who tend to favor information about task performance vs. counter-productivity in ratings of job performance, Rotundo & Sackett, 2002) or MRCM analyses can be used to test a priori hypotheses about the influence of between-person variables on differential cue weighting (Kristof-Brown et al., 2002). The goal of Hypothesis 12 was most similar to the second of these approaches; that is, the primary prediction concerned whether the learning strategies of threatened versus non-threatened females led to differential weighting of the informational value/relevance of a target’s Type, Class, and Intent (i.e., the informational sources/cues) in the critical Final Engagement decision for a target.
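The logic of policy capturing in a fully crossed design can be illustrated with a small simulation. Everything in this sketch is hypothetical: the numeric coding of the cue levels (-1, 0, 1), the simulated rater, and the "true" weights are assumptions made for illustration and are not taken from Kristof-Brown et al. (2002) or from the present study. Because a balanced full factorial makes the cues uncorrelated, each multiple regression slope equals the simple slope cov(x, y) / var(x), which keeps the example free of external libraries.

```python
from itertools import product

# Assumed numeric coding of the three cue levels.
LEVELS = {"low": -1, "medium": 0, "high": 1}
CUES = ("PJ", "PG", "PO")

# Full factorial design: 3 cues x 3 levels each = 27 scenarios.
scenarios = list(product(LEVELS.values(), repeat=len(CUES)))

# Hypothetical rater who weights the cues 0.5, 0.3, and 0.2.
TRUE_WEIGHTS = (0.5, 0.3, 0.2)


def simulated_rating(scenario):
    """Noise-free satisfaction rating produced by the hypothetical rater."""
    return 4.0 + sum(w * x for w, x in zip(TRUE_WEIGHTS, scenario))


ratings = [simulated_rating(s) for s in scenarios]


def cue_weight(cue_index):
    """Regression weight for one cue, computed as cov(x, y) / var(x).

    Valid as a multiple regression slope here only because the balanced
    full factorial design makes the cues mutually uncorrelated.
    """
    xs = [s[cue_index] for s in scenarios]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ratings) / len(ratings)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ratings))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


captured_policy = {cue: cue_weight(i) for i, cue in enumerate(CUES)}
```

The recovered weights match the weights the simulated rater used, which is the sense in which the regression coefficients "capture" a decision policy. With real respondents the ratings contain noise, and a categorical outcome such as the Final Engagement decision calls for a different model than linear regression.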
Unlike most real-world decisions in which the correct decision is often unknown or ambiguous, an objectively correct engagement decision existed for every target which participants prosecuted in the task; consequently, each of the Type, Class, and Intent outcomes possessed an “optimal” cue weighting that indicated its diagnostic/informative value to a particular Final Engagement decision (similar to what is summarized in Table 8). These optimal weights could thus be extracted from the task and compared to the decision weights produced by learners. As a result, the focus for the present set of analyses was to examine whether (A) the decision weights developed by female learners in the stereotype threat condition significantly differed from those in the control condition; and (B) the decision weights developed by female learners in the stereotype threat condition were further from the optimal set of decision weights than those of their control condition counterparts. Note that to evaluate this hypothesis, only females’ target/decision data from the three performance trials were used. Using data from only the three performance trials was far more computationally tractable and therefore likely to lead to better model convergence and more accurate parameter estimates than using data from the 18 learning trials.

Because the dependent variable (Final Engagement decision) was a categorical variable with more than two levels (Clear, Warn, Mark), multinomial logistic regression was used for all analyses. In multinomial logistic regression, one sets a single level of the dependent variable to serve as the referent against which the other levels of the dependent variable are contrasted. Thus, k − 1 regression equations are computed for the set of predictors, where k equals the number of categorical levels in the dependent variable.
Given that there was no clearly logical/best choice among the Final Engagement categories to serve as the referent level, the decision was made to use the Clear category for these purposes. Consequently, two regression equations were modeled for each target: the first tested the likelihood that a given predictor was more strongly related to making a Warn as opposed to Clear decision, while the other tested the likelihood that a given predictor was more strongly related to making a Mark as opposed to Clear decision. Additionally, the categorical Type, Class, and Intent predictors necessitated the creation of dummy coded variables in order to be properly included in the regression model; k − 1 dummy variables were therefore also needed for each categorical predictor. For these purposes, two dummy variables were created for the three-level Type (Air, Surface, Sub) variable using the Air subdecision outcome as the referent, while a single dummy variable was created for both the Class and Intent subdecisions in which Civilian and Peaceful subdecision values served as the referent categories, respectively.

Two separate policy capture analyses using multinomial logistic regression were required in order to evaluate the propositions stated above: one to extract the optimal decision weights from the task and the other to estimate participants’ observed decision weights. The data required to compute the first of these models were the objectively correct Type, Class, Intent, and Final Engagement decision for the targets constructed for the performance trials. Note that because the optimal weighting configuration for targets does not vary over time, it was not necessary to include the effect of time in the policy capture regression model.
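The coding scheme described above (Air, Civilian, and Peaceful as referent predictor categories; Clear as the referent outcome) can be sketched as follows. This is an illustrative sketch only: the function names and the coefficient values are hypothetical and are not estimates from this study. It shows how k − 1 dummy variables are built for each cue and how two sets of log-odds coefficients (Warn vs. Clear, Mark vs. Clear) translate into probabilities for the three engagement decisions.

```python
import numpy as np

TYPE_LEVELS = ["Air", "Surface", "Sub"]     # referent: Air
CLASS_LEVELS = ["Civilian", "Military"]     # referent: Civilian
INTENT_LEVELS = ["Peaceful", "Hostile"]     # referent: Peaceful


def dummy_code(target_type, target_class, intent):
    """Return [TypeSurface, TypeSub, ClassMilitary, IntentHostile] as 0/1 values."""
    if target_type not in TYPE_LEVELS:
        raise ValueError(f"unknown Type: {target_type}")
    if target_class not in CLASS_LEVELS:
        raise ValueError(f"unknown Class: {target_class}")
    if intent not in INTENT_LEVELS:
        raise ValueError(f"unknown Intent: {intent}")
    return np.array([
        float(target_type == "Surface"),
        float(target_type == "Sub"),
        float(target_class == "Military"),
        float(intent == "Hostile"),
    ])


def engagement_probabilities(x, coef_warn, coef_mark):
    """Multinomial logit with Clear as the referent outcome.

    coef_warn and coef_mark hold [intercept, b1, b2, b3, b4], the log-odds
    of Warn vs. Clear and of Mark vs. Clear, respectively.
    """
    z = np.concatenate(([1.0], x))
    logits = np.array([0.0, z @ coef_warn, z @ coef_mark])  # Clear fixed at 0
    expl = np.exp(logits - logits.max())                    # stable softmax
    probs = expl / expl.sum()
    return dict(zip(["Clear", "Warn", "Mark"], probs))


# Hypothetical coefficients, chosen only to make the example concrete.
coef_warn = np.array([-1.0, -0.5, 0.2, 1.0, 1.5])
coef_mark = np.array([-2.0, -0.3, 0.4, 1.5, 3.0])

x = dummy_code("Sub", "Military", "Hostile")
print(x, engagement_probabilities(x, coef_warn, coef_mark))
```

Because Clear is the referent, a positive coefficient raises the odds of Warning (or Marking) relative to Clearing, and a target at all referent categories (Air, Civilian, Peaceful) is described by the intercepts alone.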
Subsequently, a simple single-level multinomial logistic regression analysis was performed on the data:

$$
\begin{aligned}
\ln\!\left[\frac{P(\text{Warn}_j)}{P(\text{Clear}_j)}\right] &= \beta_0 + \beta_1(\text{TypeSurface}) + \beta_2(\text{TypeSub}) + \beta_3(\text{ClassMilitary}) + \beta_4(\text{IntentHostile}) + e_j \\
\ln\!\left[\frac{P(\text{Mark}_j)}{P(\text{Clear}_j)}\right] &= \beta_0 + \beta_1(\text{TypeSurface}) + \beta_2(\text{TypeSub}) + \beta_3(\text{ClassMilitary}) + \beta_4(\text{IntentHostile}) + e_j
\end{aligned}
\tag{10}
$$

Logistic regression models will not converge or produce parameter estimates for data in which all observations of a particular predictor have the same outcome and/or can be perfectly classified into one category of the dependent variable (i.e., complete or quasi-separation exists in the data, Heinze & Schemper, 2002). This issue arises in the analysis of the optimal decision weights as a number of the predictors have a zero probability of being associated with a particular level of the dependent variable (e.g., Hostile targets are never Cleared, see Table 4, Table 8; as a result, perfect categorization/complete separation exists in the data for this variable). Consequently, Firth’s bias reduction method was applied to the estimation of the multinomial logistic regression models in order to generate the optimal decision weights (Firth, 1993). The Firth corrected multinomial logistic models were run in R version 2.15 (R Development Core Team, 2012) using the pmlr package (Colby, Lee, Lewinger, & Bull, 2010).

With respect to the computation of the policy capture analyses using the observed data from participants, the structure of the data was such that multiple targets were nested within multiple days, which were nested in individuals. As a result, multinomial logistic MRCM was the preferred analytic approach for conducting the regression analyses. However, rather than attempt to fit a 3-level model to the data (which would substantially increase the computational intensity and complexity required for interpreting the decision weight estimates), the trial/time variable was collapsed into the Level-1 equation to create a 2-level model.
This effectively removes estimation of the random effect for time from the MRCM equation and therefore does not allow for random variation in factors at the time-level to influence the relationship between information cue and engagement decision for individuals in the sample. Given that there was no reason to suspect that any one day/time point in the experiment was systematically different from any other day/time point in the study, this assumption seemed reasonable and, more importantly, unlikely to have a significant influence on the estimate of the decision weights. Thus, the final MRCM model was specified as follows (note that for ease of presentation, the separate logit link functions for the multinomial dependent variable categories are not presented):

$$
\begin{aligned}
\text{Final Engagement}_{ti} ={}& \pi_{0i} + \pi_{1i}(\text{TypeSurface}) + \pi_{2i}(\text{TypeSub}) + \pi_{3i}(\text{ClassMilitary}) + \pi_{4i}(\text{IntentHostile}) \\
& + \pi_{5i}(\text{Day}) + \pi_{6i}(\text{Day} \times \text{TypeSurface}) + \pi_{7i}(\text{Day} \times \text{TypeSub}) \\
& + \pi_{8i}(\text{Day} \times \text{ClassMilitary}) + \pi_{9i}(\text{Day} \times \text{IntentHostile}) + r_{ti} \\
\pi_{0i} ={}& \beta_{00} + \beta_{01}(\text{Condition}_i) + u_{0i} \\
\pi_{1i} ={}& \beta_{10} + \beta_{11}(\text{Condition}_i) + u_{1i} \\
\pi_{2i} ={}& \beta_{20} + \beta_{21}(\text{Condition}_i) + u_{2i} \\
\pi_{3i} ={}& \beta_{30} + \beta_{31}(\text{Condition}_i) + u_{3i} \\
\pi_{4i} ={}& \beta_{40} + \beta_{41}(\text{Condition}_i) + u_{4i} \\
\pi_{5i} ={}& \beta_{50} + \beta_{51}(\text{Condition}_i) + u_{5i} \\
\pi_{6i} ={}& \beta_{60} + \beta_{61}(\text{Condition}_i) + u_{6i} \\
\pi_{7i} ={}& \beta_{70} + \beta_{71}(\text{Condition}_i) + u_{7i} \\
\pi_{8i} ={}& \beta_{80} + \beta_{81}(\text{Condition}_i) + u_{8i} \\
\pi_{9i} ={}& \beta_{90} + \beta_{91}(\text{Condition}_i) + u_{9i}
\end{aligned}
\tag{4}
$$

Of greatest relevance to the present hypothesis, the fixed effect intercept terms β10 through β40 indicate the average decision weight observed in the entire female sample for each of the dummy-coded information variables collapsed across days, while the slope terms β11 through β41 reflect the extent to which the average decision weights differed for stereotype threat women relative to the control condition women.
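As a brief worked illustration of how the Level-2 slope terms are read, consider the weight on the Surface cue. Taking expectations of the Level-2 equation for π1i (with the random effect u1i assumed to have mean zero) gives the expected weight for a learner in a given condition, and the difference between conditions follows directly. The symbols c_threat and c_control denote the numeric codes assigned to the two conditions; they are notation introduced here, and the coding itself is not stated in this section.

$$
\begin{aligned}
E[\pi_{1i} \mid \text{Condition}_i = c] &= \beta_{10} + \beta_{11}\,c \\
E[\pi_{1i} \mid c_{\text{threat}}] - E[\pi_{1i} \mid c_{\text{control}}] &= \beta_{11}\,(c_{\text{threat}} - c_{\text{control}})
\end{aligned}
$$

If Condition were dummy coded (0 = control, 1 = stereotype threat), β11 would equal the between-condition gap in the Surface weight and β10 would be the control-group weight; if Condition were instead centered, β10 would be the sample-average weight, which matches the interpretation given in the text. The same reading applies to the remaining cue weights.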
Similarly, the fixed effect intercept terms β60 through β90 indicate the average change in decision weighting across days observed in the entire female sample for each of the dummy-coded information variables, while the slope terms β61 through β91 reflect the extent to which changes in decision weighting over time differed for stereotype threat women relative to the control condition women. The multinomial logistic MRCM analyses were conducted using HLM version 6.08 (Scientific Software International, 2009).

Of final note, the Type, Class, and Intent outcomes used as the informational cues for the Final Engagement decision were also themselves decisions made by participants (as opposed to veridical information provided by the environment). Since participants were required to determine the correct Type, Class, and Intent for each target, it is likely that individuals may have based a number of their Final Engagement decisions on objectively “incorrect” information if they made the wrong classification for any of these subdecisions. While such errors would prevent an individual from making the actually correct Final Engagement decision for a target, they do not influence estimation of the decision weights in the policy capture analysis as these analyses simply examine the consistency with which a given piece of information is associated with a particular decision outcome. That is, the accuracy of individuals’ Type, Class, and Intent classification decisions for any particular target is entirely independent of whether the participant made optimal decisions based on the information they had available. For example, consider a target whose correct classification was Air, Civilian, Hostile, yet a participant classified the target as Surface, Military, Peaceful and subsequently chose to Clear the target (which is the correct Final Engagement decision for a Surface, Military, Peaceful target, see Table 8).
In this case, the policy capture analyses would indicate that the person had made the optimal decision, even though the information used to make that decision (as well as the Final Engagement decision itself) was objectively inaccurate. In theory then, a participant could fail to correctly prosecute any targets in the game yet still learn to make optimal decisions so long as they made the correct engagement decision based on the information they had available. Consequently, the policy capture analyses do not provide insight into whether individuals or groups of individuals were more or less accurate in their decisions (which would influence performance), but rather whether they had learned to correctly interpret cue values. In short, participants could be optimal decision-makers without necessarily being accurate, but they could not be accurate without also being optimal.

Figures 14 through 21 summarize the results of the policy capture analyses. The y-axis for each graph reflects the decision weight associated with (i.e., the likelihood of selecting) either the Warn or Mark engagement decision relative to selecting the Clear engagement decision for each of the dummy-coded predictor variables at each day in the experiment. Thus, the graph in Figure 14 reflects the relative likelihood of Warning rather than Clearing a target if that target was a Surface as opposed to some other Type of vessel over time. Negative decision weights indicate that individuals were less likely to Warn/Mark rather than Clear targets given the informational cue, while positive values indicate that individuals were more likely to Warn/Mark rather than Clear targets given the informational cue. The graphs also depict the optimal weighting criteria for each of the Final Engagement by cue value combinations. Note that the direction and magnitude of the optimal decision weights mirror the pattern of probabilities summarized in Table 8.
For example, Table 8 indicated there was a higher probability that the correct Final Engagement decision for any given Surface target would be Clear (50%) as opposed to Warn (25%); this same pattern is reflected in the negative optimal decision weight coefficient (β = -4.65) shown in Figure 14. Of final note, for purposes of the present study, the actual numerical value of the decision weight carries no substantive meaning for evaluating Hypothesis 12. The interpretation of greater interest is the overall direction (i.e., positive or negative) and the relative differences in the decision weights produced by stereotype threat and control females to the optimal decision weights.

Figure 14. Observed and optimal decision weights for Type cue (Surface) on decision to Warn rather than Clear targets for stereotype threat females and control females at each day
Figure 15. Observed and optimal decision weights for Type cue (Surface) on decision to Mark rather than Clear targets for stereotype threat females and control females at each day
Figure 16. Observed and optimal decision weights for Type cue (Sub) on decision to Warn rather than Clear targets for stereotype threat females and control females at each day
Figure 17. Observed and optimal decision weights for Type cue (Sub) on decision to Mark rather than Clear targets for stereotype threat females and control females at each day
Figure 18. Observed and optimal decision weights for Class cue (Military) on decision to Warn rather than Clear targets for stereotype threat females and control females at each day
Figure 19. Observed and optimal decision weights for Class cue (Military) on decision to Mark rather than Clear targets for stereotype threat females and control females at each day
Figure 20. Observed and optimal decision weights for Intent cue (Hostile) on decision to Warn rather than Clear targets for stereotype threat females and control females at each day
Figure 21. Observed and optimal decision weights for Intent cue (Hostile) on decision to Mark rather than Clear targets for stereotype threat females and control females at each day

An examination of the main effects of stereotype threat on female learners’ procedural decision-making effectiveness revealed that the only significant difference in the average decision weights across the groups was for the Surface cue; specifically, stereotype threat females tended to have more difficulty than control females distinguishing whether to Warn (β21 = .81, p < .05; Figure 14) or Mark (β21 = .94, p < .05; Figure 15) Surface targets relative to Clearing those targets. Marginally significant main effects were also observed for the decision weights involving whether to Warn versus Clear Military targets (β41 = .76, p = .09; Figure 18) and whether to Mark versus Clear Hostile targets (β51 = -1.65, p = .06; Figure 21); in both cases, stereotype threat women had more difficulty than control condition females reaching the appropriate decision. No significant differences in changes in decision weights over time were observed across the two experimental conditions, indicating that both groups of females were generally learning the procedural decision aspects of target engagement at the same rate.

Although the MRCM policy capture analyses provide insight into relative differences in learning between the two conditions, of perhaps greater interest is whether one group of learners was more effective at learning the optimal procedural decision making strategies than the other. As one might expect, all individuals appeared to become more optimal decision-makers over time; that is, the observed decision weights for all female learners, regardless of experimental condition, tended to become closer to optimal as more experience was accrued in the task.
However, control condition females appeared closer to achieving the optimal decision weights than stereotype threat females for virtually all engagement decision and informational cue combinations. In fact, the overlap in the 95% confidence intervals between the observed and optimal decision weights at each time point indicated that control condition females had achieved near perfect optimality by Day 3 in their interpretation of a target’s Intent and Class while the stereotype threat learners had not yet achieved this proficiency. Thus, relative to those in the stereotype threat condition, female learners in the control condition nearly always selected the most probable Final Engagement decision associated with a given Intent or Class by the end of the study. Furthermore, although they had not yet achieved optimality, the confidence intervals in the decision weights for stereotype threat and control condition females for the Type decisions on Day 3 did not overlap for three of the four observed decision weights, indicating that control condition females were generally making more probabilistically appropriate Final Engagement decisions based on the Type cues by this point as well. This pattern of results is remarkably similar to those described in the analysis for the knowledge structure clustering observed across conditions, thus lending a degree of support to the notion that control condition female learners appeared to be more proficient at learning efficient decision heuristics within the task than stereotype threat females.

In sum, although the significance tests for the main and interaction effects of stereotype threat observed in the estimated MRCM coefficients did not reveal dramatically different patterns of procedural decision-making strategy development between conditions of female learners, the comparison of participants’ observed decision weights to the optimal decision weights inferred from the task revealed a number of insights.
Of specific note, control condition females were generally closer to optimal throughout the entire experiment and even achieved optimality for certain informational cues by the end of the study. Thus, the overall patterns of the policy capture analyses generally support the predictions advanced in Hypothesis 12.

Task Performance

Hypotheses 13 and 14 examined the impact of learning under conditions of stereotype threat on the demonstration of effective performance on the learned task. In the present study, individuals participated in both practice/learning trials plus a final performance trial at the end of each day. The final performance trials were different from the learning trials each day and were designed to be more challenging for all participants; however, they should have been particularly difficult for individuals who had not effectively learned the procedural and/or strategic aspects of the task. As mentioned in the description of the task environment, performance scores in TANDEM were a function of the number of targets engaged correctly, the number of targets engaged incorrectly, and the number of targets which crossed either of the two defensive perimeters; subsequently, these variables were analyzed with data from the performance trials and the learning/practice trials using the same MRCM equations outlined in Equations 1-3.

Table 18 presents the results of the MRCM analyses for data from the learning/practice trials. With respect to the performance outcomes measured during the practice trials, no significant main effects or interactions over time were observed for the stereotype threat manipulation. On average, all female learners tended to engage approximately the same number of targets correctly each trial, with the number of correct engagements increasing by approximately one every three trials (β10 = .36, p < .05).
Interestingly, the increase in number of correct engagements did not necessarily correspond with a similarly sized decrease in the number of incorrect engagements per trial (β10 = -.07, p < .05), indicating that the number of incorrect target engagements was relatively steady throughout the learning trials. Additionally, although very few of the four targets designed to cross the inner perimeter during the practice trials ever did so (β00 = -.50, p < .05), participants rarely engaged and/or improved their performance over time at engaging/prosecuting the seven targets designed to cross the outer perimeter during the practice trials (β00 = 6.58, β10 = -.05, both ps < .05). Consequently, the relatively small increase in the number of points scored per trial (β10 = 53.7, p < .05) was largely attributable to individuals’ improvement in the procedural decision-making aspects of task performance (i.e., making correct engagement decisions) rather than improvements in strategic target selection.

Table 18
MRCM Parameter Estimates for Performance Outcomes Measured during Learning/Practice Trials (Hypothesis 13 & 14)

Model 1: DVti = π0i + π1i(Trialti) + rti; π0i = β00 + u0i; π1i = β10 + u1i
Model 2: DVti = π0i + π1i(Trialti) + rti; π0i = β00 + β01(Conditioni) + u0i; π1i = β10 + β11(Conditioni) + u1i
(DV = Crct, Incrct, InnPen, OutPen, or Score, as listed for each dependent variable)

Dependent variable                        Model   β00     β01     β10     β11     σ²      τ00     τ11
Number Correct Engaged (Crct)             1       4.76    --      .34     --      2.51    5.13    .056
                                          2       4.85    -.17    .36     -.04    2.51    5.17    .056
Number Incorrect Engaged (Incrct)         1       5.16    --      -.07    --      3.74    3.84    .046
                                          2       4.93    .47     -.07    .00     3.74    3.82    .047
Number Crossed Inner Perimeter (InnPen)   1       .50     --      -.05    --      .785    .161    .002
                                          2       .56     -.12    -.05    .00     .785    .160    .002
Number Crossed Outer Perimeter (OutPen)   1       6.56    --      -.06    --      .372    .389    .007
                                          2       6.58    -.03    -.05    -.01    .372    .391    .007
Task Score (Score)                        1       -746    --      52.1    --      9.2e4   1.8e5   1984
                                          2       -720    -52.6   53.7    -3.35   9.2e4   1.8e5   1998

Coefficient estimates in bold are significant at p < .05

Alternatively, results from the performance trial data (Table 19) revealed significant main effects of stereotype threat on the number of targets incorrectly engaged (β01 = 1.74, p < .05) and marginally significant effects on the number of targets correctly engaged (β01 = -1.52, p = .06) and total number of points scored (β01 = -305, p = .08). The pattern of these results indicates that, on average, females who learned under conditions of stereotype threat tended to make more incorrect engagement decisions while making slightly fewer correct engagements than control condition females, leading to generally lower performance scores. Furthermore, although all
Furthermore, although all 163 Table 19 MRCM Parameter Estimates for Performance Outcomes Measured during Performance Trials (Hypothesis 13 & 14) Parameter estimates Dependent Model 2 Variable β β β β σ τ τ11 00 01 10 11 00 9.03 — 2.89 — 6.92 21.0 5.84 Model 1 Crctti = π0i + π1i(Dayti) + rti π0i = β00 + u0i Number Correct Engaged π1i = β10 + u1i a Model 2A Crctti = π0i + π1i(Dayti) + rti † π0i = β00 + β01(Conditioni) + u0i 10.0 -1.52 3.82 -1.60 6.89 16.4 5.87 π1i = β10 + β11(Conditioni) + u1i Model 1 Incrctti = π0i + π1i(Dayti) + rti 9.18 π0i = β00 + u0i Number Incorrect Engaged — -2.28 — 11.33 20.90 4.00 π1i = β10 + u1i a Model 2A Incrctti = π0i + π1i(Dayti) + rti π0i = β00 + β01(Conditioni) + u0i 8.16 1.74 -3.10 1.59 10.92 17.78 3.49 π1i = β10 + β11(Conditioni) + u1i Model 1 InnPenti = π0i + π1i(Dayti) + rti .94 π0i = β00 + u0i Number π1i = β10 + u1i Crossed Inner Model 2 Perimeter InnPenti = π0i + π1i(Dayti) + rti π0i = β00 + β01(Conditioni) + u0i π1i = β10 + β11(Conditioni) + u1i 164 — -.19 — 1.19 .76 .135 1.06 -.25 -.10 -.18 1.19 .75 .133 Table 19 (cont’d) Dependent Variable Parameter estimates Model π0i = β00 + u0i Number π1i = β10 + u1i Crossed Outer Model 2 Perimeter OutPenti = π0i + π1i(Dayti) + rti π0i = β00 + β01(Conditioni) + u0i β01 β10 β11 9.98 — -.63 — 1.13 1.81 .947 10.0 -.03 .58 .10 1.13 1.83 .954 -1652 Model 1 OutPenti = π0i + π1i(Dayti) + rti σ 2 β00 τ00 τ11 — 642 — 2.9e5 9.4e5 2.4e5 π1i = β10 + β11(Conditioni) + u1i Model 1 Scoreti = π0i + π1i(Dayti) + rti π0i = β00 + u0i Task Score π1i = β10 + u1i a Model 2A Scoreti = π0i + π1i (Dayti) + rti † π0i = β00 + β01(Conditioni) + u0i -1470 -305 796 -273 2.8e5 7.5e5 2.5e5 π1i = β10 + β11(Conditioni) + u1i Coefficient estimates in bold are significant at p < .05 † Signficant at p < .10 a Model includes control variables (coefficients not printed for ease of presentation) learners tended to make more correct engagements (β10 = 3.82, p < .05), fewer incorrect engagements (β10 = -3.10, p < .05) and 
therefore score more points each performance trial (β11 = 796, p < .05), these rates were significantly lower for the stereotype threatened female learners (β11 = -1.60, p < .05 for correct engagements; β11 = 1.59, p < .05 for incorrect engagements; β11 = -273, p < .05 for task score). Thus, despite achieving approximately similar levels of performance during the learning trials, stereotype threat learners were generally less effective on 165 the more complex performance trials (which is also consistent with previous research on stereotype threat effects and task difficulty, Nguyen & Ryan, 2008). Lastly, an additional set of exploratory analyses were performed on the declarative knowledge tests assessed at the end of each day. As noted previously, many investigations in the research literature have used similar declarative knowledge tests to evaluate the presence of stereotype threat effects in a variety of contexts; consequently, such analyses serve as a useful point of comparison for characterizing the stereotype threat effects observed in the present study. Table 20 presents the results from the MRCM analyses for performance on the declarative knowledge tests. In general, all female learners improved their performance on the examinations slightly each day (β10 = .03, p < .05) suggesting that individuals were improving in their basic understanding of the task with additional experience. Both the main effect of stereotype threat (β01 = -.04, p = .10) and the interactive effect of stereotype threat over time (β11 = -.03, p = .06) achieved marginal levels of significance, indicating a trend in which females in the experimental condition appeared to perform slightly worse on the tests on average and generally did not improve their test performance across days relative to control condition learners. In sum, evidence from the performance trial data and the declarative knowledge test generally appeared to support Hypotheses 13 and 14. 
Table 20
MRCM Parameter Estimates for Performance on the Declarative Knowledge Assessments

Model 1: Testti = π0i + π1i(Dayti) + rti; π0i = β00 + u0i; π1i = β10 + u1i
Model 2: Testti = π0i + π1i(Dayti) + rti; π0i = β00 + β01(Conditioni) + u0i; π1i = β10 + β11(Conditioni) + u1i

Model   β00    β01     β10    β11     σ²     τ00    τ11
1       .73    --      .02    --      .017   .021   .002
2       .76    -.04†   .03    -.03†   .026   .070   .008

Coefficient estimates in bold are significant at p < .05
a Model includes control variables (coefficients not printed for ease of presentation)

DISCUSSION

Research on stereotype threat theory and its effects has a rich investigative history, spanning numerous applications, content domains, and psychological/educational disciplines (cf., Nguyen & Ryan, 2008). A primary goal of the present investigation was to extend work in this area of study to the acquisition and development of task-relevant knowledge by individuals facing conditions of stereotype threat during learning activities. Initial research (Rydell, Rydell, & Boucher, 2010; Rydell, Shiffrin, et al., 2010) suggested that the presence of negative stereotypes relevant to a content domain was capable of adversely influencing a targeted individual’s ability to successfully learn and later recall/demonstrate learned task-relevant skills. The current study built upon this proof of concept in a number of ways. First, the incorporation of Kraiger et al.’s (1993) empirically supported and widely cited taxonomy of learning outcomes provided a strong theoretical foundation from which to approach and explore stereotype threat effects during learning. Second, the use of multiple metrics/measurement techniques each tapping substantively different activities and consequences relevant to the knowledge acquisition process permitted a detailed interpretation of the specific learning outcomes affected by stereotype threat.
Lastly, the longitudinal design of this investigation enabled examination of the way in which expertise and the knowledge acquired by groups of threatened versus nonthreatened individuals developed over time, a crucial consideration when studying the inherently dynamic process of learning.

Table 21 provides a summary of the research hypotheses and the overall result of their accompanying analytic tests. Analyses of the characteristic “shape” and descriptive nature of participants’ knowledge structures, behaviors/cognitions related to efficient and effective task strategy development, and task-specific performance outcomes each revealed unique insights into stereotype threat’s effects on the learning process. Consequently, the discussion of the results below will begin with a brief overview of the rationale and findings from the present study which attempts to integrate the observed pattern of results into a coherent picture of stereotype threat effects during learning. Next, more specific comments/treatments for each of the primary dependent variables (knowledge structures, task strategy, and performance) are explored. Lastly, this section concludes with implications regarding the influence of stereotype threat during the learning process, suggestions for future research within the domain, and study limitations.

Table 21
Hypothesis Summary

Hypothesis 1: The knowledge structures of females who learn under conditions of stereotype threat will be less similar to those from top performers/men than the knowledge structures of females who learn under control conditions.
    Result: Not Supported
Hypothesis 2: The knowledge structures of females who learn under conditions of stereotype threat will be less correlated with those from top performers/men than the knowledge structures of females who learn under control conditions.
    Result: Not Supported
Hypothesis 3: The knowledge structures of females who learn under conditions of stereotype threat will be less coherent than the knowledge structures of females who learn under control conditions.
    Result: Not Supported
Hypothesis 4: The knowledge structures of females who learn under conditions of stereotype threat will have significantly more links (i.e., be less parsimonious) than the knowledge structures of females who learn under control conditions.
    Result: Not Supported
Hypothesis 5: The clustering of concepts in the knowledge structures of females who learn under conditions of stereotype threat will be significantly different than that for non-threatened women.
    Result: Not Supported
Hypothesis 6: The similarity between the knowledge structures of females who learn under conditions of stereotype threat with those from top performers/men will improve at a slower rate compared to females who learn under control conditions.
    Result: Not Supported
Hypothesis 7: The correlation between the knowledge structures of females who learn under conditions of stereotype threat with those from top performers/men will improve at a slower rate compared to females who learn under control conditions.
    Result: Not Supported
Hypothesis 8: The coherence of the knowledge structures of females who learn under conditions of stereotype threat will improve at a slower rate compared to females who learn under control conditions.
    Result: Not Supported
Hypothesis 9: The number of links in the knowledge structures of females who learn under conditions of stereotype threat will increase at a faster rate (i.e., structures will become less parsimonious) compared to females who learn under control conditions.
    Result: Not Supported
Hypothesis 10: The knowledge structures of females who learn under conditions of stereotype threat will demonstrate less integration of related task concepts over time (i.e., fewer and less efficient associations between related task concepts) compared to females who learn under control conditions.
    Result: Supported
Hypothesis 11: Females who learn under conditions of stereotype threat will exhibit poorer/more basic cognitive task strategies than females who learn under control conditions.
    Result: Supported
Hypothesis 12: Females who learn under conditions of stereotype threat will develop less optimal procedural decision strategies for task completion than females who learn under control conditions.
    Result: Supported
Hypothesis 13: Females who learn under conditions of stereotype threat will demonstrate worse performance on the learned task than females who learn under control conditions.
    Result: Supported
Hypothesis 14: Females who learn under conditions of stereotype threat will improve their performance on the learned task at a slower rate than females who learn under control conditions.
    Result: Supported

Summary of Key Findings

Identifying the psychological mechanisms and outcomes associated with the learning process which help characterize an individual’s transition from novice to expert within a domain/task has been of significant interest to researchers in a variety of disciplines. A general consensus from this broad stream of research is that the act of learning typically begins with the acquisition of basic declarative and operational facts/statements, advances to the generation of generalizable procedural rules or definitions that help organize task-relevant activities/cognitions, and culminates in the ongoing development of conditional principles and strategies which improve the efficiency/effectiveness with which an individual is able to access and/or apply knowledge towards the completion of task-relevant objectives (Anderson, 1982, 1996; Gagne, 1984; Kanfer & Ackerman, 1989).
The learning outcomes associated with these latter two activities are of particular interest as they are often key indicators of the degree of proficiency one has developed within a task/domain; that is, the factor which appears to most directly distinguish a novice from a learned expert is the possession of well-organized, efficient knowledge schemata that facilitate problem representation and minimize the amount of effortful processing required to derive/coordinate task-relevant strategies and solutions (e.g., Anderson, 1993b; Chase & Simon, 1973; Kahneman & Frederick, 2005). Notably, the production of both of these advanced learning outcomes relies heavily on working memory capacity as it necessitates the ability to perceive and hold information in active awareness while simultaneously integrating that input with either existing knowledge or additional information from the environment (cf., Hunt, 1994). Consequently, factors which affect working memory capacity, such as stereotype threat (Schmader et al., 2008), should have a noticeable impact on the development of efficient/effective knowledge structures as well as related task strategies/heuristics (Kraiger et al., 1993). The results of the present study were fully consistent with this conclusion. Females learning a complex decision-making task under conditions of stereotype threat appeared to develop knowledge structures whose schematic relations were less efficiently organized than those of females who did not experience such adverse conditions over the course of the three experimental sessions. The policy capturing analyses used to decompose the observed procedural decision-making strategies developed by individuals within these experimental groups provided further evidence of significant differences in learning proficiency as a result of stereotype threat.
Specifically, this set of findings revealed that control condition females were generally more effective than stereotype threatened females at interpreting and applying learned information to make critical task decisions, and that they achieved optimal levels of proficiency more quickly. As expected, differences in the manner by which these concepts were learned ultimately manifested in differences in objective indicators of task performance as well. In sum, the observed pattern of evidence supports what would be anticipated if the working memory capacity of individuals was negatively influenced by stereotype threat within the learning environment: affected participants appear to have greater difficulty integrating task information into procedurally useful rules and strategically efficient heuristics, which ultimately influences their ability to operate as effectively within the performance domain.

Stereotype Threat Effects on Knowledge Organization

The organization of information into well-constructed procedural knowledge relations is among the most critically important outcomes of the learning process (Anderson, 1993b, 1996; Anderson et al., 2004; Kraiger et al., 1993; Rouse & Morris, 1986). A whole host of cognitive machinery influences learners’ ability to transform their basic comprehension of a task space into meaningful inferences about the interdependencies among components, events, rules, actions, etc. relevant to producing task-specific outcomes/results. Among these mechanisms, working memory plays a significant role by enabling individuals to simultaneously coordinate, interpret, and integrate multiple sources of information into contextually informative knowledge capable of directing task behaviors (e.g., Cantor & Engle, 1993; Engle, 2002; Johnson-Laird, 1983; Kane & Engle, 2003).
Consequently, the present study hypothesized that stereotype threat’s empirically documented deleterious influence on working memory efficiency (e.g., Beilock et al., 2006; Beilock et al., 2007; Schmader, 2010; Schmader et al., 2009; Schmader & Johns, 2003) would negatively influence knowledge organization by impairing threatened individuals’ ability to adequately maintain task-relevant information in an activated/accessible state needed to develop an integrated mental representation of the content. Hypotheses 1-10 in the present study examined this general prediction for female learners under conditions of stereotype threat using a variety of measurement techniques intended to characterize a knowledge structure’s representational form. Perhaps the most readily obvious observation from this set of hypotheses was the lack of significant differences between females who learned under stereotype threat conditions and control condition females using the quantitative knowledge structure metrics (i.e., similarity, correlation, coherence, number of links, and the various descriptive metrics from the exploratory graph theory analyses) versus the qualitative analyses. Although there may be many explanations for why this pattern of nonsignificant findings was observed, two interpretations seem particularly plausible. First, the specific set of knowledge concepts used in the present study (Table 7) may have limited the degree of variability in the possible linkages among concepts. As Goldsmith et al. (1991) note, the number and content of concepts used when eliciting knowledge structures holds significant influence on the composition/interpretation of individuals’ observed knowledge organization.
In the present study, the use of relatively homogeneous knowledge concept subsets (i.e., decision-making and procedural/strategy concepts) may have contributed to less variability in the linkages likely to emerge in a given knowledge structure, which is precisely what the quantitative metrics are designed to capture. Furthermore, unlike previous studies which examined knowledge structures using the TANDEM task paradigm, the majority of concepts employed in this investigation were selected so as to draw inferences about how individuals integrated task-critical information in a manner meaningful to decision-making rather than to examine differences in target selection and/or task operations. Although analyses of the task practice behaviors did not suggest that female learners in either condition approached these task activities in significantly different manners, subtle differences in such “gameplay” activities could have differed enough to be detected by quantitative assessments of knowledge structures had more operational task concepts been used. As it stands though, the relatively more structured/patterned nature of the knowledge concepts implemented in the present study may have been too rigid to generate substantially large differences in the quantitative knowledge structure metrics across female learners. A second possible interpretation is simply that the quantitative indices may not have been particularly informative metrics given that stereotype threat was expected to primarily influence the effectiveness/efficiency of participants’ organization of task-relevant knowledge concepts.
Many studies which assess learners’ knowledge structures do so for the explicit purpose of evaluating the degree of relatedness between novice and expert mental models (e.g., Chi, Glaser, & Farr, 1988; Day et al., 2001; Ford & Kraiger, 1995) and/or to use the metrics as predictors of some task-relevant performance outcome (e.g., Dorsey et al., 1999; Kozlowski, Gully, et al., 2001; Schuelke et al., 2009). In such cases, the underlying theoretical assumption is generally that the pattern of relations among network concepts carries some “meaning” about the way in which learners have made sense of a given domain space, though deducing that particular meaning is not of central importance or is too speculative in nature. As a result, the use of quantitative indices in these investigations serves as a convenient indicator of knowledge structure quality, and conclusions about what individuals have learned and/or whether the meaning of that organization is logical/interpretable are largely ignored. The present study, on the other hand, specifically proposed that impediments to working memory elicited by stereotype threat would make it more difficult for learners facing such conditions to integrate and store task-critical information in a manner that promoted efficient and effective task decisions and strategies. The specific relations which emerged among concepts in an individual’s knowledge structure and the meaning that those relations implied were therefore of critical concern and expected to be among the most sensitive to stereotype threat. In fact, differences in the characteristic shape/make-up of a network (i.e., what is measured by the quantitative knowledge structure indices) would likely only manifest in situations where a large or severely disruptive situational factor was present, a relatively rare finding in the area of stereotype threat research (Steele & Aronson, 1995; Nguyen & Ryan, 2008).
Consequently, the qualitative interpretations of the knowledge structures pursued in Hypotheses 5 and 10 arguably stood to offer the most significant insights into the effects of stereotype threat during learning. To this end, the unique choices made regarding the design elements of the TANDEM architecture in the present study relative to previous administrations of the task (e.g., consideration of multiple subdecision outcomes to make a Final Engagement decision, creating variation in the relative probabilities between subdecision and Final Engagement outcomes, etc.) made it possible to consider a variety of probable structural patterns a priori and then examine whether such relations emerged across groups of learners. As described in the Methods and Results sections, examinations of between-group knowledge structure differences in functional versus feature relations as well as relations indicative of efficient heuristic reasoning were of particular interest. Interestingly, the average knowledge structures of stereotype threatened and non-stereotype threatened females revealed that individuals in both groups tended to exhibit functional network relations among the decision-making knowledge concepts consistent with the rules/parameters of the task environment outlined in Tables 4 and 8. However, when changes in the clustering patterns over time were examined, the average knowledge structure of stereotype threat learners appeared to differ noticeably from that of control condition learners. More specifically, by the end of Day 3 in the study, the knowledge structures of females in the control condition appeared to be organized in a manner consistent with a highly efficient heuristic useful for making the correct Final Engagement decision quickly and relatively accurately based on the two most diagnostic pieces of information (a target’s Class and Intent).
Alternatively, females in the stereotype threat condition had not extracted this heuristic; their organization of the decision-making concepts seemed more consistent with individuals attempting to memorize all possible outcome permutations rather than the most likely outcomes based on a given piece of information. One useful way in which to characterize this difference between the observed patterns of knowledge structures can be gleaned from the works of Simon (1956, 1990) and later Gigerenzer and colleagues (e.g., Gigerenzer, 1991, 1993; Gigerenzer, Todd, & the ABC Research Group, 1999; Todd & Gigerenzer, 2007) on bounded rationality and the manner by which heuristics are developed to serve the inferential needs of human decision-makers. In brief, the theory of bounded rationality proposes that because of limitations in information processing capabilities and environmental conditions which impose restrictions on the availability of information, intelligent systems (i.e., humans) make use of approximate computational/judgment processes for many decisions. Such heuristic approximations are generally satisficing in that they stipulate a decision process should end when a reasonable, though not necessarily perfect, conclusion is reached (Simon, 1957). Building upon this notion, Gigerenzer’s “adaptive toolbox” indicates that individuals develop such satisficing heuristics within a given task domain over time based on their past experiences and the situational influences/demands of the environment in which they operate (cf., Gigerenzer & Selten, 2001).
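The contrast drawn above between memorizing every outcome permutation and relying on a satisficing heuristic can be illustrated with a brief sketch. Everything in it is a hypothetical stand-in: the cue values, the `true_decision` rule, and the "shoot"/"clear" labels are assumptions made for illustration and are not the actual TANDEM rules summarized in Tables 4 and 8. The sketch only shows that a two-cue rule (Class and Intent) can be nearly as accurate as a fully memorized table while requiring far less information to be stored.

```python
from itertools import product

# Hypothetical cue values; the real task parameters are not reproduced here.
CUES = {
    "type": ("air", "surface", "submarine"),
    "class": ("civilian", "military"),
    "intent": ("peaceful", "hostile"),
}


def true_decision(target):
    """Assumed ground-truth rule, invented purely for this illustration."""
    hostile = target["intent"] == "hostile"
    if target["class"] == "military":
        return "shoot" if hostile else "clear"
    # Assumed exception: hostile civilian aircraft are also engaged.
    return "shoot" if hostile and target["type"] == "air" else "clear"


def all_targets():
    """Enumerate every permutation of the three cues."""
    names = list(CUES)
    return [dict(zip(names, values)) for values in product(*CUES.values())]


def memorized_table():
    """Exhaustive strategy: store one outcome for every cue permutation."""
    return {tuple(t.values()): true_decision(t) for t in all_targets()}


def heuristic_decision(target):
    """Satisficing strategy: consult only the two most diagnostic cues."""
    is_threat = (target["class"], target["intent"]) == ("military", "hostile")
    return "shoot" if is_threat else "clear"


targets = all_targets()
table = memorized_table()
heuristic_accuracy = sum(
    heuristic_decision(t) == true_decision(t) for t in targets
) / len(targets)

print(f"entries to memorize: {len(table)}")
print(f"two-cue heuristic accuracy: {heuristic_accuracy:.3f}")
```

Under these assumed rules, the exhaustive strategy must retain all 12 permutations, whereas the two-cue heuristic errs on only one of them.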
Consequently, given the demands of the TANDEM task environment and the performance objectives imposed on individuals, both bounded rationality and the adaptive toolbox would suggest that a critical indicator of effective learning in the present study is the acquisition of efficient decision heuristics which minimize the amount of information processing needed in order to make a reasonably accurate Final Engagement decision for a target. Thus, although all female learners drew accurate functional relations among knowledge structure concepts, the introduction of the stereotype threat manipulation appeared to impair the ability of females in the stereotype threat condition to organize these relations into a pattern consistent with an efficient satisficing heuristic. An explanation derived from the principles of the adaptive toolbox would suggest that this discrepancy might be attributable to differences in individuals’ representation of the environmental/situational demands in the stereotype threat versus control conditions. In their original conceptualization of the effect, Steele and Aronson (1995) postulated that a key motivation for individuals facing stereotype threat is to avoid confirming the validity of the negative stereotype through their actions. In the language of the adaptive toolbox then, individuals facing stereotype threat would perceive this need as an added requirement in their operational environment while those free from such pressures would not. One way in which a person might meet this perceived environmental demand in a learning context is by attempting to perfectly memorize/acquire all information about a particular task and its components, as this would seem to ensure that the individual would not be deficient in the amount of knowledge they possess about the problem space.
However, the rationale underlying bounded rationality and the adaptive toolbox indicates that this strategy is likely to be less effective/efficient at improving an individual’s ability to apply that knowledge to domain-relevant problems. Consequently, the organization of knowledge concepts into schemata conducive to simple yet effective heuristics, a hallmark of expert cognition within a domain (e.g., Chase & Simon, 1973; Lipschitz, Levy, & Orchen, 2006), may run counter to the environmental demands perceived by learners facing conditions of stereotype threat and therefore be less likely to emerge. In sum, differences observed in the longitudinal comparisons of female participants’ knowledge structures across conditions of stereotype threat appear to be relatively consistent with broader theories of cognition depicting the manner by which individuals develop adaptive heuristic reasoning. The reported findings generally support the notion that stereotype threat creates an added environmental pressure for learners (Steele & Aronson, 1995), which, in a learning context, appears to impede the acquisition/development of efficient heuristics conducive to task performance. The conclusions of the present study are similar to those advanced by Rydell, Shiffrin, et al. (2011), who postulated that females experiencing stereotype threat were also less proficient at acquiring effective heuristics for completing a visuospatial task. However, a significant contribution of the current investigation was the extension of this result to the domain of cognitive-based/decision-making heuristics as well as the ability to empirically verify this supposition through the use of knowledge organization data.
Furthermore, the relatively rare use of a repeated measures experimental design to examine the influence of stereotype threat over time and knowledge structure development (Ifenthaler et al., 2011; Ifenthaler & Seel, 2005) provided further insight into the dynamical nature of the effect that would have otherwise been unobserved had a cross-sectional, single observation design been implemented.

Stereotype Threat Effects on Cognitive Strategy Acquisition

As implied in the rationale of the previous section, the effective organization of knowledge into meaningful procedural relations and the development/refinement of cognitive strategies are closely interrelated processes. Knowledge structures provide the mental framework for interpreting causal relations within a task domain which can then be used to systematically direct and coordinate attention and other cognitive resources towards effective task completion (Chi et al., 1981; Chi et al., 1982; Simon & Simon, 1978). In the present study, a variety of indicators were examined in an attempt to capture potential differences in aspects of cognitive strategy acquisition between female learners in the stereotype threat and control conditions. Specifically, the manner by which individuals structured learning activities related to declarative knowledge acquisition and active task practice as well as the extent to which participants engaged in self-regulatory metacognitive activity were each examined. In addition to these strategic learning behaviors, analyses were also conducted to extract and quantify differences in the decision-making heuristics implied by the composition of the average knowledge structures observed for stereotype threat versus control condition learners.
With respect to strategic learning behaviors, the overall findings from this set of analyses indicated that although female learners in both conditions were generally similar in their task practice behaviors, stereotype threat learners tended to exhibit poorer declarative knowledge acquisition behaviors by spending less time studying the most task-critical information available in the manual while simultaneously reporting lower levels of metacognitive awareness related to these learning activities. At a broader level, this pattern of results is intriguing in light of previous research that has advanced decrements to motivation/task engagement as the key mechanism accounting for stereotype threat effects, as opposed to the working memory hypothesis upon which the present study was based (e.g., Crocker et al., 1998; Grimm et al., 2009; Major et al., 1998; Marx & Stapel, 2006a, 2006b; Pronin et al., 2004; Wheeler & Petty, 2001). To the extent the former explanation holds true, noticeable discrepancies in the amount of effort and/or time spent engaged in learning-related activities by individuals facing stereotype threat would be expected; furthermore, the effects of disengagement on knowledge acquisition would likely also be greatly exaggerated in exploratory learning environments in which the individual has near complete control over what, when, and how much to learn. In the present study, observed differences in the amount of time stereotype threatened females spent studying the task manual relative to control condition learners as well as mean differences in levels of metacognitive awareness could both be interpreted as supportive of a (de)motivational or disengagement hypothesis. However, the lack of significant differences observed in participants’ task practice behaviors conflicts with this interpretation as it indicates that learners in both groups were exerting approximately the same degree of effort during the hands-on learning periods.
Additionally, the apparent disengagement from task manual study only occurred during the final experimental session and was not accompanied by a decrease in the validity of the concept relationships observed in the Day 3 knowledge structures that might be expected if stereotype threatened learners had simply abandoned their attempts to appropriately learn decision cue-outcome relationships. In fact, the number of functional relationships observed among decision-making concepts actually increased between Days 2 and 3 for stereotype threat learners, suggesting that, if anything, stereotype threatened females had become more knowledgeable of the rules of engagement between these time points. Thus, although decreased engagement during certain phases of declarative knowledge acquisition and diminished levels of metacognitive awareness were observed in the present study, the temporal pattern of these variables in conjunction with the demonstration of equally effortful levels of task practice behavior was not fully consistent with the motivational hypothesis for stereotype threat effects. Although speculative in nature, at least two alternative explanations more consistent with the cognitive/working memory hypothesis for stereotype threat effects could instead account for these findings. First, the observed disengagement from task manual study may have occurred because stereotype threat participants felt that they had memorized nearly all of the critical information and thus their time was better spent learning to recognize/apply this information during the hands-on practice sessions rather than engaging in continued study. As summarized in the previous section, the saliency of the stereotype threat manipulation may have stimulated participants to completely memorize the decision-making cues so as to avoid confirming the validity of the negative stereotype.
Such a reaction could have subsequently inhibited recognition that memorization was simply the first, but not the only, step in learning how to most efficiently/effectively engage targets in the task. This might also partially help explain why the metacognitive awareness of this group was slightly lower as well; if individuals were employing a portion of their self-regulatory capabilities to continually monitor whether they were confirming the negative stereotype as self-characteristic (which seems plausible given the significant between-group differences observed in the perceived stereotype threat measure), they would likely have had fewer cognitive resources to direct towards identifying/correcting changes in task/learning behaviors. Alternatively, a second possible account for the observed findings could be that the apparent decrease in study time was indicative of stereotype threat learners changing their strategic learning approach by more narrowly focusing their attention on a smaller subset of topics during later stages of learning (e.g., focusing on relations among only a single set of subdecision cues and outcomes, focusing only on how to monitor defensive perimeters, etc.). This possibility would suggest that stereotype threatened learners did recognize the need to improve their task effectiveness; however, their solution for doing so involved attempting to memorize a smaller range of topics rather than gleaning more broadly applicable heuristics, essentially forsaking the forest for the trees. Unfortunately, the present research is unable to empirically contrast the validity of these possible differences in strategic learning approaches adopted by stereotype threatened learners, marking a significant opportunity for future research in this topic area.
While the results of the strategic learning behavior analyses outlined above were less easily interpretable with respect to stereotype threat’s influence on learning outcomes, results from the decision-making strategy analyses were relatively clearer. Of greatest interest, the policy capture findings examining whether female learners across experimental conditions had learned to effectively interpret information to make task-critical decisions nicely complemented the qualitative conclusions reached from examination of participants’ averaged knowledge structures. More specifically, the observation that only control condition learners appeared to be acquiring efficient satisficing heuristics relevant to target engagement on the basis of the knowledge organization data was also borne out in examinations of differences in the decision weights that female participants across experimental conditions used to interpret task information when making critical decisions. Interestingly, the results of the MRCM analyses on the development of these decision weights revealed few differences in either average values or changes in these values over time between stereotype threat and control condition learners. Comparison of these values from each group against the optimally derived decision weights, however, unequivocally revealed that control condition participants tended to be substantially better at applying knowledge about cue validity to identify the most probable decision outcome. The rationale underlying the results of the decision weight analyses is strongly reminiscent of the theoretical processes encapsulated in Brunswik’s Lens Model of human perception and decision-making (cf., Brunswik, 1952). In brief, the Lens Model provides a conceptual framework for describing the manner by which the objective diagnostic values of informational cues in an environment are replicated in the judgments of human decision-makers.
Brunswik postulated that all decisions are informed by a variety of different cues specific to the criterion’s environment. For example, if one were interested in examining judgments about tenure promotion decisions, a candidate’s publication history, years of experience, service to the department, etc. might all be considered cues that hold some informative value in relation to the decision criterion of interest. In the abstract, each of these cues possesses some absolute degree of ecological validity which characterizes its overall importance, relevance, or value to the final criterion within the decision-making environment. However, when an individual attempts to use these same cues to reach a judgment about the criterion of interest, imperfections in the human reasoning process and/or extraneous situational factors can lead to variability in cue utilization that influences perceptions of the informative/diagnostic value of the various cues. Returning to the example above, individuals on a tenure review committee may tend to over- or under-value the relevance of total number of publications in their tenure decisions due to a variety of past experiences which have shaped their belief in the informative value of this cue in making tenure promotion decisions. As a result, a failure to reproduce the most ecologically valid tenure decision based on information about number of publications could occur if the observed and objective cue values differ greatly. One significant implication of the Brunswikian Lens Model is that, because the ecological validity of a cue represents its maximal utility for decision-making under a given set of circumstances, individuals who are better at reproducing the ecological validity of various decision-relevant cues in their own judgments should be more effective decision-makers within the context of the applicable task environment.
In the present study, the ecological validity of the environmental cues was given by the optimal regression weights computed from the policy capture analyses and reflected the objective likelihood of a particular Final Engagement decision given a particular subdecision outcome, while estimates of cue utilization were provided by the policy capture analyses performed on participants’ observed engagement decisions. The systematic underweighting of cue values observed for female learners in the stereotype threat condition therefore indicates that the salience of stereotype threat during learning activities significantly impaired individuals’ ability to learn how to appropriately weigh information from the environment to make relevant task decisions. Furthermore, trends in the data also suggested that stereotype threat learners were less proficient at calibrating their cue weights towards the ecologically valid cue values over time in relation to control condition learners. Overall then, the findings from the complete set of analyses pertaining to cognitive strategy development appeared most consistent with the hypothesis that stereotype threat primarily impairs cognitive processes (i.e., working memory) as opposed to motivational processes during learning activities. Particularly relevant to exploratory learning contexts, female learners facing stereotype threat seemed less proficient at effectively structuring their declarative knowledge acquisition activities, which may have also been influenced by poorer self-regulation of strategic learning behaviors. Lastly, the added situational pressures introduced by stereotype threat also appeared to demonstrably impede the development of decision-making heuristics which capitalized on the most ecologically valid cue-outcome relations in the task environment, resulting in stereotype threat learners generally exhibiting suboptimal decision strategies.
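The Lens Model comparison described in this section (ecological validity of a cue versus a decision-maker's utilization of that cue) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reproduction of the study's analyses: it substitutes simple Pearson correlations, the classic Lens Model indices, for the regression weights estimated in the policy capturing analyses, and all trial data, cue codings, and the `control`/`threat` judgment vectors are hypothetical values constructed for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for 12 trials; 1/0 codings are arbitrary illustrations.
cues = {
    "class": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "intent": [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0],
}
outcome = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # correct decision per trial
judgments = {
    "control": [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0],  # hypothetical learner
    "threat": [1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0],   # hypothetical learner
}

# Ecological validity: how strongly each cue tracks the correct outcome.
validity = {name: pearson(values, outcome) for name, values in cues.items()}

# Cue utilization: how strongly each cue tracks a learner's own judgments.
utilization = {
    learner: {name: pearson(values, decided) for name, values in cues.items()}
    for learner, decided in judgments.items()
}

# Negative gaps indicate that a cue is underweighted relative to its validity.
gaps = {
    learner: {name: used[name] - validity[name] for name in cues}
    for learner, used in utilization.items()
}

for learner, gap in gaps.items():
    print(learner, {name: round(value, 3) for name, value in gap.items()})
```

In this constructed example the hypothetical control learner reproduces both cue validities, while the hypothetical threat learner underweights the more diagnostic cue, which mirrors the qualitative pattern of underweighting reported above.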
Stereotype Threat Effects on Task Performance

As noted in the introductory paragraphs of this manuscript, the overwhelming majority of investigations on stereotype threat have examined effects on task performance during which it was made clear that participants’ activities would be evaluated for diagnostic or comparative purposes (e.g., Nguyen & Ryan, 2008). In the present study, substantial efforts were made to distinguish learning and performance outcome variables to permit separate analysis of these components in subsequent hypothesis testing. Note that it was never a goal of this investigation to directly assess whether learning outcomes significantly impacted performance outcomes; this result has been demonstrated numerous times across numerous contexts (even within the TANDEM task environment, see Bell, 2002 and Kozlowski, Gully, et al., 2001) and is thus less substantively interesting. Nevertheless, examination of the direction and magnitude of the zero-order correlations presented in Table 9 suggests that many of the learning outcome measures (e.g., the quantitative knowledge structure metrics and cognitive strategy indicators) were related to performance outcomes in the expected manner. Consequently, it is a virtual certainty that stereotype threat’s impact on task performance measures would be mediated by outcomes associated with the quality of task-relevant learning activities. However, the lack of significant mean or longitudinal differences in the objective performance indicators across conditions of stereotype threat during the learning/practice trials somewhat conflicts with this proposition, especially considering that such effects were present during the final performance rounds. Given that control condition females appeared to possess greater comprehension of the task domain than stereotype threatened females, one might have expected control condition females to have achieved better performance during the practice trials as well.
One explanation for this pattern of effects may rest with differences in the instructional primes elicited by the exploratory learning recommendations versus those introduced during the performance trials. Consistent with the tenets of guided active learning interventions (e.g., Mayer, 2004), the practice trials were framed as opportunities to explore and experiment within the task space to identify procedures and strategies which made sense to the learner. Additionally, the provided learning recommendations offered some minimal guidance regarding how to orient one’s learning activities, but did not provide strict instruction regarding how best to satisfy the task objectives or solve relevant problems within the environment. Lastly, although feedback was provided to participants regarding their performance following each learning trial, the feedback instructions notified individuals that the purpose of this information was to help identify areas where one might wish to focus greater attention during subsequent learning activities. The end result of this instructional design element is to deemphasize efforts to reach high levels of performance achievement for all learners during practice trials and instead to encourage learners to engage in activities that promote task comprehension/mastery. It is therefore common for improvements in training performance during exploratory learning interventions to be less drastic relative to other training designs (Bell, 2002). Of additional significance, the practice trials for each day always presented the exact same targets in the exact same configuration. As a result, participants could begin to memorize, or at least develop a greater awareness of, the correct engagement decisions for targets presented during the learning trials.
Given that there was some evidence that stereotype threatened learners may have been more prone to a learning style based on rote memorization rather than the development of efficient heuristics, this strategy may have permitted these individuals to achieve performance levels during the practice trials equivalent to those produced by control condition learners despite differences in overall task comprehension. However, the effectiveness of this strategy is negated when changes in task demands and/or complexity are introduced which require reliance on adaptive expertise and generalized inferences to achieve task performance (Bell & Kozlowski, 2002, 2008; Ford et al., 1998; Ivancic & Hesketh, 2000), a finding which is also consistent with the need for tasks to be sufficiently difficult in order to elicit stereotype threat effects (cf., Nguyen & Ryan, 2008). Given that such learning outcomes appeared to have been more difficult for individuals targeted by stereotype threat to develop, the observed disparity in performance during the final performance rounds for stereotype threatened females was consistent with the notion that poorer task comprehension relates to lower levels of task achievement.

A final consideration related to differences in the task performance outcomes observed in the present study concerns the significant interaction effect between time and stereotype threat reported in Table 19. The direction of this finding indicated that stereotype threat effects during learning appeared to compound the performance difficulties faced by stereotype threatened individuals over time. Specifically, the results of these analyses revealed that individuals who did not face the threat of confirming the validity of a negative stereotype were not only better performers on average than those who were faced with the stereotype, but these individuals also demonstrated greater improvements in performance after additional task practice.
This finding clearly suggests that the learning activities of stereotype threatened individuals are not simply less proficient overall; they also place such individuals at a distinct disadvantage relative to other learners that cannot be overcome simply by further exposure to the task environment.

Implications and Directions for Future Research

Since its inception, stereotype threat theory has generated the greatest amount of interest in the domain of educational testing and personnel selection/assessment. The contention that stereotype threat effects might contribute to the prevalence of adverse impact and/or exacerbate subgroup performance differences has stimulated significant debate within the research community regarding the extent to which the phenomenon is truly a legitimate concern in high-stakes testing (e.g., Cullen et al., 2004; Cullen et al., 2006; Good et al., 2003; Sackett et al., 2004; Sackett & Ryan, 2012; Schmidt, 2002; Steele & Davies, 2003; Stricker & Ward, 2004). While the present results do not speak directly to whether stereotype threat influences performance assessments in applied settings, the findings summarized herein (as well as those reported by Rydell and colleagues) demonstrate that stereotype threat can negatively influence the knowledge acquisition process under certain conditions, which most certainly has the potential to manifest as subsequent performance disparities. Interestingly, one reason frequently cited for why stereotype threat effects are unlikely to manifest in high-stakes testing environments is that such contexts tend to be heavily governed by organizational/legal regulations (e.g., Sackett et al., 2001). Training and learning environments, on the other hand, tend to be far less standardized; in fact, a large proportion of job/task learning tends to occur either informally or through other “on the job” sources that often fall outside the purview of organizational control (Loewenstein & Speltzer, 2000).
As a result, the likelihood of conditions conducive to stereotype threat emerging during learning or training activities may be greater than in traditional operational testing contexts. Future research in this area should more closely examine organizational education/training practices and policies for their potential to induce stereotype threat effects that may lead to poorer learning outcomes and, ultimately, domain performance for certain groups of individuals.

The primary findings from the current investigation indicate that stereotype threat appears to demonstrably influence the acquisition of domain expertise by inhibiting the development of efficient knowledge structures and heuristic reasoning. Though a critical component of performance in many task applications, such cognitive learning outcomes represent only one portion of the possible construct space related to training and knowledge acquisition that might be negatively influenced by stereotype threat effects. The classification of learning outcomes from Kraiger et al. (1993) summarized in Figure 3 offers a number of alternative variables whose relationships with stereotype threat could be assessed. The results of Rydell, Shiffrin, et al. (2010) provide an initial investigation of the acquisition of skill-based outcomes; however, no research yet exists depicting the influence of stereotype threat on the acquisition of affective-motivational outcomes. This set of learning outcomes seems a particularly logical next step for future research in the area given the empirical results and interest which have emerged concerning the influence of stereotype threat on individuals’ motivational dispositions (e.g., Crocker et al., 1998; Grimm et al., 2009; Major et al., 1998; Marx & Stapel, 2006a, 2006b).
Continued systematic research based on theoretically grounded conceptualizations of the learning outcomes of interest would be highly beneficial to mapping out the boundary conditions and domains of greatest concern relevant to stereotype threat effects during learning.

Among the boundary conditions warranting further exploration, the structure of the learning environment seems likely to have a significant influence on the manifestation of stereotype threat effects on learning outcomes. In the present study, individuals engaged in learning within the context of an exploratory learning paradigm. While the benefits and prevalence of this particular learning environment have been described at length, a variety of other instructional delivery techniques and modalities are commonly encountered in educational and training settings (e.g., proceduralized instruction, behavioral modeling, group learning). Each of these approaches possesses its own unique set of advantages and disadvantages, and may be more or less conducive to the manifestation of stereotype threat effects. For example, the relatively unstructured nature of the exploratory learning paradigm employed in the current investigation may have played a significant role in the acquisition of poorer task heuristics on the part of stereotype threatened learners, as these individuals were allowed to pursue whatever strategic learning approach seemed most appropriate. Perhaps a more integrative approach to training that attempted to impose stricter sequencing of the effective declarative, procedural, and strategic knowledge components of task performance would help to alleviate the observed deficiencies in stereotype threatened learners’ knowledge organization. Future research on the influence of stereotype threat under alternative learning paradigms is needed to more clearly elucidate its potential effects on learning outcomes.
As has been emphasized at a variety of points in this manuscript, the use of a longitudinal/repeated measures design was instrumental in revealing the influence of stereotype threat effects on the assessed learning outcomes. Despite conceptual frameworks/models which implicate the dynamic nature of the processes involved in the experience of stereotype threat (cf., Schmader et al., 2008), many studies examining the phenomenon have been conducted using cross-sectional designs at a single point in time. While such approaches may be appropriate for certain questions/contexts, the present results exemplify the importance of research examining the influence of stereotype threat utilizing more dynamic methodological designs. Of particular note, nearly all of the significant between-condition differences observed in the present study had not manifested by the end of Day 1, indicating that the effects of stereotype threat on the learning process and other experiential activities may not manifest or begin to create noticeable discrepancies until some amount of time has elapsed. Furthermore, the significant interactions observed between time and experimental condition for many of the measured learning and performance outcome variables also suggest that the adverse effects of stereotype threat may be cumulative, such that poor functioning early on during a key developmental process can significantly impede an individual’s ability to achieve a desired level of effectiveness. In short, future examinations of stereotype threat need to strongly consider employing methodological approaches such as repeated measures designs or computational modeling techniques in order to gain greater insight into the underlying mechanisms driving differences in relevant learning or performance variables.
Of final note, continued research on how the underlying psychological mechanisms and processes which characterize stereotype threat influence relevant outcomes represents both a conceptually and practically important pursuit. For example, the fact that the presence of a negative stereotype pertaining to performance in mathematical domains triggered threat effects for female participants in the present study despite their not believing the task was strongly related to mathematical ability lends a degree of uncertainty to exactly what conditions need be present in order to elicit stereotype threat. As shown in Figure 1, Schmader et al.’s (2008) summary of the research literature suggested that a positive propositional relation between self and ability is necessary to generate conditions conducive to stereotype threat, such that individuals must believe they possess aptitude within the ability domain targeted by the stereotype. In many instances, this linkage is assessed through measures of domain identification that indicate the extent to which performance in a particular area is relevant or important to one’s sense of self (Smith & White, 2001). While a number of researchers have theorized and reported empirical findings indicating that domain identification is an important precondition for threat effects (e.g., Steele, 1997; Steele & Aronson, 1995), the pattern of results from the present study and those emerging from other recent research (Rydell, Shiffrin, et al., 2010; Jamieson & Harkins, 2007; see Nguyen & Ryan, 2008, for a review) suggest that this relationship may be either nonessential or, perhaps more likely, inadequately specified.
Findings consistent with Grimm et al.’s (2009) postulation that stereotype threat effects can be erased by removing the regulatory mismatch between approach/avoidance performance orientations and the reward structure of the environment seem to indicate that all that may be necessary is for individuals to care about doing well/not doing poorly in order for stereotype threat effects to manifest, regardless of their identification with the domain. More generally, decoding the significance and causal relevance of the working memory accounts advocated by Schmader et al.’s (2008) model and the motivational/disengagement accounts exemplified by Grimm et al. (2009) and other researchers is integral to the development of effective interventions capable of minimizing the deleterious effects of stereotype threat. It seems a virtual certainty that resolution of this ambiguity will involve the integration of components from both theoretical foundations in a manner that acknowledges the reciprocal dynamic relationship that exists among an individual’s cognitive, behavioral, and affective reactions. With respect to learning, the design and demands of the present study all but ensured that learned cognitive inefficiencies would result in worse performance. There may be numerous situations though in which such learned inefficiencies simply make a job or task unnecessarily more difficult, leading to greater fatigue, diminished motivation, burnout, lowered satisfaction, etc. Such a pattern might suggest that stereotype threat effects would exert a stronger influence on cognitive- and/or skill-based outcomes during initial phases of learning, which subsequently leak into more affective/motivational aspects.
Irrespective of the true nature of this pattern, research focused solely on identifying differences in outcomes attributable to stereotype threat without simultaneously modeling/testing predictions which permit identification of the primary psychological mechanisms and their interactions responsible for these effects will be far less effective at advancing this research domain towards a more complete understanding of the phenomenon.

Study Limitations and Generalizability

There are a few limitations to the present study which are relevant to interpreting the validity and scope of the reported findings. With respect to the study sample, the use of a repeated measures design was integral in revealing a number of unique effects. However, the relatively small sample size in relation to most between-group examinations of stereotype threat and concerns related to attrition across the three-day study period are worthy of mention. With respect to sample size, findings from the MRCM analyses which were not supported by the data (e.g., Hypotheses 1-9) were generally not close to being statistically significant, suggesting that any differences in these variables attributable to stereotype threat would be very small and the inclusion of additional participants would not be likely to generate a different pattern of results. Attrition rates were also fairly low (approximately 20% between Days 1 and 3) and did not appear to vary greatly across experimental condition. However, it is possible that those female participants who did choose to drop out from either the control or stereotype threat conditions may have differed in some non-trivial manner from those who completed all three days.
Note that the use of the MRCM analyses was somewhat helpful here as, unlike repeated-measures ANOVA tests, they do not employ listwise deletion and therefore make use of all available data points; consequently, individuals with missing data and/or who did not complete all experimental sessions were still included in the reported analyses and contributed to the observed pattern of results. Nevertheless, collecting a larger sample size with data from all possible time points would help to improve confidence in the stability and generalizability of the present findings.

Another possible limitation is broadly related to the nature of the TANDEM experimental task and the boundary conditions which this task environment implies for drawing generalizable conclusions about the influence of stereotype threat on learning. The TANDEM task environment is one in which rapid application of learned knowledge to make accurate decisions is integral to task completion; consequently, the acquisition of efficient heuristic reasoning is heavily rewarded in this context and one’s failure to do so is likely to lead to noticeable deficiencies in performance. However, there may be instances in which developing and learning to rely on such cognitive “shortcuts” may not be as strongly related to desired outcomes, and thus stereotype threat effects during learning might not lead to such problematic outcomes. For example, complex tasks which require careful, deliberate interpretation of large quantities of information in order to make decisions and/or instances in which there are many possible courses of action (e.g., selecting/promoting upper-level organizational executives, forecasting, military command and control decisions) may necessitate that all information, rather than only the most diagnostic, is considered before making a decision.
Nevertheless, while continued research in this domain is needed to help clarify the generalizability of stereotype threat’s impact to learning in other contexts, it seems plausible that the present pattern of results should extend to situations in which the acquisition of heuristic-based reasoning is central to task performance. Also of relevance to the discussion of generalizability is that the present results were generated under artificial conditions in an experimental lab setting, a common criticism levied against most stereotype threat research (Sackett & Ryan, 2012). Given that this study marks one of the first examinations of stereotype threat during learning—and the only examination of stereotype threat’s influence on the development of knowledge structures and cognitive strategy acquisition over time—a critical goal of this research was to demonstrate the possible effects that might be engendered by the presence of negative stereotypes in a learning environment. The decision to continually expose participants to the stereotype threat manipulation over the course of the current experiment was made in order to maximize the likelihood of observing a significant stereotype threat effect on the assessed learning outcomes. As was mentioned previously, while there are far fewer regulations/standardizations regarding the construction of training as opposed to personnel selection/assessment systems, the presence of conditions as strongly adverse in real world applications as those employed presently is unlikely. It seems prudent then to consider the present findings as an initial proof of concept demonstrating what can happen under conditions of stereotype threat, with further research needed to understand how severe such outcomes may be under varying conditions. 
Lastly, although mentioned previously, it is worth noting again that the results from the present research were obtained under an exploratory learning paradigm as opposed to a more traditional proceduralized instructional approach. A significant body of research exists detailing the advantages and disadvantages of such active learning approaches for learning and desirable knowledge acquisition outcomes (cf., Bell, 2002; Bell & Kozlowski, 2008; Kozlowski, Toney, et al., 2001). In general, the literature on this topic indicates that such techniques are particularly useful for enabling learners to develop advanced expertise that allows them to apply learned knowledge and/or acquire new relevant knowledge in a variety of circumstances that extend beyond the training environment. However, many conventional instructional techniques are quite effective at developing foundational declarative knowledge and, to a lesser extent, procedural knowledge in learners. While the present research and that conducted by Rydell and colleagues provide some insights into the possible effects of stereotype threat on these desirable learning outcomes, future research directly comparing these instructional methodologies is needed to ascertain whether they are equally susceptible to similar situational factors.

Conclusion

The present study was designed to provide insight into the influence of stereotype threat effects on the knowledge acquisition of targeted individuals during learning activities. Utilizing a conceptual framework of critical learning outcomes (Kraiger et al., 1993), empirical attention was specifically focused towards understanding how decrements to working memory purportedly associated with the experience of stereotype threat influenced participants’ organization of knowledge and their development of strategic/heuristic reasoning.
Assessment of these outcomes over the course of three experimental sessions revealed that females facing stereotype threat during exploratory learning activities had greater difficulty formulating knowledge structures indicative of efficient/adaptive heuristic reasoning and conducive to performance relative to females who did not experience such situational circumstances. Consistent with broader theories of human cognition and the psychological experiences which purportedly accompany the experience of stereotype threat, it was postulated that such deficiencies emerged due to a perceived need on the part of stereotype threatened individuals to not appear less knowledgeable than others. Such motivations may stimulate stereotype threatened learners to focus more heavily on rote memorization of task-critical concepts as opposed to acquiring more proficient heuristics. There can be little argument that the elicitation of stereotype threat requires a highly specific confluence of situational and intrapersonal characteristics in order to manifest. However, the present results and those of many other researchers reveal that when such conditions come together to concoct this perfect storm, individuals facing its presence can be impacted in ways that diminish the likelihood of successfully attaining many desirable and important achievements. Overall, the theoretical framework and empirical results summarized in this investigation offer a general point of departure for examining the role of stereotype threat in learning/instructional contexts that may also contribute to a better understanding of the manifestation of stereotype threat during performance. Future research capable of expanding on this basic foundation stands to provide important insights into a theoretically intriguing and potentially practically meaningful area of psychological research.
FOOTNOTES

1 While certain characteristics of the environment can make the experience of stereotype threat more or less difficult to manage (e.g., Beilock et al., 2007; Spencer et al., 1999; Steele & Aronson, 1995; Steele & Davies, 2003), they should not alter the presence of stereotype threat. Consider the following analogy. One evening you are trying to fall asleep in your bed when you suddenly hear an annoying sound from somewhere in your room that will not stop. While situational characteristics such as the acoustics of the room, how tired you are, your location relative to the source of the noise, etc. all may affect how distracting that sound is to you, they do not change the fact that an irritating sound exists in your room. In much the same way, certain situational features can make the experience and influence of stereotype threat more salient, but they do not determine its existence in any meaningful way (Steele, 1997). Thus, the conceptualization adopted here, which mirrors that of Steele and Schmader, holds that once the cognitive imbalance between self, group, and ability is struck, stereotype threat is present and its consequences can be experienced.

2 There are a great many more cognitive models than those summarized here that characterize the mechanisms of working memory and information processing and which may be informative to stereotype threat research. For example, features similar to Baddeley’s (1986) working memory model are found in Anderson and colleagues’ adaptive control of thought-rational (ACT-R) theory of cognition (e.g., Anderson, 1993, 1996; Anderson & Lebiere, 1998; Anderson, Bothell, Byrne, Douglass, Lebiere, & Qin, 2004), Kieras and Meyer’s executive-process/interactive control (EPIC) model (1997; Kieras, Meyer, Mueller, & Seymour, 1999), Newell’s (1990) Soar architecture, and Just and Carpenter’s (1992) 3CAPS framework (capacity-constrained collaborative activation-based production system).
For the present purposes, however, the characterization of these attention-regulating processes as depicted by Baddeley, Engle, and their colleagues adequately describes the underlying mechanisms important to the conceptual focus of this study.

3 For ease of discussion, Table 1 and the associated references in-text refer to these as Studies 1-6; Studies 1-3 refer to the experiments presented in Rydell, Rydell, & Boucher (2010) and Studies 4-6 refer to the experiments conducted in Rydell, Shiffrin, et al. (2010). However, aside from their similar research question, the papers are independent from one another.

4 Even in an experiment where stereotype threat is introduced only during the presentation of the declarative knowledge to be learned and not reintroduced again during subsequent measurements of declarative learning, the cognitive imbalance between group, ability, and self that gives rise to accompanying physiological stress, hyperactive monitoring, and thought suppression responses has been triggered and is unlikely to dissipate by the time learning assessments are taken. The experience of threat has already emerged at that point and, if no other manipulations are introduced, is likely to persist through subsequent attempts to extract verifiably correct or incorrect facts, statements, and “knowledge questions” from participants with self-report measures (e.g., Beilock et al., 2007; Rydell, Rydell, & Boucher, 2010, Study 3; Rydell, Shiffrin, et al., 2010, Study 3).

5 The primary modification made to this version of the automated OSPAN from the original was that participants were not provided with any feedback regarding their performance on the math items, the number of letters they correctly recalled following each memory recall, or their final span score at the end of the working memory task.
6 Note that for the purposes of the current study, it was assumed that the similarity between any two concepts in an individual’s knowledge structure satisfied the metric axioms of symmetry and the triangle inequality (Tversky, 1977); that is, the similarity ratings participants provided when asked “How similar is A to B?” were assumed identical to the ratings they would provide if asked “How similar is B to A?” Although previous research by Tversky (1977) has suggested that these features of relational similarity can be (and often are) violated in an individual’s judgment, such differences were not important to the aims of the present study. In instances where these discrepancies are of interest, similarity ratings are obtained for every permutation of concept pair combinations (k*(k-1) ratings). In the analysis of such data, the knowledge structures are said to be “directed” (i.e., concepts are organized according to an observed causal ordering or hierarchy; Steyvers & Tenenbaum, 2005) and the primary substantive question of interest involves interpreting the meaning of the causal chain found in the concept relations. However, in instances where one is interested in simply assessing relational organization/similarity and the logical clustering of concepts within an area (as in the current study), collecting a single rating for each pairwise combination of concepts (a total of k*(k-1)/2 ratings) and analyzing undirected knowledge structures is appropriate.

7 The top 15 performers on the experimental task consisted of three females and five males from the stereotype threat condition, and three females and four males from the control condition. Although this distribution appears to indicate that the stereotype threat manipulation had relatively little impact on the performance capabilities of the very best individuals, it does reflect a reasonably large sex difference in task performance.
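The sex comparison in footnote 7 can be illustrated with a short Python sketch that applies Bayes' theorem to the top-performer counts given above (6 females and 9 males among the top 15). The Day 3 group sizes used here (120 females, 47 males) are hypothetical placeholders chosen for illustration and are not the values reported in Table 2; the function name is likewise an assumption.

```python
from fractions import Fraction


def p_top_given_sex(n_sex_in_top, n_top, n_sex, n_total):
    """Return P(top | sex) via Bayes' theorem from raw counts."""
    p_sex_given_top = Fraction(n_sex_in_top, n_top)  # P(sex | top)
    p_top = Fraction(n_top, n_total)                 # P(top)
    p_sex = Fraction(n_sex, n_total)                 # P(sex)
    return p_sex_given_top * p_top / p_sex


# Counts among the top 15 come from the footnote: 3 + 3 females, 5 + 4 males.
FEMALES_IN_TOP, MALES_IN_TOP, N_TOP = 6, 9, 15
# HYPOTHETICAL Day 3 group sizes (placeholders, not the Table 2 values).
N_FEMALES, N_MALES = 120, 47
N_TOTAL = N_FEMALES + N_MALES

p_female = p_top_given_sex(FEMALES_IN_TOP, N_TOP, N_FEMALES, N_TOTAL)
p_male = p_top_given_sex(MALES_IN_TOP, N_TOP, N_MALES, N_TOTAL)
print(f"P(top 15 | female) = {float(p_female):.2f}")
print(f"P(top 15 | male)   = {float(p_male):.2f}")
```

Because the total sample size cancels, the result reduces to the number of top performers of a given sex divided by the size of that group; the Bayes form is kept only to mirror the reasoning in the footnote.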
Bayes’ Theorem can be used to calculate the probability that an individual of a given sex will be among the top 15 performers:

$$P(\text{Top 15} \mid \text{Sex}) = \frac{P(\text{Sex} \mid \text{Top 15})\,P(\text{Top 15})}{P(\text{Sex})}$$

Based on the Day 3 sample sizes reported in Table 2, the probability of a woman being a top 15 performer was only .05, while the probability for males was nearly four times that amount (.19).

8 For the analyses comparing participant knowledge structures against those of top performers (i.e., knowledge structure similarity and correlation), individuals included in the top performer sample were excluded from the comparison sample. Consequently, these MRCM analyses are based on data from only 139 females.

APPENDICES

APPENDIX A

Online Informed Consent

Project Title: Learning in a Radar Control Simulation

Investigators: James A. Grand, M.A.; Ann Marie Ryan, Ph.D.

General Description and Explanation of Procedure: The purpose of this study is to examine learning and problem-solving skills that are important to performance in a variety of areas. In this study, you will be tasked with learning to operate a computer-based radar control simulation over multiple practice trials spread over three experimental sessions held on consecutive days. In the radar control simulation, you will be presented with a number of targets on your computer screen which you must assess in order to determine what action to take against each contact. You will also be asked to answer questions and provide ratings about what you learned during the simulation that will help us understand how people approach learning in the task and how those outcomes affect overall performance. This study has two parts. First, you will be asked to complete an online questionnaire; these measures include questions about basic demographics, your SAT/ACT scores, and other characteristics related to the radar control simulation you will learn. After completing the questionnaires, you will then be asked to participate in the radar control simulation on the days you selected.
Filling out the online questionnaire will begin immediately upon agreeing to participate in the study at the bottom of the page and will take less than 30 minutes [1 credit] to complete. For the second part of the study, you will go to the ADAPT Lab in Room 204 of the Psychology Building for your scheduled sessions. The first session on Day 1 will last approximately 2 hours [4 credits], while the following sessions on Days 2 and 3 will each last 1.5 hours [3 credits each]. If you complete all experimental sessions, you will earn 11 subject pool credits for participation in this study. Additionally, you will also be eligible to receive a $60 award if you complete the entire experiment; additional information about the monetary awards will be provided during the first experimental session. Winners will be determined at the end of the study and will be contacted so they can claim their prize.

Estimated Time Required:
Now: 30 minutes for online questionnaire [1 credit]
Day 1: 2 hours [4 credits]
Day 2: 1.5 hours [3 credits]
Day 3: 1.5 hours [3 credits]

Risks and Discomforts: None anticipated.

Benefits: In addition to your compensation for research, you will gain experience operating a computer-based training simulation. This can be valuable as many organizations use computer-based training and simulation programs to teach and/or assess a variety of knowledge, skills, and abilities in their employees or applicants. You will also get to see the results of a number of interesting psychological measures about yourself (e.g., working memory capacity, “knowledge maps”) at the end of the study. Finally, the findings from this research are expected to improve our understanding of learning and problem-solving which can be used to improve the effectiveness of training and development tools in real-world situations.

Participation in this study is completely voluntary.
By consenting, you also give permission to the experimenters to access or verify your ACT/SAT score from the University Registrar. Your refusal to participate will involve no penalty or loss of benefits to which you are otherwise entitled. You may refuse to participate in certain procedures or answer certain questions. You are free to withdraw this consent and discontinue participation in this project at any time without penalty. If you choose to withdraw from the study prior to its completion, you will receive credit for the time you have spent in the study (1 credit per 30 minutes). If you have concerns or questions about this study, such as scientific issues, how to do any part of it, or to report an injury (i.e. physical, psychological, social, financial, or otherwise), please contact the project coordinator (James Grand, Department of Psychology, Michigan State University, East Lansing, MI 48824; grandjam@msu.edu; 334-787-2141). If you have questions or concerns about your role and rights as a research participant, would like to obtain information or offer input, or would like to register a complaint about this study, you may contact, anonymously if you wish, the Michigan State University's Human Research Protection Program at 517‐355‐2180, Fax 517‐432‐4503, or e‐mail irb@msu.edu or regular mail at 207 Olds Hall, MSU, East Lansing, MI 48824. All data will be stored for at least three years after the project closes. During that time, only the investigators listed on this form will have access to the data collected in this study, and any data reported for scientific purpose will be in aggregate form. The institutional review board will also have access to study data and results if requested in the case of an audit. Your confidentiality will be protected to the maximum extent allowable by law. All reasonable efforts will be taken to ensure your identity and data will be kept secure and confidential. 
If you agree to participate, please indicate your consent by selecting “Yes” to the question below. Note that you will also be asked to complete an additional consent form for your participation in the in-person radar control simulation.
APPENDIX B
In-Person Informed Consent
Project Title: Learning in a Radar Control Simulation
Investigators: James A. Grand, M.A.; Ann Marie Ryan, Ph.D.
General Description and Explanation of Procedure: The purpose of this study is to examine learning and problem-solving skills that are important to performance in a variety of areas. In this study, you will be tasked with learning to operate a computer-based radar control simulation over multiple practice trials spread over three experimental sessions held on consecutive days. In the radar control simulation, you will be presented with a number of targets on your computer screen which you must assess in order to determine what action to take against each contact. You will also be asked to answer questions and provide ratings about what you learned during the simulation that will help us understand how people approach learning in the task and how those outcomes affect overall performance. You have finished the first portion of this study by completing the online questionnaires. In this part of the study, you will begin learning to complete the radar control simulation task. This first session will last approximately 2 hours [4 credits], while the following sessions on Days 2 and 3 will each last 1.5 hours [3 credits each]. If you complete all experimental sessions, you will earn 11 subject pool credits for participation in this study. Additionally, you will also be eligible to receive a $60 award if you complete the entire experiment. Awards will be based on the combined score you receive on the final performance trials for each day; those scoring in the top 10% of all participants in the study will receive the monetary award.
Winners will be determined at the end of the study and will be contacted by the investigator so they can claim their prize.
Estimated Time Required: Now: 2 hours [4 credits]; Day 2: 1.5 hours [3 credits]; Day 3: 1.5 hours [3 credits]
Risks and Discomforts: None anticipated.
Benefits: In addition to your compensation for research, you will gain experience operating a computer-based training simulation. This can be valuable as many organizations use computer-based training and simulation programs to teach and/or assess a variety of knowledge, skills, and abilities in their employees or applicants. You will also get to see the results of a number of interesting psychological measures about yourself (e.g., working memory capacity, “knowledge maps”) at the end of the study. Finally, the findings from this research are expected to improve our understanding of learning and problem-solving which can be used to improve the effectiveness of training and development tools in real-world situations.
Participation in this study is completely voluntary. By consenting, you also give permission to the experimenters to access or verify your ACT/SAT score from the University Registrar. Your refusal to participate will involve no penalty or loss of benefits to which you are otherwise entitled. You may refuse to participate in certain procedures or answer certain questions. You are free to withdraw this consent and discontinue participation in this project at any time without penalty. If you choose to withdraw from the study prior to its completion, you will receive credit for the time you have spent in the study (1 credit per 30 minutes). If you have concerns or questions about this study, such as scientific issues, how to do any part of it, or to report an injury (i.e.
physical, psychological, social, financial, or otherwise), please contact the project coordinator (James Grand, Department of Psychology, Michigan State University, East Lansing, MI 48824; grandjam@msu.edu; 334-787-2141). If you have questions or concerns about your role and rights as a research participant, would like to obtain information or offer input, or would like to register a complaint about this study, you may contact, anonymously if you wish, the Michigan State University's Human Research Protection Program at 517‐355‐2180, Fax 517‐432‐4503, or e‐mail irb@msu.edu or regular mail at 207 Olds Hall, MSU, East Lansing, MI 48824. All data will be stored for at least three years after the project closes. During that time, only the investigators listed on this form will have access to the data collected in this study, and any data reported for scientific purpose will be in aggregate form. The institutional review board will also have access to study data and results if requested in the case of an audit. Your confidentiality will be protected to the maximum extent allowable by law. All reasonable efforts will be taken to ensure your identity and data will be kept secure and confidential. If you agree to participate, please indicate your consent by signing below. You are also asked for your name and PID to ensure that you receive credit for participating in the study and to verify your ACT/SAT score. Lastly, please provide your e-mail address and/or a phone number where you can be contacted if you win a prize.
Print Name: _______________________________ Date: ____________________
Signature: ________________________________ PID: _____________________
E-mail: _____________________________ Phone: ________________________________
APPENDIX C
Participant Feedback/Debriefing
Thank you for your participation in this investigation.
The purpose of this study was to examine sex differences in learning in domains where a negative stereotype exists about a particular group. Some of you received instructions over the course of the experiment which indicated that women are stereotypically worse than men at mathematical tasks as a result of inefficient information processing skills. Previous research suggests that the presence of such stereotypes can cause members of the negatively stereotyped group to engage in overactive performance monitoring and thought suppression activities in their attempt to avoid proving the negative performance stereotype correct by their actions. Unfortunately, these psychological activities often make the problem worse, and can cause such individuals to do worse on related tasks. This phenomenon is known as stereotype threat. In the present study, we were interested in the extent to which women who were exposed to stereotype threat approached and learned the radar control task differently than women who were not exposed to the threat and men. Our expectation was that informing women that this task measures skills related to math performance (a domain in which men are typically believed to be more proficient) would induce stereotype threat and thus negatively influence learning and performance outcomes in the task. However, please note the following:
• Men and women tend to differ very little, if at all, on most tests of mathematical ability. Despite prevalent stereotypes that men are better at math than women, there is substantial evidence which reports that, on average, these differences are often very small and/or not statistically significant. Additionally, there is no evidence to suggest that women are worse at distinguishing relevant from irrelevant information than men. For further reading on this topic, see:
  o Feingold, A. (1988). Cognitive gender differences are disappearing. American Psychologist, 43, 95-103.
  o Feingold, A. (1992). Sex difference in intellectual abilities: A new look at an old controversy. Review of Educational Research, 62, 61-84.
  o Hedges, L.V., & Nowell, A. (1995). Sex differences in mental test scores, variability, and numbers of high-scoring individuals. Science, 269, 41-45.
• Individuals will be compared against members of their same sex and experimental condition to determine who will win the monetary awards. We expected that women exposed to stereotype threat would be less proficient at the radar control simulation task than others. To ensure that everyone in the study has an equal opportunity to win the monetary awards, awards will be distributed to males and females separately according to their experimental condition. More specifically, rewards will be allocated according to the following chart:
                      Men        Women
  ST instructions     Top 10%    Top 10%
  No ST instructions  Top 10%    Top 10%
Thus, one set of awards will be awarded to the top 10% of men who received stereotype threat instructions, while a different set of awards will be given to the top 10% of women who received stereotype threat instructions; the same is true for men and women who did not receive stereotype threat instructions.
We hope that the information you provided in this study will help us determine whether unfortunate situational influences such as stereotype threat might adversely impact the manner by which people learn and make sense of information. We hope that in doing so, we can ultimately better design and inform educational and training policies to ensure that members of all demographic categories can effectively learn and perform in traditionally stereotyped domains. If you have any questions about this study, please notify the investigator now. We will e-mail you with the results of the knowledge structures you completed for this study within two weeks of this experimental session and will contact winners of the monetary awards upon completion of the experiment.
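For readers who want to see the allocation rule in the chart above spelled out, the following is a minimal sketch and not the procedure actually used in the study. It assumes each participant record carries a sex, a condition label, and the daily performance-trial scores that the consent form says are combined. The function name, the field names, rounding the 10% cutoff up, and guaranteeing at least one winner per cell are all illustrative assumptions; the source does not say how ties or small cells were handled.

```python
from collections import defaultdict


def select_award_winners(participants, percent=10):
    """Return the top `percent` of participants within each sex-by-condition cell.

    `participants` is a list of dicts with hypothetical keys:
    "id", "sex", "condition", and "scores" (one performance score per day).
    """
    # Group participants by (sex, condition) so each cell is ranked separately.
    cells = defaultdict(list)
    for person in participants:
        cells[(person["sex"], person["condition"])].append(person)

    winners = {}
    for cell, members in cells.items():
        # Rank by the combined (summed) score across days, highest first.
        ranked = sorted(members, key=lambda p: sum(p["scores"]), reverse=True)
        # Assumption: round the cutoff up and award at least one person per cell.
        n_winners = max(1, (len(ranked) * percent + 99) // 100)
        winners[cell] = [p["id"] for p in ranked[:n_winners]]
    return winners
```

Because ranking happens inside each cell, a participant in the stereotype threat condition is never compared against participants in the control condition, which is the equal-opportunity property the debriefing describes.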
If you have any additional questions about the study or your involvement in it, contact: James Grand, M.A., 348 Psychology Building, Phone: (334) 787-2141, e-mail: grandjam@msu.edu. If you have questions or concerns about your rights as a research participant, please contact Michigan State University's Human Research Protection Program at 517‐355‐2180, Fax 517‐432‐4503, or e‐mail irb@msu.edu or regular mail at 207 Olds Hall, MSU, East Lansing, MI 48824.
APPENDIX D
Study Protocol
Day 1 Session
Setting up participant computers
1. Check the Experiment Information Sheet for the day (located in study folder) to see how many people are signed up for the session A. The experiment information sheet contains a number of important items: i. A roster of all individuals signed up to participate in the study, with columns to mark which days/sessions the person attended ii. The dates over which the sessions will take place iii. The condition for the session participants is located in the top right corner 1. O = condition 1 (stereotype threat) 2. T = condition 2 (no stereotype threat) B. James will make sure this information sheet is updated with the correct roster prior to each Day 1 session 2. Log into the lab computers using the following info: A. Username: XXXXX B. Password: XXXXX C. NOTE: If running in the Optima lab, make sure the domain name is set to Psychology 3. Place an informed consent sheet onto every logged in participant computer 4. On the desktop, open the folder labeled “Radar Control Simulation” A. Double-click the shortcut labeled “RadarSim Start” to open up the start page for the Radar Control Simulation experiment. B. Leave the start page open and go back to the Radar Control Simulation folder. C. Double-click the shortcut labeled “OSPAN” to open up the Automated Operation Span working memory task. D. Press Alt+Tab until you get back to the Radar Control Simulation folder and close the folder. E.
At this point, only the OSPAN task and Radar Control Simulation start page should be opened. 5. If needed, Alt+Tab back to the OSPAN task so that it is on the screen. A. In the window labeled “Enter Subject Number,” enter in the number written in the top right corner of the informed consent document you placed at that computer and press OK. NOTE that it is very important that this number be correct, so double check to make sure you’ve entered in the correct number before pressing OK. B. In the window labeled “Enter Session Number,” leave it at 1 and press OK. C. At the confirmation window, press OK. You should now see the instructions for the OSPAN task on screen. Leave this open. 6. Repeat steps 4-5 for every participant computer you have logged in. A. Once done, the participant computers are all set!
Preparing the Experimenter Station & PowerPoint Training
7. Log into the experimenter station with the following log-in information: A. If running in Optima Lab (Room 335) i. Make sure domain is set to IOPSYCH ii. Username: XXXXX iii. Password: XXXXX B. If running in Adapt Lab (Room 204) i. Username: XXXXX ii. Password: XXXXX C. Once the computer is turned on, check to make sure the volume is turned on for the computer 8. Log into the experimenter website by doing the following: A. Go to 35.8.48.6/james1/exppage.asp B. Password: XXXXX 9. Check to make sure that the area labeled “Current Condition” is set to the correct condition number for the experiment that day A. To check the condition number for the group you are running, look in the top right corner of the Experiment Information Sheet i. If O (stands for “one”) is circled, Current Condition should be set to 1 ii. If T (stands for “two”) is circled, Current Condition should be set to 2 B. If you need to change the Current Condition, select the appropriate number from the drop-down box and click the button labeled Submit i.
IF YOU NOTICE THIS IS DIFFERENT THAN IT SHOULD BE, PLEASE CALL ME AND CHECK WITH ME FIRST BEFORE CHANGING! C. Close out of the experimenter page when finished 10. Turn on the projector in the room by pressing the power button on the bottom of the projector 11. On the desktop, double-click on the icon labeled “Microsoft PowerPoint Viewer” to open up the PowerPoint training presentation A. When the program starts, it will ask you to browse to the file you wish to run. Navigate to the Desktop, and open the folder labeled Radar Control Simulation Training B. Select the appropriate PowerPoint training file based on the experimental condition: i. If running participants for Condition 1 (O on the Experiment Information Sheet), select the file named “training_condition1 (ST)” ii. If running participants for Condition 2 (T on the Experiment Information Sheet), select the file named “training_condition2 (NST)” C. The PowerPoint viewer will start with the selected presentation. To make the presentation full screen, click on the small arrow cursor in the bottom right of the window and select full screen i. The training video is now set to run – I recommend clicking to start to make sure it works and the sound is turned on! 12. You’re all set! The room is now ready to go for participants.
Running the Task
13. Set out the Experiment Information Sheet. As participants enter, provide the following instructions: I have placed the experiment roster for today’s study here. Please find your name on the roster and sign your name under the Day 1 column. I will use this sheet to record attendance and assign credits, so it is important that you sign in on this sheet every day you attend the study. If you do not see your name listed on the roster please let me know. [If somebody’s name is not on the roster, go to the Troubleshooting section in the back of this document] After you have signed in on the roster, you may select any computer that is logged in and has a consent form.
Please take a moment to read this form as it provides important information about today’s experiment. If after reading the form you still wish to participate in the study, please provide the requested information and sign your name in the spaces provided on the back of the sheet. Please do not press anything on the computer until I instruct you to do so. 14. Unless all participants have arrived, wait to begin the study until a few minutes after the scheduled start time. When you are ready to begin: A. Collect the informed consent forms from participants – check to make sure they are signed! B. Place the Do Not Disturb sign on the door and shut the door. i. Once the Do Not Disturb sign is up and the door is closed, the experiment has started. No one else is allowed in the room, so if someone arrives late, tough beans. 15. To begin the study, read the following to participants: Thank you for attending the Radar Control Simulation study today. Today is the first of three sessions for this study; this session will last approximately 2 hours, while the remaining sessions will last approximately 1.5 hours. At the end of today’s session, I will provide you with a reminder slip that has the date and time for the next session. Note that the credits for this experiment will be updated on the HPR site after the third session. Before we begin, please place all your cell phones on silent and put them away for the remainder of today’s session. Before starting the Radar Control Simulation, you will first complete a brief task in which you will be asked to memorize a string of letters while performing math problems. Please follow the instructions exactly as they are presented to you on the screen. When you complete the task, raise your hand to let me know you are finished and then remain sitting quietly until everyone is finished. 16. Once everybody has finished taking the OSPAN task, read the following instructions: Please press the Space Bar to exit the memory task.
[Wait to make sure everyone closes out of the task]. You should now be at the log-in page for the Radar Control Simulation task. Please enter “start” as the start code and then press submit. On the next page, please enter in the requested information. If you receive an error, check to make sure you entered in your PID correctly; if you still receive an error, please let me know. [If somebody continues to receive an error when trying to log-in, go to the Troubleshooting section in the back of this document] 17. After everybody has successfully logged in, read the following instructions: I will now begin a brief video that will introduce you to the study as well as how to operate the Radar Control Simulation. Please pay close attention to the video. If you need to move in order to see the screen better, please feel free to do so now. 18. Begin the PowerPoint presentation by clicking on the screen (you should turn off a couple of the lights in the room to make it easier to see). The presentation should proceed automatically through the slides until it reaches the end (lasts about 10 minutes). Once it is complete, read the following instructions: Before we start the Radar Control Simulation for today, are there any questions? [If you get any questions about how to play the game, let them know that everything they need to know to play the game can be found in the manual and they will need to look it up] You may now press Next on your screen. We will now begin the familiarization trial for today where you will be able to briefly look at the task manual and radar screen before beginning the first learning trial for the day. NOTE that the presentation of the manual and radar screen during the familiarization trial is VERY short. This time is simply meant to allow you to see what these screens look like and familiarize yourself with how to use some of their functions. I will now put the first password onto the screen. 19. 
Advance the PowerPoint presentation to the first password. Check to make sure that everybody is able to enter the manual without any problems. Once the timer for the manual has expired and everybody is on the next green screen, read the following instructions: You will now have a chance to familiarize yourself with the radar screen and how to use its menus. Please make note of the following: A. To hook a contact on the radar, you will use the LEFT-mouse button. To operate your sensors and radar menus, you will use the RIGHT-mouse button. B. At the start of every trial, you MUST RIGHT-CLICK on Start Exercise from the OPER menu to begin the trial. I will now provide the password needed to advance to the radar screen. 20. Advance the PowerPoint presentation to the next password. At this point you should walk around the room and make sure that everybody was able to enter the game and clicked on “Start Exercise” from the OPER menu. To do so, glance at the timer on each person’s screen – if it still reads “Paused 1:00” they have not started the simulation. Show them how to start the simulation by RIGHT-CLICKING on the OPER menu and then RIGHT-CLICKING on Start_Exercise. Remind the participant that they will need to do this every time they begin a new trial. Once everybody has completed the familiarization trial and viewed the practice feedback, read the following instructions: From here on, I will project the passwords needed to advance the task up on the projector screen. When you see the password on the screen, you may enter it into the password field and press Submit. I will put the password on the screen once everybody is ready to proceed to the next step. Once you have completed all the learning trials and the single performance trial for today, you will be asked to complete a series of measures.
These measures are important to the task, and therefore it is important to us that you attend to them as fully and conscientiously as you do the Radar Control Simulation. Before we begin, please put on the headphones located at your computer. Once you have put the headphones on, press the button labeled “Test Sound” to confirm that your headphones are working and that you can hear audio. Please raise your hand if you are unable to hear any sound from your headphones. [Check to make sure everyone can hear the audio from the computer. If somebody’s computer is not playing audio, make sure that the volume is not muted and is turned up enough to hear. If somebody would like to adjust the volume on the headphones, they may do so by using the controls located on the headphone cord.] If there are no questions before we begin, I will now put the first password on the projector screen. 21. Advance the PowerPoint presentation to the next password. From here on, you should simply monitor participants as they complete the task and provide the passwords when needed. NOTE that you will only present the next password once all participants are ready to advance (i.e., you see the green screen on everybody’s computer). A. Some things to keep track of: i. There is no talking allowed or use of cell phones. If anybody is doing either of these please ask them to leave. ii. Jot down a note of any unusual activity that you see (i.e., somebody’s not paying attention to the task, talking, cell phone use) or any computer/task malfunctions (i.e., you had to restart someone’s game, etc.) 22. While people are completing the task, make sure you fill out the reminder slips and set them out for people to take with them as they leave.
Closing out the experiment
23. Once the last person has finished the experiment and has left the room, you will need to go to all the computers that were used by a participant that day and complete two tasks: A.
Enter in “finish1” as the End Day code at the screen currently on the computer i. This should advance the screen to the Welcome Back screen (this is the screen where people will begin on Day 2) ii. Once on the Welcome Back screen, exit out of the window B. Open up the Radar Control Simulation folder on the desktop and double-click the Copy OSPAN data program i. This will open up a command window that will ask you to press any key to continue...go ahead and press any key to continue ii. The program will copy over the TWO working memory/OSPAN files from the participant’s computer to the server iii. When it’s done, press any key to continue and it will close on its own 24. Close the training/password PowerPoint Presentation and log out of the experimenter computer 25. Make sure all the consent forms and Experiment Information Sheet are put away 26. Turn off the lights and projector in the room (you may leave the participant computers on and logged in) and make sure the door is closed when you leave
Day 2 Session
Setting up participant computers
1. Check the Experiment Information Sheet for the day (located in study folder) to see how many people are signed up for the session A. The experiment information sheet contains a number of important items: i. A roster of all individuals signed up to participate in the study, with columns to mark which days/sessions the person attended ii. The dates over which the sessions will take place iii. The condition for the session participants is located in the top right corner 1. O = condition 1 (stereotype threat) 2. T = condition 2 (no stereotype threat) B. James will make sure this information sheet is updated with the correct roster prior to each Day 1 session 2. If needed, log into the lab computers. 3. On the desktop, open the folder labeled “Radar Control Simulation” A. Double-click the shortcut labeled “RadarSim Restart” to open up the welcome back log-in page for the Radar Control Simulation experiment. B.
Close the Radar Control Simulation folder. C. At this point, only the Radar Control Simulation Restart page should be opened. 4. Repeat step 3 for every participant computer you have logged in. A. Once done, the participant computers are all set!
Preparing the Experimenter Station & PowerPoint Presentation
5. Log into the experimenter station. 6. Turn on the projector in the room by pressing the power button on the bottom of the projector 7. On the desktop, open the folder labeled Radar Control Simulation Training and open up the appropriate PowerPoint training file (it takes a second to open, it’s a big file...) A. Select the appropriate PowerPoint training file based on the experimental condition: i. If running participants for Condition 1 (O on the Experiment Information Sheet), select the file named “training_condition1 (ST)” ii. If running participants for Condition 2 (T on the Experiment Information Sheet), select the file named “training_condition2 (NST)” B. Navigate the presentation to begin at Slide 53 C. Start the presentation in full screen mode and make sure it is projecting on the screen 8. You’re all set! The room is now ready to go for participants.
Running the Task
9. Set out the Experiment Information Sheet. As participants enter, ask them to sign in under the appropriate day, take a seat at one of the logged-in computers, and enter their PID on the log-in screen (NOTE: Participants do not have to sit at the same computer as they did on previous days) 10. Unless all participants have arrived, wait to begin the study until a few minutes after the scheduled start time. When you are ready to begin: A. Collect the informed consent forms from participants – check to make sure they are signed! B. Place the Do Not Disturb sign on the door and shut the door. i. Once the Do Not Disturb sign is up and the door is closed, the experiment has started. No one else is allowed in the room, so if someone arrives late, tough beans. 11.
To begin the study, read the following to participants: Welcome back to the Radar Control Simulation study. Today’s session will last approximately 1.5 hours. Unlike yesterday’s session, there will be no training video nor will you complete the memorization task. Before we begin, please place all your cell phones on silent and put them away for the remainder of today’s session. Similar to yesterday’s session, we will begin with a short familiarization period for you to re-acclimate yourself with the radar manual and screen. I will present you with the passwords to enter these screens when everyone is ready to proceed. Are there any questions before we get started? [Answer any questions] At this point, you may press the Next button on your computer and I will provide you with the first password. 12. Advance the PowerPoint presentation to the first password. EVERYBODY WILL HAVE 30 SECONDS TO LOOK AT THE MANUAL. ONCE THEY ARE DONE WITH THE MANUAL, YOU WILL PRESENT THE NEXT PASSWORD TO BEGIN A SHORT TRIAL WITH THE RADAR TASK. Once participants have completed the familiarization trial and feedback page, read the following instructions: The remainder of today’s session will be identical to yesterday’s session; I will project the passwords needed to advance the task up on the projector screen when everybody is ready to advance. When you see the password on the screen, you may enter it into the password field and press Submit. Once you have completed all the learning trials and the single performance trial for today, you will again complete a series of measures. These measures are important to the task, and therefore it is important to us that you attend to them as fully and conscientiously as you do the Radar Control Simulation. Before we begin, please put on the headphones located at your computer. Once you have put the headphones on, press the button labeled “Test Sound” to confirm that your headphones are working and that you can hear audio.
Please raise your hand if you are unable to hear any sound from your headphones. If there are no questions before we begin, I will now put the password on the projector screen. 13. Advance the PowerPoint presentation to the next password. From here on, you should simply monitor participants as they complete the task and provide the passwords when needed. NOTE that you will only present the next password once all participants are ready to advance (i.e., you see the green screen on everybody’s computer). A. Some things to keep track of: i. There is no talking allowed or use of cell phones. If anybody is doing either of these please ask them to leave. ii. Jot down a note of any unusual activity that you see (i.e., somebody’s not paying attention to the task, talking, cell phone use) or any computer/task malfunctions (i.e., you had to restart someone’s game, etc.) 14. While people are completing the task, make sure you fill out the reminder slips and set them out for people to take with them as they leave.
Closing out the experiment
15. Once the last person has finished and left the room, you will need to go to all the computers that were used by a participant that day and enter in “finish2” as the End Day code A. This should advance the screen to the Welcome Back screen (this is the screen where people will begin on Day 3) B. Once on the Welcome Back screen, exit out of the window 16. Close the training/password PowerPoint presentation and log out of the experimenter computer 17. Put away the Experiment Information Sheet 18. Turn off the lights and projector in the room (you may leave the participant computers on and logged in) and make sure the door is closed when you leave
Day 3 Session
Setting up participant computers
1. Check the Experiment Information Sheet for the day (located in study folder) to see how many people are signed up for the session A. The experiment information sheet contains a number of important items: i.
A roster of all individuals signed up to participate in the study, with columns to mark which days/sessions the person attended ii. The dates over which the sessions will take place iii. The condition for the session participants is located in the top right corner 1. O = condition 1 (stereotype threat) 2. T = condition 2 (no stereotype threat) B. James will make sure this information sheet is updated with the correct roster prior to each Day 1 session 2. If needed, log into the lab computers. 3. On the desktop, open the folder labeled “Radar Control Simulation” A. Double-click the shortcut labeled “RadarSim Restart” to open up the welcome back log-in page for the Radar Control Simulation experiment. B. Close the Radar Control Simulation folder. C. At this point, only the Radar Control Simulation Restart page should be opened. 4. Repeat step 3 for every participant computer you have logged in. A. Once done, the participant computers are all set!
Preparing the Experimenter Station & PowerPoint Presentation
5. Log into the experimenter station. 6. Turn on the projector in the room by pressing the power button on the bottom of the projector 7. On the desktop, open the folder labeled Radar Control Simulation Training and open up the appropriate PowerPoint training file (it takes a second to open, it’s a big file...) A. Select the appropriate PowerPoint training file based on the experimental condition: i. If running participants for Condition 1 (O on the Experiment Information Sheet), select the file named “training_condition1 (ST)” ii. If running participants for Condition 2 (T on the Experiment Information Sheet), select the file named “training_condition2 (NST)” B. Navigate the presentation to begin at Slide 87 C. Start the presentation in full screen mode and make sure it is projecting on the screen 8. You’re all set! The room is now ready to go for participants.
Running the Task
9. Set out the Experiment Information Sheet.
As participants enter, ask them to sign in under the appropriate day, take a seat at one of the logged-in computers, and enter their PID on the log-in screen. (NOTE: Participants do not have to sit at the same computer as they did on previous days.)
10. Unless all participants have arrived, wait to begin the study until a few minutes after the scheduled start time. When you are ready to begin:
   C. Collect the informed consent forms from participants – check to make sure they are signed!
   D. Place the Do Not Disturb sign on the door and shut the door.
      iii. Once the Do Not Disturb sign is up and the door is closed, the experiment has started. No one else is allowed in the room, so if someone arrives late, tough beans.
11. To begin the study, read the following to participants:

Welcome back to the final session for the Radar Control Simulation study. Today’s session will last approximately 1.5 hours and the procedure will be identical to yesterday’s session.

Before we begin, please place all your cell phones on silent and put them away for the remainder of today’s session.

Similar to yesterday’s session, we will begin with a short familiarization period for you to re-acclimate yourself with the radar manual and screen. I will present you with the passwords to enter these screens when everyone is ready to proceed.

Are there any questions before we get started? [Answer any questions]

At this point, you may press the Next button on your computer and I will provide you with the first password.

12. Advance the PowerPoint presentation to the first password. EVERYBODY WILL HAVE 30 SECONDS TO LOOK AT THE MANUAL. ONCE THEY ARE DONE WITH THE MANUAL, YOU WILL PRESENT THE NEXT PASSWORD TO BEGIN A SHORT TRIAL WITH THE RADAR TASK.
Once participants have completed the familiarization trial and feedback page, read the following instructions:

The remainder of today’s session will be identical to yesterday’s session; I will project the passwords needed to advance the task up on the projector screen when everybody is ready to advance. When you see the password on the screen, you may enter it into the password field and press Submit.

Once you have completed all the learning trials and the single performance trial for today, you will again complete a series of measures. These measures are important to the task, and therefore it is important to us that you attend to them as fully and conscientiously as you do the Radar Control Simulation.

Before we begin, please put on the headphones located at your computer. Once you have put the headphones on, press the button labeled “Test Sound” to confirm that your headphones are working and that you can hear audio. Please raise your hand if you are unable to hear any sound from your headphones. If there are no questions before we begin, I will now put the password on the projector screen.

13. Advance the PowerPoint presentation to the next password. From here on, you should simply monitor participants as they complete the task and provide the passwords when needed. NOTE that you will only present the next password once all participants are ready to advance (i.e., you see the green screen on everybody’s computer).
   E. Some things to keep track of:
      iv. There is no talking allowed or use of cell phones. If anybody is doing either of these, please ask them to leave.
      v. Jot down a note of any unusual activity that you see (e.g., somebody’s not paying attention to the task, talking, cell phone use) or any computer/task malfunctions (e.g., you had to restart someone’s game).
14. While people are completing the task, make sure you fill out the reminder slips and set them out for people to take with them as they leave.

Closing out the experiment
15.
Close the training/password PowerPoint presentation and log out of the experimenter computer.
16. Make sure the Experiment Information Sheet is put away.
17. Turn off the lights and projector in the room (you may leave the participant computers on and logged in) and make sure the door is closed when you leave.

Troubleshooting

I don’t foresee there being any significant technical issues with the task...however, below I’ve listed a few scenarios that may come up. Note that you can always call me if you are having difficulty with something and I will do my best to troubleshoot remotely (or on site, if I’m around).

1. Somebody’s not on the Participant Roster!
   a. This means that the individual did not complete the sign-up on the HPR site and, by extension, probably did not complete the required online questionnaire (so you’ll have the problem below as well).
   b. If you have enough computers, let them write in their name on the roster and participate.
   c. If there is enough time before the session starts (i.e., at least 5 minutes), start up a web browser and go to http://35.8.48.6/james1. Have them complete the consent and questionnaire there.
   d. If there’s not enough time before the session starts, see #2 below.
2. Somebody’s trying to log in with their PID and it’s not working!
   a. This could mean a few different things, but most likely it’s either A) they entered a different/the wrong PID when they completed the online questionnaire or B) the participant did not complete the online questionnaire portion of the study.
   b. In either case, here’s what you should do:
      i. Ask the person to give you their PID
      ii. Continue on with the PowerPoint training
      iii. Call me as soon as you can – there are some things I will need to do in the database so that they can continue on with the task
   c. In the case of scenario B above, I will make a temporary fix that will allow them to participate in the session.
However, they will need to take the online questionnaire before they leave the lab.
      i. At the end of the session, have them go to http://35.8.48.6/james1 and complete the consent and questionnaire
3. Something does not seem to be working correctly in the task! (e.g., not displaying correctly, error message, etc.)
   a. Try to restart the simulation for the person by exiting the window and clicking on the RadarSim Restart shortcut in the desktop folder
   b. If this doesn’t fix the problem or the error persists, call James

APPENDIX E
Practice Trial Feedback

Page 1
Participant Feedback
INSTRUCTIONS
You will now have an opportunity to review your activity from this past practice period. You should use the information provided on the following screens to guide your study and practice. Remember that once you leave a screen, you cannot go back to review it again. You will have 1 minute to review the feedback pages.
Advance to the next screen to begin reviewing your feedback.

Page 2
SCORING
Your total score on this past trial: XXXX points
TARGET ENGAGEMENT RESULTS
Number of non-marker targets engaged: XX out of 21 targets
You correctly engaged XX targets, which earned you XX points.
You incorrectly engaged XX targets, which cost you XX points.
PERIMETER INTRUSIONS
You allowed XX targets to cross the inner perimeter, which cost you XX points.
You allowed XX targets to cross the outer perimeter, which cost you XX points.
Together, you lost XX points due to targets crossing your defensive perimeters.
Advance to the next screen to continue reviewing your feedback.

Page 3
CONTACT DECISIONS
Number of contacts for which you made a correct Type (Air, Surface, Sub) decision: XX out of XX total contacts engaged
Number of contacts for which you made a correct Class (Civilian, Military) decision: XX out of XX total contacts engaged
Number of contacts for which you made a correct Intent (Peaceful, Hostile) decision: XX out of XX total contacts engaged
Number of contacts for which you made a correct Final Engagement (Clear, Warn, Mark) decision: XX out of XX total contacts engaged
Remember, you must make each of the four decisions above correctly for EACH target in order to receive points! Even making one of the above decisions incorrectly will cause you to lose points.
Advance to the next screen to continue reviewing your feedback.

Page 4
OTHER INFORMATION
Average time spent per target: XX seconds
Number of pop-up targets you engaged: XX
Number of pop-up targets you engaged correctly: XX
Number of high priority targets you engaged (correctly or incorrectly) that threatened the inner perimeter: X out of 4
Number of high priority targets you engaged (correctly or incorrectly) that threatened the outer perimeter: X out of 7
You did not hook a marker target on this trial. [You hooked and used a marker target correctly on this trial.] [You hooked a marker target, but engaged it. Engaging marker targets does not earn you points in the trial.]
You did not use the Zoom feature on this trial. [You used the Zoom feature on this trial.]

Page 5
FEEDBACK COMPLETE
This concludes the trial feedback. Please click Next to exit the feedback program and wait for further instructions.

APPENDIX F
Exploratory Learning Recommendations (adapted from Bell, 2002)

An effective method for learning the skills required in the Radar Control Simulation is to explore the task and develop your understanding of it.
As you practice the scenarios, explore the task to understand what is occurring in the scenario and to discover the best strategies to deal with the situation. Also, experiment with different strategies and methods in your attempts to learn the skills needed to perform effectively in the simulation. The following is a list of questions that may prove useful in guiding your learning within the task:

Gathering and Interpreting Information
• Have you learned how to efficiently gather the required number of information pieces from your sensors needed to process a target?
• Have you learned how to interpret the cues needed to make accurate decisions about a target's Type?
• Have you learned how to interpret the cues needed to make accurate decisions about a target's Class?
• Have you learned how to interpret the cues needed to make accurate decisions about a target's Intent?
• Have you learned how to use information about a target's Type, Class, and Intent to make accurate decisions about whether to Clear, Warn, or Mark the target?
• Have you figured out how to use the in-game feedback to determine whether you correctly engaged a target?

Monitoring Defensive Perimeters
• Have you discovered how to use the Zoom function to help you look for targets near the inner and outer perimeters?
• Have you learned how to identify and use the Marker targets in order to locate and monitor the outer defensive perimeter?
• Have you developed a strategy for monitoring your inner and outer perimeters for pop-up targets?

Prioritizing Targets and Maximizing your Score
• Have you learned to distinguish high priority targets from low priority targets?
• Have you come up with a strategy for prioritizing which target(s) to engage in order to maximize the number of points you earn and minimize the number of points you lose?

APPENDIX G
Stereotype Threat Condition Instructions (adapted from Beilock et al., 2007; Rydell, Shiffrin, et al., 2010)

Full Instructions
As mentioned during the first experimental session, there has been some controversy about whether there are true sex differences in math ability. A good deal of research indicates that males consistently score higher than females on standardized tests of math ability; however, there are many cases where no such sex differences emerge. The research you are participating in today is aimed at better understanding why these differences exist.

One reason why women appear to do more poorly on some math tasks (like word problems on the SAT or ACT) is that, relative to men, women are worse at quickly and correctly distinguishing relevant information needed to solve a problem from irrelevant/distracting information that is also provided. This skill is directly assessed in the radar control task you will be using in this study. By learning to operate the radar control simulation, we will be able to evaluate how you develop these skills and why women tend to have more difficulty processing such information than men. It is important that you give a strong effort during the experiment to help us in our analysis of why men and women differ in this math-related skill.

Shortened Instructions
As was mentioned at the outset, one reason why women perform more poorly than men on some math tasks like word problems on the SAT or ACT is that women tend to have more difficulty correctly choosing information needed to solve a problem from irrelevant/distracting information that is also provided. The research you are participating in today is aimed at examining your ability to learn this skill so that we may better understand why women have more difficulty processing such information than men. It is important that you continue to give a strong effort during the experiment to help us in our analysis of this topic.

Performance Trial Instructions
You have now finished all of your practice trials for today and will complete one final trial. This trial will look and function similarly to the previous scenarios you have practiced, but it will be more challenging. For this trial, the following changes have been made:
1. The trial length has been increased to eight minutes
2. The total number of targets has been increased substantially
3. Any target that crosses your defensive perimeters will now cause you to lose 150 points
Remember that your performance on this trial will be taken into consideration when selecting the winners of the cash prize, so you should try to score as many points as possible.

APPENDIX H
Control Condition Instructions (adapted from Beilock et al., 2007; Rydell, Shiffrin, et al., 2010)

Full Instructions
As mentioned during the first experimental session, you may have noticed during your daily experiences that some people seem to be very good at “picking up on” how to perform new tasks they have never seen before. However, surprisingly little is known about the mental processes underlying this skill. This research is aimed at better understanding how individuals approach novel tasks and complete them.

One reason why certain people do better on such tasks is that they may be better at quickly and correctly distinguishing relevant information needed to solve a problem from irrelevant/distracting information that is also provided. This skill is involved in completing the radar control task you will be using in this study. By observing how people learn to operate the radar control simulation, we hope to see how people develop these skills and why some people differ on them. It is important that you give a strong effort during the experiment to help us in our analysis of why people differ in this skill.

Shortened Instructions
As was mentioned at the outset, people's ability to correctly distinguish important information needed to solve a problem from irrelevant/distracting information that is also provided may contribute to performance on many novel tasks. The research you are participating in today is aimed at examining this topic, the manner by which individuals learn this skill, and the extent to which it influences effectively operating the radar control simulation. It is important to us that you continue to give a strong effort during the experiment to help us in our analysis of this topic.

Performance Trial Instructions
You have now finished all of your practice trials for today and will complete one final trial. This trial will look and function similarly to the previous scenarios you have practiced, but it will be more challenging. For this trial, the following changes have been made:
1. The trial length has been increased to eight minutes
2. The total number of targets has been increased substantially
3. Any target that crosses your defensive perimeters will now cause you to lose 150 points
The purpose of this trial is to give you an opportunity to test your newly developed skills in a more challenging context.

APPENDIX I
Online Individual Difference Measures

Demographics
The questions below ask you to provide some basic information about yourself. Please answer these questions by selecting or typing the appropriate response.
1. What is your sex?
   a. Female
   b. Male
2. What is your age?
   a. ____
3. Is English your first language (i.e., the language you consider your “native” language and the one in which you are most comfortable conversing)?
   a. No
   b. Yes
4. Are you right- or left-handed?
   a. Right-handed
   b. Left-handed
5. How often do you play any sort of video game (on computer, console, phone, etc.)?
   a. Never (e.g., less than once a month)
   b. Rarely (e.g., once a week)
   c. Sometimes (e.g., 2-3 times a week)
   d. Frequently (e.g., 4-5 times a week)
   e.
Always (e.g., daily)

Cognitive Ability (SAT/ACT scores)
In the spaces below, please provide your highest SAT and/or ACT score. Note that this score will ONLY be used for research purposes and will be kept confidential. If you do not remember your score, please put a zero in the space for SAT score.
SAT score: _____________
ACT score: _____________

Math Domain Identification (Smith & White, 2001)
Using the following scale, please indicate the number that best describes how much you agree with each of the statements below.
1 Strongly disagree   2 Moderately disagree   3 Neither agree or disagree   4 Moderately agree   5 Strongly agree
1. Mathematics is one of my best subjects.
2. I have always done well in Math.
3. I get good grades in Math.
4. I do badly on tests of Mathematics (R).

Please indicate the number that best describes you for each of the statements below using the following scale:
1 Not at all   2   3 Somewhat   4   5 Very much
5. How much do you enjoy math-related subjects?
6. How likely would you be to take a job in a math-related field?
7. How important is Math to the sense of who you are?
8. How important is it to you to be good at Math?

9. Compared to other students, how good are you at math?
   a. Very poor
   b. Poor
   c. About the same
   d. Better than average
   e. Excellent

APPENDIX J
Experimental Session Measures

Metacognitive Activity (Bell, 2002)
For each of the items below, rate the extent to which you were thinking about these issues during the practice and performance trials today.
1 Never   2 Rarely   3 Sometimes   4 Frequently   5 Constantly
1. While practicing the simulation, I monitored how well I was learning its requirements.
2. I thought carefully about my performance on the previous trial before selecting what to study and practice.
3. As I performed in the practice trials, I evaluated how well I was learning the skills of the simulation.
4. When my methods were not successful, I experimented with different procedures for performing the task.
5.
I considered the skills that needed the most work when choosing what to study and practice.
6. I tried to monitor closely the areas where I needed the most study and practice.
7. I noticed where I made the most mistakes during practice and focused on improving those areas.
8. I carefully determined what to study and practice in order to improve on weaknesses identified in previous trials.
9. I used my performance on the previous trial to revise how I would approach the task on the next trial.
10. I thought about new strategies for improving my performance.
11. I thought ahead to what I would do next to improve my performance.
12. I told myself things to encourage me to try harder.

Perceived Stereotype Threat (adapted from Ployhart, Ziegert, & McFarland, 2003)
The following series of questions ask about your perceptions and feelings regarding the experiment and radar control simulation task you just completed. Please answer each question honestly and openly to the best of your ability using the provided scale.
1 Strongly disagree   2 Disagree   3 Neither agree or disagree   4 Agree   5 Strongly agree
1. People likely believe that I would perform poorly on this task because of my gender.
2. This task may have been easier for people of my gender. (R)
3. The experimenter expected me to do poorly on this task because of my gender.
4. I was not worried that anyone would draw conclusions about my abilities on this task based on my gender. (R)
5. Tasks like the one I just completed have been used to discriminate against people from my gender.
6. During the experiment, I wanted to show that people of my gender could perform well on the radar control simulation task.
7. A negative opinion exists about how members of my gender should perform on this type of task.

Manipulation Check
Please indicate the extent to which you agree with each of the following statements using the provided scale.
1 Strongly disagree   2 Disagree   3 Neither agree or disagree   4 Agree   5 Strongly agree
1.
The radar control task assesses skills related to mathematical ability.
2. People with high mathematical ability would likely do well on the radar control task.

Declarative Knowledge (adapted from Bell, 2002)
The following is a test that asks questions about facts you may have learned about the radar control simulation. Please answer each question to the best of your ability.

[Day 1 Test]
1. If a target’s Response is Authorized, what is its likely Intent?
   a. Military
   b. Hostile
   c. Civilian
   d. Peaceful*
2. Which of the following characteristics indicates that a target is a Submarine?
   a. Speed = 20 knots*
   b. Communication Time = 55 seconds
   c. Direction of Origin = Orange Bay
   d. Countermeasures = Jamming
3. A Maneuvering Pattern of Code Delta indicates the target is which of the following?
   a. Air
   b. Military*
   c. Surface
   d. Civilian
4. A Green Beach Direction of Origin indicates the target is which of the following?
   a. Unknown
   b. Submarine
   c. Peaceful*
   d. Military
5. If a target’s Altitude/Depth is 10 feet, what is the Type of the target?
   a. Air*
   b. Surface
   c. Submarine
   d. Unknown
6. If a target’s Signal Strength is Indistinct, what Class does this suggest for the target?
   a. Surface
   b. Civilian
   c. Military
   d. Unknown*
7. If a target’s characteristics are Communication Time = 20 seconds and Speed = 50 knots, which of the following actions should you take?
   a. Choose Intent as Peaceful
   b. Choose Type as Surface
   c. Get another piece of information
   d. Choose Type as Air*
8. A Communication Time of 52 seconds indicates that the target is likely:
   a. Air
   b. Surface*
   c. Submarine
   d. Unknown
9. If a target’s characteristics are Countermeasures = None and Maneuvering Pattern = Code Foxtrot, which of the following actions should you take?
   a. Choose Class as Military
   b. Choose Intent as Peaceful
   c. Choose Class as Civilian*
   d. Choose Intent as Unknown
10. Which of the following targets should you make the final engagement decision to Clear?
   a. Air, Military, Peaceful*
   b.
Submarine, Civilian, Peaceful
   c. Air, Civilian, Peaceful
   d. Surface, Civilian, Hostile
11. If a target’s Speed is 40 knots, what does this suggest about the target?
   a. The target is Surface
   b. The target is Civilian
   c. The target is Air*
   d. The target is Military

[Day 2 Test]
1. If a target’s Identification Tag is Prince, what is its likely Intent?
   a. Military
   b. Hostile
   c. Peaceful*
   d. Civilian
2. Which of the following characteristics indicates that a contact is a Surface target?
   a. Speed = 20 knots
   b. Direction of Origin = Blue Lagoon
   c. Countermeasures = Jamming
   d. Communication Time = 55 seconds*
3. A Response that is Inaudible indicates the target is which of the following?
   a. Military
   b. Civilian
   c. Unknown*
   d. Hostile
4. Jamming Countermeasures suggest that the target is which of the following?
   a. Surface
   b. Military*
   c. Submarine
   d. Civilian
5. If a target’s Speed is 31 knots, what is the Type of the target?
   a. Surface*
   b. Air
   c. Submarine
   d. Unknown
6. Which of the following targets should you make the final engagement decision to Warn?
   a. Air, Military, Hostile
   b. Surface, Civilian, Peaceful
   c. Air, Military, Peaceful
   d. Submarine, Civilian, Peaceful*
7. If a target’s characteristics are Countermeasures = Inactive and Signal Strength = Weak, which of the following actions should you take?
   a. Choose Class as Civilian
   b. Get another piece of information*
   c. Choose Class as Military
   d. Choose Intent as Peaceful
8. A Communication Time of 110 seconds indicates that the target is likely:
   a. Air
   b. Surface
   c. Submarine*
   d. Unknown
9. If a target’s characteristics are Speed = 35 knots and Altitude/Depth = 15 feet, which of the following actions should you take?
   a. Choose Type as Surface
   b. Choose Type as Air*
   c. Choose Class as Military
   d. Choose Intent as Peaceful
10. If a target’s Response = Invalid, this suggests that the target falls into which category?
   a. Intent is Hostile*
   b. Class is Civilian
   c. Type is Air
   d. Class is Unknown
11.
If a target’s Direction of Origin is Orange Bay, what does this suggest about the target?
   a. The target is Military
   b. The target is a Submarine
   c. The target is Hostile*
   d. The target is Peaceful

[Day 3 Test]
1. If a target’s Maneuvering Pattern is Code Foxtrot, what is its likely Class?
   a. Civilian*
   b. Military
   c. Hostile
   d. Peaceful
2. Which of the following characteristics indicates that a contact is an Air target?
   a. Speed = 30 knots
   b. Communication Time = 35 seconds*
   c. Maneuvering Pattern = Code Delta
   d. Signal Strength = Weak
3. Which of the following targets should you make the final engagement decision to Mark?
   a. Surface, Civilian, Hostile
   b. Air, Civilian, Hostile
   c. Submarine, Military, Peaceful
   d. Submarine, Military, Hostile*
4. Which of the following characteristics indicates that a contact is Hostile?
   a. Maneuvering Pattern = Code Delta
   b. Communication Time = 115 seconds
   c. Response = Invalid*
   d. Countermeasures = Jamming
5. If a target’s Speed = 35 knots, what is the Type of the target?
   a. Air*
   b. Surface
   c. Submarine
   d. Unknown
6. If a target’s Identification Tag is Tango, what Intent does this suggest for the target?
   a. Civilian
   b. Peaceful
   c. Unknown
   d. Hostile*
7. If a target’s characteristics are Signal Strength = Weak and Maneuvering Pattern = Code Delta, which of the following actions should you take?
   a. Choose Intent as Hostile
   b. Choose Class as Civilian
   c. Choose Class as Military*
   d. Choose Intent as Peaceful
8. A Communication Time of 40 seconds indicates that the target is likely:
   a. Air*
   b. Surface
   c. Submarine
   d. Unknown
9. If a target’s characteristics are Direction of Origin = Orange Bay and Identification = Golf, which of the following actions should you take?
   a. Choose Class as Military
   b. Choose Type as Air
   c. Choose Intent as Hostile
   d. Get another piece of information*
10. If a target’s Direction of Origin is Blue Lagoon, this suggests that the target falls into which category?
   a. Intent is Unknown*
   b.
Class is Military
   c. Intent is Peaceful
   d. Class is Unknown
11. If a target’s Speed = 0 knots, what does this suggest about the target?
   a. The target is Civilian
   b. The target is Air
   c. The target is Surface
   d. The target is a Submarine*

APPENDIX K
Knowledge Structure Assessment Instructions (cf. Goldsmith et al., 1991)

[Page 1 Instructions]
Your task in this next exercise will involve judging the relatedness of pairs of concepts central to completing the Radar Control Simulation. In making these types of judgments, there are several ways to think about the items being judged. For instance, two concepts might be related because they share common features, frequently occur together, help you perform a task in the simulation, or serve a similar goal in the simulation. While this kind of detailed analysis is possible, our concern is to obtain your initial impression of "overall relatedness" among the presented concepts. Therefore, please base your ratings on your first impression of relatedness.

The concepts important to operating the Radar Control Simulation are listed below; take a moment to look at the list and locate one or two highly related and unrelated pairs to give yourself an idea of what it means for two things to be highly versus not highly related. For example, why might Identify contact Class as Civilian and Make decision to Warn contact be related concepts?

Concepts
1) Identify contact type as Air
2) Identify contact type as Surface
3) Identify contact type as Submarine
4) Identify contact class as Civilian
5) Identify contact class as Military
6) Identify contact intent as Peaceful
7) Identify contact intent as Hostile
8) Make decision to Clear contact
9) Make decision to Warn contact
10) Make decision to Mark contact
11) Gain/Lose Points
12) Zoom out/Zoom In
13) Monitor inner perimeter
14) Monitor outer perimeter
15) Find/engage pop-up targets
16) Prioritize targets (engage targets likely to cross a perimeter first)

[Page 2 Instructions]
In this exercise, a pair of concepts from the previous list will be presented on the screen along with a 9-point "relatedness" scale as shown in the example below. Using this scale, you are to indicate your judgment of relatedness for each pair by selecting the appropriate number on the scale. You can think of these numbers as points along a relatedness scale, with higher numbers representing greater relatedness. Thus, if you feel that the two concepts shown to you are not related at all, you would select "1" for that concept pair; if you feel the concepts are highly related, you would select "9" for the concept pair. For example:

How related are... Dog and Pet?
Not at all related   1   2   3   4   5   6   7   8   9   Highly related

[Page 3 Instructions]
Before beginning the rating task, here are a few important things to note when making your ratings:
• Try to use the full range of the rating scale to make your ratings.
   o Values near the ends of the scale (e.g., 1-3, 7-9) should be used if you are very certain about how two concepts are related to each other
   o Values from the middle of the scale (e.g., 4-6) should be used to reflect either medium relatedness or uncertainty about how the concepts relate
• Go with your “gut” when making ratings.
   o It’s best to make quick/intuitive judgments rather than deliberate on each pair
   o A good goal to shoot for is to spend no more than 5 seconds on each rating
• Do your best to provide an honest and accurate portrayal of how you believe these concepts are related to each other.
   o At the end of the experiment, you will receive a copy of your knowledge maps that you will be able to compare against others in the experiment to see how you match up, so the more honest you are, the better (and more interesting) your results will be!

On Days 2 and 3, participants also saw the following instruction:
• Base your relatedness ratings on what you’ve learned about the task up to this point.
   o Your ideas about how concepts in the task relate to one another may have changed since last time due to the extra practice you’ve had with the task
   o Make your judgments based on your current understanding of the task rather than trying to reproduce the ratings you provided previously

REFERENCES

Ackerman, P.L. (1986). Individual differences in information processing: An investigation of intellectual abilities and task performance during practice. Intelligence, 10, 109-139.
Ackerman, P.L. (1987). Individual differences in skill learning: An integration of psychometric and information processing perspectives. Psychological Bulletin, 102, 3-27.
Ackerman, P.L., Beier, M.E., & Boyle, M.O. (2005). Working memory and intelligence: The same or different constructs? Psychological Bulletin, 131, 30-60.
Ackerman, P.L., Bowen, K.R., Beier, M.E., & Kanfer, R. (2001). Determinants of individual differences and gender differences in knowledge. Journal of Educational Psychology, 93, 797-825.
Adams, J.W., & Hitch, G.J. (1997). Working memory and children’s mental addition. Journal of Experimental Child Psychology, 67, 21-38.
Aiman-Smith, L., Scullen, S.E., & Barr, S.H. (2002).
Conducting studies of decision making in organizational contexts: A tutorial for policy-capturing and other regression-based techniques. Organizational Research Methods, 5, 388-414.
Anderson, J.R. (1982). Acquisition of a cognitive skill. Psychological Review, 89, 369-406.
Anderson, J.R. (1993a). Rules of the mind. Hillsdale, NJ: Erlbaum.
Anderson, J.R. (1993b). Problem-solving and learning. American Psychologist, 48, 35-44.
Anderson, J.R. (1996). ACT: A simple theory of complex cognition. American Psychologist, 51, 355-365.
Anderson, J.R., & Lebiere, C. (1998). The atomic components of thought. Mahwah, NJ: Erlbaum.
Anderson, J.R., Bothell, D., Byrne, M.D., Douglass, S., Lebiere, C., & Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111, 1036-1060.
Anderson, J.R., Reder, L.M., & Lebiere, C. (1996). Working memory: Activation limitations on retrieval. Cognitive Psychology, 30, 221-256.
Aronson, J., Lustina, M.J., Good, C., Keough, K., Steele, C.M., & Brown, J. (1999). When White men can’t do math: Necessary and sufficient factors in stereotype threat. Journal of Experimental Social Psychology, 35, 29-46.
Ausubel, D.P. (1963). Cognitive structure and the facilitation of meaningful verbal learning. Journal of Teacher Education, 14, 217-221.
Baddeley, A.D. (1986). Working memory. Oxford, England: Oxford University Press.
Baddeley, A.D. (1992). Working memory. Science, 255, 556-559.
Baddeley, A.D. (1997). Human memory: Theory and practice. East Sussex, England: Psychology Press.
Baddeley, A.D. (2000). The episodic buffer: A new component of working memory? Trends in Cognitive Sciences, 4, 417-423.
Baddeley, A.D. (2001). Is working memory still working? American Psychologist, 56, 849-864.
Baddeley, A.D. (2003). Working memory: Looking back and looking forward. Nature Reviews Neuroscience, 4, 829-839.
Baddeley, A.D., & Hitch, G. (1974). Working memory. In G.H. Bower (Ed.), The psychology of learning and motivation (Vol. 8, pp. 47-89).
New York: Academic Press. Baldwin, T.T., Ford, J.K, & Blume, B.D. (2009). Transfer of training 1988-2008: An updated review and agenda for future research. In G.P. Hodgkinson & J.K Ford (Eds.), International review of industrial and organizational psychology (Vol. 24, pp. 41-70). Barrouillet, P., Bernardin, S., & Camos, V. (2004). Time constraints and resource sharing in adults’ working memory spans. Journal of Experimental Psychology: General, 133, 83– 100. Barrouillet, P., & Lépine, R. (2005). Working memory and children’s use of retrieval to solve addition problems. Journal of Experimental Child Psychology, 91, 183–204. Bates, D., Maechler, M., Bolker, B. (2011). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-42. http://CRAN.R-project.org/package=lme4 Baumeister. R.F., & Vohs, K.D. (2004). Handbook of self-regulation: Research, theory, and applications. New York: Guilford. Beier, M.E., & Ackerman, P.L. (2005). Working memory and intelligence: Different constructs. Reply to Oberauer et al. (2005) and Kane et al. (2005). Psychological Bulletin, 131, 7275. Beilock, S.L., & Carr, T.H. (2005). When high-powered people fail: Working memory and “choking under pressure” in math. Psychological Science, 16, 101-105. Beilock, S.L., Gunderson, E.A., Ramirez, G., & Levine, S.C. (2010). Female teachers’ math anxiety affects girls’ math achievement. Proceedings of the National Academy of Sciences of the United States of America, 107, 1860-1863. 240 Beilock, S.L., Jellison, W.A., Rydell, R.J., McConnell, A.R., & Carr, T.H. (2006). On the causal mechanisms of stereotype threat: Can skills that don’t rely heavily on working memory still be threatened? Personality and Social Psychology Bulletin, 32, 1059–1071. Beilock, S.L., Rydell, R.J., & McConnell, A.R. (2007). Stereotype threat and working memory: Mechanisms, alleviation, and spillover. Journal of Experimental Psychology: General, 136, 256-176. Bell, B.S. (2002). 
An examination of the instructional, motivational, and emotional elements of error training. (Doctoral Dissertation). Michigan State University, East Lansing, MI. Bell, B.S., Kanar, A.M., & Kozlowski, S.W.J. (2008). Current issues and future directions in simulation-based training in North America. The International Journal of Human Resource Management, 19 (8), 1416-1434. Bell, B.S., & Kozlowski, S.W.J. (2002). Adaptive guidance: Enhancing self-regulation, knowledge, and performance in technology-based training. Personnel Psychology, 55, 267-306. Bell, B.S., & Kozlowski, S.W.J. (2007). Advances in technology-based training. In S. Werner (Ed.), Managing Human Resources in North America (pp. 27-42). New York: Routledge. Bell, B.S., & Kozlowski, S.W.J. (2008). Active learning: Effects of core training design elements on self-regulatory processes, learning, and adaptability. Journal of Applied Psychology, 93, 296-316. Ben-Zeev, T., Fein, S., & Inzlicht, M. (2005). Arousal and stereotype threat. Journal of Experimental Social Psychology, 41, 174–181. Bereiter, C., & Scardamalia, M. (1985). Cognitive coping strategies and the problem of “inert” knowledge. IN S. Chipman, J.W. Segal, & R. Glaser (Eds.), Thinking and learning skills: Vol. 2. Research and open questions (pp. 65-80). Hillsdale, NJ: Erlbaum. Blascovich, J., Spencer, S.J., Quinn, D., & Steele, C. (2001). African Americans and high blood pressure: The role of stereotype threat. Psychological Science, 12, 225–229. Bosson, J.K., Haymovitz, E.L., & Pinel, E.C. (2004). When saying and doing diverge: The effects of stereotype threat on self-reported versus non-verbal anxiety. Journal of Experimental Social Psychology, 40, 247–255. Brodish, A.B., & Devine, P.G. (2009). The role of performance-avoidance goals and worry in mediating the relationship between stereotype threat and performance. Journal of Experimental Social Psychology, 45, 180-185. 241 Brown, R.P., & Day, E.A. (2006). 
The difference isn't black and white: Stereotype threat and the race gap on Raven's Advanced Progressive matrices. Journal of Applied Psychology, 91, 979-985. Brown, R.P., & Lee, M.N. (2005). Stigma consciousness and the race gap in college academic achievement. Self & Identity, 4, 149–157. Brown, R.P., & Pinel, E.C. (2003). Stigma on my mind: Individual differences in the experience of stereotype threat. Journal of Experimental Social Psychology, 39, 626–633. Brunswik, E. (1952). The conceptual framework of psychology. International Encyclopedia of Unified Science, (Vol. 1, No. 10, pp. 655-768). Chicago, IL: The University of Chicago Press. Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models. Newbury Park, CA: Sage. Budd, D., Whitney, P., & Turley, K.J. (1995). Individual differences in working memory strategies for reading expository text. Memory & Cognition, 23, 735–748. Cable, D., & Judge, T. (1994). Pay preferences and job search decisions: A person-organization fit perspective. Personnel Psychology, 47, 317-348. Cadinu, M., Maass, A., Rosabianca, A., & Kiesner, J. (2005). Why do women underperform under stereotype threat? Psychological Science, 16, 572-578. Campbell, J.P., McCloy, R.A., Oppler, S.H., & Sager, C.E. (1993). A theory of performance. In N. Schmitt & W.C. Borman (Eds.), Personnel selection in organizations. (pp. 35-70). San Francisco: Jossey-Bass Publishers. Cantor J., & Engle, R.W. (1993). Working-memory capacity as long-term memory activation: An individual differences approach. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1101–1114. Carpenter P.A., Just M.A., & Shell P. (1990). What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices Test. Psychological Review, 97, 404-431. Carroll, J.B. (1993). Human cognitive abilities: A survey of factor-analytic studies. New York: Cambridge University Press. Cattell R.B. (1943). 
The measurement of adult intelligence. Psychological Bulletin, 40, 153–193. Chandler, M. (1999). Secrets of the SAT [PBS Frontline]. Boston, MA: WGBH Studios. Chase, W.G. & Simon, H.A. (1973). Perception in chess. Cognitive Psychology, 4, 55-81. 242 Chi, M.T.H., Feltovich, P., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121-152. Chi, M.T.H., Glaser, R., & Farr, M.J. (1988). The nature of expertise. Hillsdale, NJ: Erlbaum. Chi, M.T.H., Glaser, R., & Rees, E, (1982). Expertise in problem solving. In R. Sternberg (Ed.), Advances in the psychology of human intelligence. (pp. 7-75). Hillsdale, NJ: Erlbaum. Cloud, J. (2009). How stereotypes defeat the stereotyped. TIME. Retrieved from http://www.time.com/time/health/article/0,8599,1897009,00.html. Colby, S., Lee, S., Lewinger, J.P., & Bull, S. (2010). pmlr: Penalized multinomial logistic regression. R package version 1.0. http://CRAN.R-project.org/package=pmlr. Cole, B., Matheson, K., & Anisman, H. (2007). The moderating role of ethnic identity and social support on relations between well-being and academic performance. Journal of Applied Social Psychology, 37, 592-615. Conway A.R.A., Cowan N., Bunting M.F., Therriault D.J., & Minkoff S.R.B. (2002). A latent variable analysis of working memory capacity, short-term memory capacity, processing speed, and general fluid intelligence. Intelligence, 30, 163–184. Conway, A.R.A., Kane, M.J., Bunting, M.F., Hambrick, D.Z., Wilhelm, O., & Engle, R.W. (2005). Working memory span tasks: A methodological review and user’s guide. Psychonomic Bulletin & Review, 12, 769-786. Crocker, J., Major, B., & Steele, C. (1998). Social stigma. In D. Gilbert, S. Fiske, and G. Lindzey (Eds.), The handbook of social psychology, (Vol. 2, 4th ed., pp. 504-553). Boston: McGraw-Hill. Croizet, J., & Claire, T. (1998). 
Extending the concept of stereotype threat to social class: The Intellectual underperformance of students from low socioeconomic backgrounds. Personality and Social Psychology Bulletin, 24, 588-594. Croizet, J.-C., Després, G., Gauzins, M.-E., Huguet, P., Leyens, J.-P., & Méot, A. (2004). Stereotype threat undermines performance by triggering a disruptive mental load. Personality and Social Psychology Bulletin, 30, 721–731. Cullen, M.J., Hardison, C.M., & Sackett, P.R. (2004). Using SAT-grade and ability-job performance relationships to test predictions derived from stereotype threat theory. Journal of Applied Psychology, 89, 220-230. Cullen, M.J., Waters, S.D., & Sackett, P.R. (2006). Testing stereotype threat theory predictions for math-identified and non-math-identified students by gender. Human Performance, 19, 421-440. 243 Daily, L.Z., Lovett, M.C.,& Reder, L.M. (2001). Modeling individual differences in working memory performance: A source activation account. Cognitive Sciences, 25, 315–353. Daneman, M., & Merikle, P.M. (1996). Working memory and language comprehension: A metaanalysis. Psychonomic Bulletin & Review, 3, 422-433. Danaher, K., & Crandall, C.S. (2008). Stereotype threat in applied settings re-examined. Journal of Applied Social Psychology, 38, 1639-1655. Day, E., Winfred, A., & Gettman, D. (2001). Knowledge structures and the acquisition of a complex skill, Journal of Applied Psychology, 86, 1022-1033. Dearholt, D.W., & Schvaneveldt, R.W. (1990). Properties of pathfinder networks. In R.W. Schvaneveldt (Ed.), Pathfinder associative networks: Studies in knowledge organization (pp. 1-30). Norwood, NJ: Ablex Publishing Corp. Deary, I.J, Strand S., Smith, P., & Fernandes, C. (2007) Intelligence and educational achievement. Intelligence 35, 13–21. de Freitas, S., & Neumann, T. (2009). The use of “exploratory learning” for supporting immersive learning in virtual environments. Computers & Education, 52, 343-352. Debowski, S., Wood, R.E., & Bandura, A. 
(2001). Impact of guided exploration and enactive exploration on self-regulatory mechanisms and information acquisition through electronic search. Journal of Applied Psychology, 86, 1129–1141. DeShon, R.P., Kozlowski, S.W.J., Schmidt, A.M., Milner, K.R., & Weichmann, D. (2004). A multiple-goal, multilevel model of feedback effects on the regulation of individual and team performance. Journal of Applied Psychology, 89, 1035-1056. Dorsey, D.W., Campbell, G.E., Foster, L.L., & Miles, D.E. (1999). Assessing knowledge structures: Relations with experience and posttraining performance. Human Performance, 12, 31-57. Dunlosky, J., & Kane, M.J. (2007). The contributions of strategy use to working memory span: A comparison of strategy assessment methods. The Quarterly Journal of Experimental Psychology, 60, 1227-1245. Engle, R.W. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11, 19-23. Engle, R.W., & Kane, M.J. (2004). Executive attention, working memory capacity, and a twofactor theory of cognitive control. In B. Ross (Ed.), The psychology of learning and motivation (pp. 145–199). New York: Academic Press. 244 Engle, R.W., Tuholski, S.W., Laughlin, J.E., & Conway, A.R.A. (1999). Working memory, short-term memory, and general fluid intelligence: A latent variable approach. Journal of Experimental Psychology: General, 128, 309–331. Esposito, C. (1990). A graph-theoretic approach to concept clustering. In R.W. Schvaneveldt (Ed.), Pathfinder associative networks: Studies in knowledge organization (pp. 89-100). Norwood, NJ: Ablex Publishing Corp. Espy, K.A., McDiarmid, M.M., Cwik, M.F., Stalets, M.M., Hamby, A., & Senn, T.F. (2004). The contribution of executive functions to emergent mathematics skills in preschool children. Developmental Neuropsychology, 26, 465–486. Faria, A.J. (1998). Business simulation games: Current usage levels—an update. Simulation & Gaming, 29, 295-308. Faria, A.J., & Nulsen, R. (1996). 
Business simulation games: Current usage levels a ten year update. Developments in Business Simulation & Experiential Exercises, 23, 22-28. Feldman Barrett, L., Tugade, M.M., & Engle, R.W. (2004). Individual differences in working memory capacity and dual-process theories of the mind. Psychological Bulletin, 130, 553-573. Fernandez-Duque, D., Baird, J. A., & Posner, M. I. (2000). Executive attention and metacognitive regulation. Consciousness and Cognition, 9, 288–307. Festinger, L. (1957). A theory of cognitive dissonance. Stanford, CA: Stanford University Press. Forbes, C.E., Schmader, T., & Allen, J.J.B. (2008). The role of devaluing and discounting in performance monitoring: A neurophysiological study of minorities under threat. Social Cognition and Affective Neuroscience, 3, 253-261. Ford, J.K., & Kraiger, K. (1995). The application of cognitive constructs and principles to the instruction systems model of training: Implications for needs assessment, design and transfer. In C. L. Cooper & I. T. Robertson (Eds.), International review of industrial and organizational psychology (pp. 1–48). Chichester, United Kingdom: Wiley. Ford, J.K., Smith, E.M., Weissbein, D.A., Gully, S.M., & Salas, E. (1998). Relationships of goal orientation, metacognitive activity, and practice strategies with learning outcomes and transfer. Journal of Applied Psychology, 83, 218-233. Frantz, C.M., Cuddy, A.J.C., Burnett, M., Ray, H., & Hart, A. (2004). A threat in the computer: The race implicit association test as a stereotype threat experience. Personality and Social Psychology Bulletin, 30, 1611-1624. Frese, M., Albrecht, K., Altmann, A., Lang, J., Papstein, P.V., Peyerl, R., et al. (1988). The effects of an active development of the mental model in the training process: 245 Experimental results in a word processing system. Behaviour and Information Technology, 7, 295–304. Frey, M.C., & Detterman, D.K. (2004). Scholastic assessment or g? 
The relationship between the Scholastic Assessment Test and general cognitive ability. Psychological Science, 15, 373-378. Gagne, R.M. (1984). Learning outcomes and their effects: Useful categories of human performance. American Psychologist, 39, 377-385. Gawronski, B., & Bodenhausen, G.V. (2006). Associative and propositional processes in evaluation: An integrative review of implicit and explicit attitude change. Psychological Bulletin, 132, 692–731. Geary, D.C., Hoard, M.K., Byrd-Craven, J., & DeSoto, M.C. (2004). Strategy choices in simple and complex addition: Contributions of working memory and counting knowledge for children with mathematical disability. Journal of Experimental Child Psychology, 88, 121-151. Gigerenzer, G. (1991). From tools to theories: A heuristic of discovery in cognitive psychology. Psychological Review, 98, 254–267. Gigerenzer, G. (1993). The bounded rationality of probabilistic mental models. In K. I. Manktelow & D. E. Over (Eds.), Rationality: Psychological and philosophical perspectives (pp. 284–313), London: Routledge. Gigerenzer, G., & Selten, R. (Eds.). (2001). Bounded rationality: The adaptive toolbox. Cambridge, MA: MIT Press. Gigerenzer, G., Todd, P.M., & the ABC Research Group. (1999). Simple heuristics that make us smart. New York: Oxford University Press. Gilhooly, K.J., Logie, R.H., Wetherick, N.E., & Wynn, V. (1993). Working memory and strategies in syllogistic-reasoning tasks. Memory & Cognition, 21, 115-124. Glaser, R. (1990). The reemergence of learning theory within instructional research. American Psychologist, 45, 29-39. Goldsmith, T.E., & Davenport, D.M. (1990). Assessing structural similarity of graphs. In R.W. Schvaneveldt (Ed.), Pathfinder associative networks: Studies in knowledge organization (pp. 75-87). Norwood, NJ: Ablex. Goldsmith, T.E., Johnson, P.J., Acton, W.H. (1991). Assessing structural knowledge. Journal of Educational Psychology, 83, 88-96. 246 Goldstein, I.L., & Ford, J.K. (2002). 
Training in organizations: Needs assessment, development, th and evaluation (4 ed.). Belmont, CA: Wadsworth. Gonzales, P.M., Blanton, H., & Williams, K.J. (2002). The effects of stereotype threat and double-minority status on the test performance of Latino women. Personality and Social Psychology Bulletin, 28, 659-670. Good, C., Aronson, J., & Harder, J.A. (2008). Problems in the pipeline: Stereotype threat and women’s achievement in high-level math courses. Journal of Applied Developmental Psychology, 29, 17-28. Good, C., Aronson, J., & Inzlicht, M. (2003). Improving adolescents' standardized test performance: An intervention to reduce the effects of stereotype threat. Journal of Applied Developmental Psychology, 24, 645-662. Gottfredson, L.S. (1997) Why g matters: The complexity of everyday life. Intelligence 24, 79– 132. Grand, J.A., & Kozlowski, S.W.J. (in press). Seven basic principles for adaptability training in synthetic learning environments. In C. Best, G. Galanis, J. Kerry, & R. Sottilare (Eds.), Fundamental issues in defence training and simulation. Aldershot, UK: Ashgate. Grand, J.A., Ryan, A.M., Schmitt, N., & Hmurovic, J. (2011). How far does stereotype threat reach? The potential detriment of face validity in cognitive ability testing. Human Performance, 24, 1-28. Grier, R.A., Warm, J.S., Dember, W.N., Matthews, G., Galinsky, T.L., Szalma, J.L., & Parasuraman, R. (2003). The vigilance decrement reflects limitations in effortful attention, not mindlessness. Human Factors, 45, 349–359. Grimm, L.R., Markman, A.B., Maddox, W.T., & Baldwin, G.C. (2009). Stereotype threat reinterpreted as a regulatory mismatch. Journal of Personality and Social Psychology, 96, 288-304. Halpern, D. F. (2000). Sex differences in cognitive abilities (3rd ed.). Mahwah, NJ: L. Erlbaum Associates. Halpern, D.F., Benbow, C.P., Geary, D.C., Gur, R.C., Hyde, J.S., & Gernsbacher, M.A. (2007). The science of sex differences in science and mathematics. 
Psychological Science in the Public Interest, 8, 1-51. Harkins, S. G. (2006). Mere effort as the mediator of the evaluation–performance relationship. Journal of Personality and Social Psychology, 91, 436–455. 247 Harrison, L.A., Stevens, C.M., Monty, A.N., & Coakley, C.A. (2006). The consequences of stereotype threat on the academic performance of white and non-white lower income college students. Social Psychology of Education, 9, 341-357. Heider, F. (1958). The psychology of interpersonal relations. New York: Wiley. Heinze, G., & Schemper, M. (2002). A solution to the problem of separation in logistic regression. Statistics in Medicine, 21, 2409-2419. Hernstein, R.J., & Murray, C. (1994). The bell curve: Intelligence and class structure in American life. New York, NY: Free Press. Hess, T.M., Auman, C., Colcombe, S.J., & Rahhal, T.A. (2003). The impact of stereotype threat on age differences in memory performance. Journal of Gerontology: Psychological Sciences, 58, 3–11. Higgins, E.T. (1987). Self-discrepancy theory: A theory relating self and affect. Psychological Review, 94, 319–340. Hunt, E. (1994). Problem-solving. In R.J. Sternberg (Ed.), Thinking and problem-solving. (pp. 215-232). San Diego, CA: Academic Press. Hyde, J.S., Fennema, E., & Lamon, S.J. (1990). Gender differences in mathematics performance: A meta-analysis. Psychological Bulletin, 107,139-153. Ifenthaler, D., Masduki, I., & Seel, N.M. (2011). The mystery of cognitive structure and how we can detect it: Tracking the development of cognitive structures over time. Instructional Science, 39, 41-61. Ifenthaler, D., & Seel, N.M. (2005). The measurement of change: Learning-dependent progression of mental models. Technology, Instruction, Cognition and Learning, 2, 317– 336. Imbo, I., & Vandierendonck, A. (2007). The development of strategy use in elementary school children: Working memory and individual differences. Journal of Experimental Child Psychology, 96, 284-309. Interlink. (2011). FAQs. 
Retrieved July 12, 2011 from http://interlinkinc.net/FAQ.html Inzana, C.M., Driskell, J.E., Salas, E., &Johnston, J.H. (1996). Effects of preparatory information on enhancing performance under stress. Journal of Applied Psychology, 81, 429-435. Inzlicht, M., & Ben-Zeev, T. (2000). A threatening intellectual environment: Why females are susceptible to experiencing problem-solving deficits in the presence of males. Psychological Science, 11, 365-371. 248 Inzlicht, M., McKay, L., & Aronson, J. (2006). Stigma as ego depletion: How being the target of prejudice affects self-control. Psychological Science, 17, 262–269. Iran-Nejad, A. (1990). Active and dynamic self-regulation of learning processes. Review of Educational Research, 60, 573-602. Ivancic, K., & Hesketh, B. (2000). Learning from error in a driving simulation: Effects on driving skill and self-confidence. Ergonomics, 43, 1966–1984. Jaeggi, S.M., Buschkuehl, M., Jonides, J., & Perrig, W.J. (2008). Improving fluid intelligence with training on working memory. Proceedings of the National Academy of Sciences of the United States of America, 105, 6829-6833. Jamieson, J.P., & Harkins, S.G. (2007). Mere effort and stereotype threat performance effects. Journal of Personality and Social Psychology, 93, 544–564. Jensen, A.R. (1998). The g factor: The science of mental ability. Westport, CT: Praeger. Johns, M., Inzlicht, M., & Schmader, T. (2008). Stereotype threat and executive resource depletion: The influence of emotion regulation. Journal of Experimental Psychology: General, 137, 691-705. Johnson-Laird, P. (1983). Mental models. Cambridge, MA: Harvard University Press. Jonassen, D.H., Beissner, K., & Yacci, M. (1993). Structural knowledge: Techniques for representing, conveying, and acquiring structural knowledge. Hilsdale, NJ: Lawrence Erlbaum. Josephs, R.A., Newman, M.L., Brown, R.P., & Beer, J.M. (2003). Status, testosterone, and human intellectual performance: Stereotype threat as status concern. 
Psychological Science, 14, 158–163. Just, M.A., & Carpenter, P.N. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99, 122–149. Kahneman, D., & Frederick, S. (2005). A model of heuristic judgment. In K.J. Holyoak & R.G. Morrison (Eds.), The Cambridge handbook of thinking and reasoning. (pp. 267-293). New York: Cambridge University Press. Kamouri, A.L., Kamouri, J., & Smith, K.H. (1986). Training by exploration: Facilitating the transfer of procedural knowledge through analogical reasoning. International Journal of Man-Machine Studies, 24, 171-192. Kane, M.J., Bleckley, M.K., Conway, A.R.A., & Engle, R.W. (2001). A controlled-attention view of WM capacity. Journal of Experimental Psychology: General, 130, 169–183. 249 Kane, M.J., & Engle, R.W. (2003). Working-memory capacity and the control of attention: The contributions of goal neglect, response competition, and task set to Stroop interference. Journal of Experimental Psychology: General, 132, 47–70. Kane, M.J., Hambrick, D.Z., & Conway, A.R.A. (2005). Working memory capacity and fluid intelligence are strongly related constructs: Comment on Ackerman, Beier, and Boyle (2005). Psychological Bulletin, 131, 66-71. Kane, M.J., Hambrick, D.Z., Tuholski, S.W., Wilhem, O., Payne, T.W., & Engle, R.W. (2004). The generality of working memory: A latent variable approach to verbal and visuospatial memory span and reasoning. Journal of Experimental Psychology: General, 133, 189– 217. Kane, M.J., Conway, A.R.A, Hambrick, D.Z., & Engle, R.W. (2007). Variation in working memory as variation in executive attention and control. In A.R.A. Conway, C. Jarrold, M.J. Kane, A. Miyake, & J.N. Towse (Eds.), Variation in working memory (pp. 21-48). Oxford, England: Oxford University Press. Kanfer, R., & Ackerman, P.L. (1989). Training the human information processor. In I.L. Goldstein (Ed.), Training and development in organizations (pp. 121-182). San Francisco: Jossey-Bass. 
Karren, R.J., & Barringer, M.W. (2002). A review of the policy-capturing methodology in organizational research: Guidelines for research and practice. Organizational Research Methods, 5, 337-361. Keifer, A.K., & Sekaquaptewa, D. (2007). Implicit stereotypes and women's math performance: How implicit gender-math stereotypes influence women's susceptibility to stereotype threat. Journal of Experimental Social Psychology, 43, 825-832. Keller, J. (2002). Blatant stereotype threat and women’s math performance: Self-handicapping as a strategic means to cope with obtrusive negative performance expectations. Sex Roles, 47, 193–198. Keller, J. (2007). Stereotype threat in classroom settings: The interactive effect of domain identification, task difficulty and stereotype threat on female students' math performance. British Journal of Educational Psychology, 77, 323-338. Kieras, D., & Meyer, D.E. (1997). An overview of the EPIC architecture for cognition and performance with application to human-computer interaction. Human-Computer Interaction, 12, 391–438. Kieras, D.E., Meyer, D.E., Mueller, S., & Seymour, T. (1999). Insights into working memory from the perspective of the EPIC architecture for modeling skilled perceptual-motor performance. In P. Shah & A. Miyake (Eds.), Models of working memory: Mechanisms of 250 active maintenance and executive control (pp. 183–223). Cambridge, England: Cambridge University Press. Koch, S.C., Müller, S.M., & Sieverding, M. (2008). Women and computers. Effects of stereotype threat on attribution of failure. Computers & Education, 51, 1795-1803. Koenig, A.M., & Eagly, A.H. (2005). Stereotype threat in men on a test of social sensitivity. Sex Roles, 52, 489-496. Koenig, K.A., Frey, M.C., & Detterman, D.K. (2008). ACT and general cognitive ability. Intelligence, 36, 153-160. König, C.J., Bühner, M., & Mürling, G. (2005). 
Working memory, fluid intelligence, and attention are predictors of multitasking performance, but polychronicity and extraversion are not. Human Performance, 18, 243-266. nd Kosslyn, S.M., & Rosenberg, R.S. (2004). Psychology: The brain, the person, the world (2 ed.). Boston, MA: Pearson. Koubek, R.J., Clarkston, T.P., & Calvez, V. (1994). The training of knowledge structures for manufacturing tasks: An empirical study. Ergonomics, 37, 765–780. Kozlowski, S.W.J., Gully, S.M., Brown, K.G., Salas, E., Smith, E.M., & Nason, E.R. (2001). Effects of training goals and goal orientation traits on multidimensional training outcomes and performance adaptability. Organizational Behavior and Human Decision Processes, 85, 1-31. Kozlowski, S.W.J., Toney, R.J., Mullins, M.E., Weissbein, D.A., Brown, K.G., & Bell, B.S. (2001). Developing adaptability: A theory for the design of integrated-embedded training systems. In E. Salas (Ed.), Advances in human performance and cognitive engineering research (Vol. 1, pp. 59-123). Amsterdam: JAI/Elsevier Science. Kraiger, K., Ford, J.K, & Salas, E. (1993). Application of cognitive, skill-based, and affective theories of learning outcomes to new methods of training evaluation. Journal of Applied Psychology, 78, 311-328. Kraiger, K., Salas, E., & Cannon-Bowers, J.A. (1995). Measuring knowledge organization as a method for assessing learning during training. Human Factors, 37, 804-816. Kray, L.J., Galinksy, A.D., & Thompson, L. (2002). Reversing the gender gap in negotiations: An exploration of stereotype regeneration. Organizational Behavior and Human Decision Processes, 87, 386-409. Kray, L.J., Thompson, L., & Galinsky, A. (2001). Battle of the sexes: Gender stereotype confirmation and reactance in negotiations. Journal of Personality and Social Psychology, 80, 942–958. 251 Kristof-Brown, A.L., Jansen, K.J., & Colbert, A.E. (2002). A policy-capturing study of the simultaneous effects of fit with jobs, groups, and organizations. 
Journal of Applied Psychology, 87, 985-993. Kyllonen, P.C. (1996). Is working memory capacity Spearman’s g? In I. Dennis & P. Tapsfield (Eds.), Human abilities: Their nature and measurement (pp. 49-75). Hillsdale, NJ: Erlbaum. Kyllonen P.C., & Christal R.E. (1990). Reasoning ability is (little more than) working-memory capacity? Intelligence, 14, 389-433. Lievens, F., Reeve, C.L., & Heggestad, E.D. (2007). An examination of psychometric bias due to retesting on cognitive ability tests in selection settings. Journal of Applied Psychology, 92, 1672-1682. Lesko, A.C., & Corpus, J.H. (2006). Discounting the difficult: How high math identified women respond to stereotype threat. Sex Roles, 54, 113-125. Levy, B. (1996). Improving memory in old age through implicit self-stereotyping. Journal of Personality and Social Psychology, 71, 1092-1107. Leyens, J.-P., Désert, M., Croizet, J.-C., & Darcis, C. (2000). Stereotype threat: Are lower status and history of stigmatization preconditions of stereotype threat? Personality and Social Psychology Bulletin, 26, 1189-1199. Lipshitz, R., Levy, D.L., & Orchen, K. (2006). Is this problem to be solved? A cognitive schema of effective problem-solving. Thinking and Reasoning, 12, 413-430. Loewenstein, M.A., & Speltzer, J.R. (2000). Formal and informal training: Evidence from the NLSY. Research in Labor Economics, 18, 403-438. Loman, N.L., & Mayer, R.E. (1983). Signaling techniques that increase the understandability of expository prose. Journal of Educational Psychology, 75, 402-412. Maas, C.J., & Hox, J.J. (2005). Sufficient sample sizes for multilevel modeling. Methodology, 1, 86-92. Major, B., Spencer, S.J., Schmader, T., Wolfe, C.T., & Crocker, J. (1998). Coping with negative stereotypes about intellectual performance: The role of psychological disengagement. Personality and Social Psychology Bulletin, 24, 34-50. Marshall, H. (1996). 
Recent and emerging theoretical frameworks for research on classroom teaching: Contributions and limitations [Special Issue]. Educational Psychologist, 31(3/4). 252 Marshall, N., & Glock, M.D. (1979). Comprehension of connected discourse: A study into the relationships between the structure of text and information recalled. Reading Research Quarterly. 14, 10-56. Marx, D.M. & Goff, P.A. (2005). Clearing the air: The effect of experimenter race on target's test performance and subjective experience. British Journal of Social Psychology, 44, 645657. Marx, D.M., & Stapel, D.A. (2006a). Distinguishing stereotype threat from priming effects: On the role of the social self and threat-based concerns. Journal of Personality and Social Psychology, 91, 243–254. Marx, D.M., & Stapel, D.A. (2006b). It’s all in the timing: Measuring emotional reactions to stereotype threat before and after taking a test. European Journal of Social Psychology, 36, 687–698. Marx, D.M., Stapel, D.A., & Muller, D. (2005). We can do it: The interplay of construal orientation and social comparison under threat. Journal of Personality and Social Psychology, 88, 432-446. Mayer, R.E. (2004). Should there be a three-strikes rule against pure discovery learning? The case for guided methods of instruction. American Psychologist, 59, 14-19. McDaniel, M.A., & Schlager, M.S. (1990). Discovery learning and transfer of problem-solving skills. Cognition and Instruction, 7, 129–159. McNamara, D.S., & Scott, J.L. (2001). Working memory capacity and strategy use. Memory & Cognition, 29, 10-17. Medin, D.L., Ross, N.O., Atran, S. Cox, D., Coley, J., Proffitt, J.B., & Blok, S. (2006). Folkbiology of freshwater fish. Cognition, 99, 237-273. Mendoza-Denton, R., Purdie, V., Downey, G., & Davis, A. (2002). Sensitivity to status-based rejection: Implications for African-American students’ college experience. Journal of Personality and Social Psychology, 83, 896–918. Messick, S. (1984). 
Abilities and knowledge in educational achievement testing: The assessment of dynamic cognitive structures. In B.S. Plake (Ed.), Social and technical issues in testing: Implications for test construction and usage (pp. 156-172). Hillsdale, NJ: Erlbaum. Meyer, B.J.F., Brandt, D.M., & Bluth, G.J. (1980). Use of top-level structure in text: Key for reading comprehension of ninth-grade students. Reading Research Quarterly, 16, 72-103. Meyer, B.J.F., & Rice, E. (1982). The interaction of reader strategies and the organization of text. Text, 2, 155-192. 253 McGlone, M.S., & Aronson, J. (2006). Stereotype threat, identity salience, and spatial reasoning. Journal of Applied Developmental Psychology, 27, 486-493. McKay, P.F., Doverspike, D., Bowen-Hilton, D., & Martin, Q. D. (2002). Stereotype threat effects on the Raven Advanced Progressive Matrices scores of African Americans. Journal of Applied Social Psychology, 32, 767–787. McKay, P.F., Doverspike, D., Bowen-Hilton, D, & McKay, Q.D. (2003). The effects of demographic variables and stereotype threat on black/white differences in cognitive ability test performance. Journal of Business and Psychology, 18, 1-14. Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review. 63, 81-97. Muraven, M., & Baumeister, R.F. (2000). Self-regulation and depletion of limited resources: Does self-control resemble a muscle? Psychological Bulletin, 126, 247–259. Nagy, P. (1984). Cognitive structure and the spatial metaphor. In P. Nagy (Ed.), The representation of cognitive structure (p. 1-11). Toronto, Canada: Ontario Institute for Studiesin Education. Neisser, U., Boodoo, G., Bouchard, T.J., Boykin, A.W., Brody, N., Ceci, S.J.,...Urbina, S. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51, 77-101. Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press. Nguyen, H-H., O’Neal, A., & Ryan, A.M. (2003). 
Relating test-taking attitudes and skills and stereotype threat effects to the racial gap in cognitive ability test performance. Human Performance, 16, 261–293.
Nguyen, H.-H., & Ryan, A.M. (2008). Does stereotype threat affect test performance of minorities and women? A meta-analysis of experimental evidence. Journal of Applied Psychology, 93, 1314–1334.
Nosek, B.A., Banaji, M.R., & Greenwald, A.G. (2002). Math = male, me = female, therefore math ≠ me. Journal of Personality and Social Psychology, 83, 44–59.
O’Brien, L.T., & Crandall, C.S. (2003). Stereotype threat and arousal: Effects on women’s math performance. Personality and Social Psychology Bulletin, 29, 782–789.
Oberauer, K., Schulze, R., Wilhelm, O., & Süß, H.-M. (2005). Working memory and intelligence—their correlation and their relation: Comment on Ackerman, Beier, and Boyle (2005). Psychological Bulletin, 131, 61–65.
Phillips, D.C. (1998). How, why, what, when, and where: Perspectives on constructivism in psychology and education. Issues in Education, 3, 151–194.
Ployhart, R.E., Ziegert, J.C., & McFarland, L.A. (2003). Understanding racial differences on cognitive ability tests in selection contexts: An integration of stereotype threat and applicant reactions research. Human Performance, 16, 231–259.
Prawat, R.S. (1989). Promoting access to knowledge, strategy, and disposition in students: A research synthesis. Review of Educational Research, 59, 1–41.
Pressing, J. (1999). The referential dynamics of cognition and action. Psychological Review, 106, 714–747.
Pressley, M., Snyder, B.S., Levin, J.R., Murray, H.G., & Ghatala, E.S. (1987). Perceived Readiness for Examination Performance (PREP): Produced by initial reading of text and text containing adjunct questions. Reading Research Quarterly, 22, 219–236.
Pronin, E., Steele, C., & Ross, L. (2004). Identity bifurcation in response to stereotype threat: Women and mathematics. Journal of Experimental Social Psychology, 40, 152–168.
Quinn, D.M., Kahng, S.K., & Crocker, J. (2004). Discreditable: Stigma effects of revealing a mental illness history on test performance. Personality and Social Psychology Bulletin, 30, 803–815.
Quinn, D.M., & Spencer, S.J. (2001). The interference of stereotype threat with women’s generation of mathematical problem-solving strategies. Journal of Social Issues, 57, 55–71.
R Development Core Team (2012). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. www.R-project.org.
Radvansky, G.A., & Zacks, R.T. (1991). Mental models and the fan effect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 940–953.
Reder, L.M., & Anderson, J.R. (1980). A comparison of texts and their summaries: Memorial consequences. Journal of Verbal Learning & Verbal Behavior, 19, 121–134.
Rieman, J. (1996). A field study of exploratory learning strategies. ACM Transactions on Computer-Human Interaction, 3, 189–218.
Rivers, C. (2007). Shock jocks wield dangerous ‘stereotype threat.’ WomensEnews. Retrieved from http://www.womensenews.org/story/media-stories/070416/shock-jocks-wielddangerous-stereotype-threat.
Rohde, T.E., & Thompson, L.A. (2007). Predicting academic achievement with cognitive ability. Intelligence, 35, 83–92.
Rosen, V.M., & Engle, R.W. (1997). The role of working memory capacity in retrieval. Journal of Experimental Psychology: General, 126, 211–227.
Rosen, V.M., & Engle, R.W. (1998). Working memory capacity and suppression. Journal of Memory and Language, 39, 418–436.
Rotundo, M., & Sackett, P.R. (2002). The relative importance of task, citizenship, and counterproductive performance to global ratings of job performance: A policy-capturing approach. Journal of Applied Psychology, 87, 66–80.
Rouse, W.B., & Morris, N.M. (1986). On looking into the black box: Prospects and limits in the search for mental models. Psychological Bulletin, 100, 349–363.
Rowe, A.L., Cooke, N.J., Hall, E.P., & Halgren, T.L. (1996).
Toward an on-line knowledge assessment methodology: Building on the relationship between knowing and doing. Journal of Experimental Psychology: Applied, 2, 31–47.
Royer, J.M., Tronsky, L.N., Chan, Y., Jackson, S.J., & Marchant, H. (1999). Math-fact retrieval as the cognitive mechanism underlying gender differences in math test performance. Contemporary Educational Psychology, 24, 181–266.
Rumelhart, D.E., & Norman, D.A. (1978). Accretion, tuning and restructuring: Three models of learning. In R.L. Klatzky & J.W. Cotton (Eds.), Semantic factors in cognition (pp. 37–53). Hillsdale, NJ: Lawrence Erlbaum.
Rydell, R.J., Rydell, M.T., & Boucher, K.L. (2010). The effect of negative performance stereotypes on learning. Journal of Personality and Social Psychology, 99, 883–896.
Rydell, R.J., Shiffrin, R.M., Boucher, K.L., Van Loo, K., & Rydell, M.T. (2010). Stereotype threat prevents perceptual learning. Proceedings of the National Academy of Sciences of the United States of America, 107, 14042–14047.
Sackett, P.R., Hardison, C.M., & Cullen, M.J. (2004). On interpreting stereotype threat as accounting for African American–White differences on cognitive tests. American Psychologist, 59, 7–13.
Sackett, P.R., & Ryan, A.M. (2012). Concerns about generalizing stereotype threat research findings to operational high stakes testing. In M. Inzlicht & T. Schmader (Eds.), Stereotype threat (pp. 249–263). Oxford Press.
Sackett, P.R., Schmitt, N., Ellingson, J.E., & Kabin, M.B. (2001). High-stakes testing in employment, credentialing, and higher education: Prospects in a post-affirmative-action world. American Psychologist, 56, 302–318.
Scherbaum, C.A., & Ferreter, J.M. (2009). Estimating statistical power and required sample sizes for organizational research using multilevel modeling. Organizational Research Methods, 12, 347–367.
Schmader, T. (2002). Gender identification moderates stereotype threat effects on women's math performance.
Journal of Experimental Social Psychology, 38, 194–201.
Schmader, T. (2010). Stereotype threat deconstructed. Current Directions in Psychological Science, 19, 14–18.
Schmader, T., Forbes, C.E., Zhang, S., & Berry Mendes, W. (2009). A metacognitive perspective on the cognitive deficits experienced in intellectually threatening environments. Personality and Social Psychology Bulletin, 35, 584–596.
Schmader, T., & Johns, M. (2003). Converging evidence that stereotype threat reduces working memory capacity. Journal of Personality and Social Psychology, 85, 440–452.
Schmader, T., Johns, M., & Barquissau, M. (2004). The costs of accepting gender differences: The role of stereotype endorsement in women's experience in the math domain. Sex Roles, 50, 835–850.
Schmader, T., Johns, M., & Forbes, C. (2008). An integrated process model of stereotype threat effects on performance. Psychological Review, 115, 336–356.
Schmidt, F.L. (2002). The role of general cognitive ability and job performance: Why there cannot be a debate. Human Performance, 15, 187–210.
Schoenfeld, A.H., & Herrmann, D.J. (1982). Problem perception and knowledge structure in expert and novice mathematical problem solvers. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8, 484–494.
Schuelke, M.J., Day, E.A., McEntire, L.E., Boatman, P.R., Boatman, J.E., Kowollik, V., & Wang, X. (2009). Relating indices of knowledge structure coherence and accuracy to skill-based performance: Is there utility in using a combination of indices? Journal of Applied Psychology, 94, 1076–1085.
Schunn, C.D., & Reder, L.M. (2001). Another source of individual differences: Strategy adaptivity to changing rates of success. Journal of Experimental Psychology: General, 130, 59–76.
Schwartz, D.L., & Bransford, J.D. (1998). A time for telling. Cognition and Instruction, 16, 475–522.
Seel, N.M. (1999). Educational diagnosis of mental models: Assessment problems and technology-based solutions.
Journal of Structural Learning and Intelligent Systems, 14, 153–185.
Seibt, B., & Förster, J. (2004). Stereotype threat and performance: How self-stereotypes influence processing by inducing regulatory foci. Journal of Personality and Social Psychology, 87, 38–56.
Shavelson, R.J. (1972). Some aspects of the correspondence between content structure and cognitive structure in physics education. Journal of Educational Psychology, 63, 225–234.
Shavelson, R.J. (1974). Methods for examining representations of a subject-matter structure in student memory. Journal of Research in Science Teaching, 11, 231–249.
Shepard, R.N. (1987). Toward a universal law of generalization for psychological science. Science, 237, 1317–1323.
Shiffrin, R.M., & Lightfoot, N. (1997). Perceptual learning of alphanumeric-like characters. Psychology of Learning and Motivation, 36, 45–81.
Shiffrin, R.M., & Schneider, W. (1977). Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory. Psychological Review, 84, 127–190.
Shih, M., Pittinsky, T.L., & Ambady, N. (1999). Stereotype susceptibility: Identity salience and shifts in quantitative performance. Psychological Science, 10, 80–83.
Shih, M., Pittinsky, T.L., & Trahan, A. (2006). Domain-specific effects of stereotypes on performance. Self and Identity, 5, 1–14.
Shimamura, A.P. (2000). Toward a cognitive neuroscience of metacognition. Consciousness and Cognition, 9, 313–323.
Simon, D., & Simon, H.A. (1978). Individual differences in solving physics problems. In R. Siegler (Ed.), Children's thinking: What develops? (pp. 324–348). Hillsdale, NJ: Erlbaum.
Simon, H.A. (1956). Rational choice and the structure of the environment. Psychological Review, 63, 129–138.
Simon, H.A. (1957). Models of man: Social and rational. New York: Wiley.
Simon, H.A. (1974). How big is a chunk? Science, 183, 482–488.
Simon, H.A. (1990). Invariants of human behavior. Annual Review of Psychology, 41, 1–19.
Smith, J.L., & White, P.H. (2001). Development of the domain identification measure: A tool for investigating stereotype threat effects. Educational and Psychological Measurement, 61, 1040–1057.
Smith, J. (2004). Understanding the process of stereotype threat: A review of mediational variables and new performance goal directions. Educational Psychology Review, 16, 177–206.
Smith, J. (2006). The interplay among stereotypes, performance-avoidance goals, and women’s math performance expectations. Sex Roles, 54, 287–296.
Smith, J.L., Sansone, C., & White, P.H. (2007). The stereotyped task engagement process: The role of interest and achievement motivation. Journal of Educational Psychology, 99, 99–114.
Smith, M.E., McEvoy, L.K., & Gevins, A. (1999). Neurophysiological indices of strategy development and skill acquisition. Cognitive Brain Research, 7, 389–404.
Smith-Jentsch, K.A., Mathieu, J.E., & Kraiger, K. (2005). Investigating linear and interactive effects of shared mental models on safety and efficiency in a field setting. Journal of Applied Psychology, 90, 523–535.
Snijders, T.A.B., & Bosker, R.J. (1993). Standard errors and sample sizes in two-level research. Journal of Educational Statistics, 18, 237–260.
Spencer, S.J., Steele, C.M., & Quinn, D.M. (1999). Stereotype threat and women's math performance. Journal of Experimental Social Psychology, 35, 4–28.
Stangor, C. (Ed.). (2000). Stereotypes and prejudice. Philadelphia, PA: Psychology Press.
Stangor, C., & Lange, J. (1994). Mental representations of social groups: Advances in conceptualizing stereotypes and stereotyping. Advances in Experimental Social Psychology, 26, 357–416.
Steffe, L.P., & Gale, J. (Eds.). (1995). Constructivism in education. Mahwah, NJ: Erlbaum.
Steele, C.M. (1997). A threat in the air: How stereotypes shape intellectual identity and performance. American Psychologist, 52, 613–629.
Steele, C.M., & Aronson, J. (1995).
Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69, 797–811.
Steele, C.M., & Aronson, J. (1998). Stereotype threat and the test performance of academically successful African Americans. In C. Jencks & M. Phillips (Eds.), The Black–White test score gap (pp. 401–427). Washington, DC: Brookings.
Steele, C.M., & Davies, P.G. (2003). Stereotype threat and employment testing: A commentary. Human Performance, 16, 311–326.
Steele, C.M., Spencer, S.J., & Aronson, J. (2002). Contending with group image: The psychology of stereotype and social identity threat. In M.P. Zanna (Ed.), Advances in experimental social psychology (pp. 379–440). San Diego, CA: Academic Press.
Sternberg, R.J., Conway, B.E., Ketron, J.L., & Bernstein, M. (1981). People’s conceptions of intelligence. Journal of Personality and Social Psychology, 41, 37–55.
Sternberg, R.J., & Grigorenko, E.L. (2004). Intelligence and culture: How culture shapes what intelligence means, and the implications for a science of well-being. Philosophical Transactions of the Royal Society B, 359, 1427–1434.
Steyvers, M., & Tenenbaum, J.B. (2005). The large-scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive Science, 29, 41–78.
Stone, J. (2002). Battling doubt by avoiding practice: The effect of stereotype threat on self-handicapping in white athletes. Personality and Social Psychology Bulletin, 28, 1667–1678.
Stone, J., Lynch, C.I., Sjomeling, M., & Darley, J.M. (1999). Stereotype threat effects on black and white athletic performance. Journal of Personality and Social Psychology, 77, 1213–1227.
Stone, J., & McWhinnie, C. (2008). Evidence that blatant versus subtle stereotype threat cues impact performance through dual processes. Journal of Experimental Social Psychology, 44, 445–452.
Stricker, L.J., & Ward, W.C. (2004).
Stereotype threat, inquiring about test takers' ethnicity and gender, and standardized test performance. Journal of Applied Social Psychology, 34, 665–693.
Stricker, L.J., & Ward, W.C. (2008). Stereotype threat in applied settings re-examined: A reply. Journal of Applied Social Psychology, 38, 1656–1663.
Summers, G.J. (2004). Today’s business simulation industry. Simulation & Gaming, 35, 208–241.
Sweller, J., Mawer, R.F., & Ward, M.R. (1983). Development of expertise in mathematical problem solving. Journal of Experimental Psychology: General, 112, 639–661.
Taber, K.S. (2000). Multiple frameworks? Evidence of manifold conceptions in individual cognitive structure. International Journal of Science Education & Training, 22, 399–417.
te Nijenhuis, J., van Vianen, A.E.M., & van der Flier, H. (2007). Score gains on g-loaded tests: No g. Intelligence, 35, 283–300.
Todd, P.M., & Gigerenzer, G. (2007). Environments that make us smart: Ecological rationality. Current Directions in Psychological Science, 16, 167–171.
Turner, M.L., & Engle, R.W. (1989). Is working memory task dependent? Journal of Memory and Language, 28, 127–154.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327–352.
Unsworth, N., Heitz, R.P., Schrock, J.C., & Engle, R.W. (2005). An automated version of the operation span task. Behavior Research Methods, 37, 498–505.
van Merriënboer, J.J.G., & Sweller, J. (2005). Cognitive load theory and complex learning: Recent developments and future directions. Educational Psychology Review, 17, 147–177.
von Hippel, W., von Hippel, C., Conway, L., Preacher, K.J., Schooler, J.W., & Radvansky, G.A. (2005). Coping with stereotype threat: Denial as an impression management strategy. Journal of Personality and Social Psychology, 89, 22–35.
Wagner, R.K., & Sternberg, R.J. (1984). Alternative conceptions of intelligence and their implications for education. Review of Educational Research, 54, 179–223.
Walsh, M., Hickey, C., & Duffy, J. (1999).
Influence of item content and stereotype situation on gender differences in mathematical problem solving. Sex Roles, 41, 219–240.
Walton, G.M., & Cohen, G.L. (2003). Stereotype lift. Journal of Experimental Social Psychology, 39, 456–467.
Watts, D.J. (1999). Small worlds: The dynamics of networks between order and randomness. Princeton, NJ: Princeton University Press.
Watts, D.J., & Strogatz, S.H. (1998). Collective dynamics of “small-world” networks. Nature, 393, 440–442.
Weaver, J.L., Bowers, C.A., Salas, E., & Cannon-Bowers, J.A. (1995). Networked simulations: New paradigms for team performance research. Behavior Research Methods, Instruments, & Computers, 27, 12–24.
Webber, S.S., Chen, G., Payne, S.C., Marsh, S.M., & Zaccaro, S.J. (2000). Enhancing team mental model measurement with performance appraisal practices. Organizational Research Methods, 3, 307–322.
Weiss, H.M. (1990). Learning theory and industrial psychology. In M.D. Dunnette & L.M. Hough (Eds.), Handbook of industrial and organizational psychology. Palo Alto, CA: Consulting Psychologists Press.
Wenzlaff, R.M., & Wegner, D.M. (2000). Thought suppression. Annual Review of Psychology, 51, 59–91.
Wexley, K.N., & Latham, G.P. (1991). Developing and training human resources in organizations. New York: Harper Collins.
Wout, D., Danso, H., Jackson, J., & Spencer, S. (2008). The many faces of stereotype threat: Group- and self-threat. Journal of Experimental Social Psychology, 44, 792–799.
Wraga, M., Helt, M., Jacobs, E., & Sullivan, K. (2007). Neural basis of stereotype-induced shifts in women’s mental rotation performance. Social Cognitive and Affective Neuroscience, 2, 12–19.
Yeung, N.C.J., & von Hippel, C. (2008). Stereotype threat increases the likelihood that female drivers in a simulator run over jaywalkers. Accident Analysis & Prevention, 40, 667–674.
Yopyk, D.J.A., & Prentice, D.A. (2005). Am I an athlete or a student? Identity salience and stereotype threat in student-athletes.
Basic and Applied Social Psychology, 27, 329–336.
Zentall, S.S. (1990). Fact-retrieval automatization and math problem solving by learning disabled, attention disordered, and normal adolescents. Journal of Educational Psychology, 82, 856–865.